SQL Server with Mr. Denny

Oct 30 2013   2:00PM GMT

TempDB latch timeouts with lots of RAM

Denny Cherry Denny Cherry Profile: Denny Cherry

Recently I had an interesting problem where the SQL Server 2008 R2 instance would randomly in the middle of the morning start having latch timeouts on various tempdb database pages. The first assumption was that these pages were GAM pages and that more tempdb database files would solve this problem. However looking at these pages, these weren’t GAM pages but instead were normal data pages.

2013-07-27 08:29:31.50 spid745 Time out occurred while waiting for buffer latch — type 2, bp 00000000F6FBA400, page 103:8, stat 0xc0000b, database id: 2, allocation unit id: 281474978938880/2228224, task 0x00000000056D5708 : 0, waittime 300, flags 0x3a, owning task 0x0000000027EEE748. Continuing to wait.

2013-07-27 08:29:31.54 spid1107 Time out occurred while waiting for buffer latch — type 2, bp 00000000F6FBA400, page 103:8, stat 0xc0000b, database id: 2, allocation unit id: 281474978938880/2228224, task 0x0000000026EA8988 : 0, waittime 300, flags 0x3a, owning task 0x0000000027EEE748. Continuing to wait.

2013-07-27 08:29:31.56 spid672 Time out occurred while waiting for buffer latch — type 2, bp 00000000F6FBA400, page 103:8, stat 0xc0000b, database id: 2, allocation unit id: 281474978938880/2228224, task 0x00000000272154C8 : 0, waittime 300, flags 0x3a, owning task 0x0000000027EEE748. Continuing to wait.

2013-07-27 08:29:31.65 spid1919 Time out occurred while waiting for buffer latch — type 2, bp 00000000F6FBA400, page 103:8, stat 0xc0000b, database id: 2, allocation unit id: 281474978938880/2228224, task 0x000000265E681048 : 0, waittime 300, flags 0x3a, owning task 0x0000000027EEE748. Continuing to wait.

2013-07-27 08:29:32.05 spid819 Time out occurred while waiting for buffer latch — type 2, bp 00000000F6FBA400, page 103:8, stat 0xc0000b, database id: 2, allocation unit id: 281474978938880/2228224, task 0x0000000027509048 : 0, waittime 300, flags 0x3a, owning task 0x0000000027EEE748. Continuing to wait.

Looking at the server’s memory usage, in this case via Spotlight for SQL Server, we could see that the SQL Server was allocating huge amounts of memory to the SQL Server process, but it wasn’t actually using this memory for anything it was just allocating it.

To make things more interesting, this problem first started happening after we upgraded the RAM in the server from 256 Gigs of RAM to 1 TB of RAM. While trying to figure out what was happening we could simply reduce the maximum amount of RAM that SQL Server could access to below 256 Gigs of RAM and the problem would just go away.

To make things worse management wouldn’t allow the server to remain broken long enough for any sort of proper diagnosis to be done. So basically we could try a change, and if the problem came back all we could do was set the memory back down to 256 Gigs and wait for the next window to try the next fix.

After a bit of trial and error of different traceflags and settings we found the right set of settings. We turned on traceflag 834 which turns on large page allocations. This traceflag requires that the lock pages in memory setting is enabled, so that was turned on as well. We also turned on AWE within the SQL Server based on this blog post from Microsoft.

After making these changes and setting the max server memory on the server back to 900+ Gigs of RAM and everything began working as expected without the above page latch timeout errors.

 Comment on this Post

 
There was an error processing your information. Please try again later.
Thanks. We'll let you know when a new response is added.
Send me notifications when other members comment.

REGISTER or login:

Forgot Password?
By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy

Forgot Password

No problem! Submit your e-mail address below. We'll send you an e-mail containing your password.

Your password has been sent to: