Recently a friend was working on one of his clients SQL Servers and he ran into an interesting problem. The hardware in question was a HP DL 580 with four chips, each with 10 cores, with hyper threading enabled. This should have presented to SQL Server as 80 total logical cores. The problem was that SQL Server was only seeing 40 cores. The server in this case was Windows Server 2008 R2 and the SQL Server was SQL Server 2008.
If you are familiar with SQL Server 2012 you may be thinking that this is done licensing limitation, but you would be wrong. The problem here is a NUMA problem.
The reason that the problem comes up (which I’ll cover before giving you the solution) because of the way NUMA works on large systems. We all know that we have NUMA nodes, which on todays servers are basically one physical CPU socket per NUMA node. NUMA nodes are put into groups when there are lots of logical processors. The thing is that a single NUMA group can only contain 64 logical processors. Looking back at our HP DL 580, we have 40 cores with hyper threading, which is 80 logical processors. That means that we need to have two NUMA groups.
This is no problem on Windows 2008 R2 as it supports NUMA groups (Windows 2008 and below do not). However the problem is that SQL Server 2008 doesn’t support NUMA groups, so it can only see the logical processors that are in NUMA group 0 (you can have up to 4 NUMA groups which are numbered from 0 to 3).
Because of this the SQL Server was only able to see 40 cores, and those 40 cores were the physical and logical cores from CPUs 0 and 1. We could see this in the errorlog file because it only showed CPUs 0 and 1. Why doesn’t SQL Server just use the physical cores from all the processors and ignore the hyperthreaded cores? Well that’s because it has no idea that CPUs 2 and 3 exist because it can’t see them over in NUMA Group 1.
When running the workload on this machine, 1/2 of the physical CPU power just isn’t being used.
Why do we are about this? Well the 1st problem is that accessing the memory that is attached to CPUs 2 and 3 is going to be expensive as that memory is in another NUMA node than the CPU that’s doing the work. That’ll slow things down with all the cross NUMA node requests. The other problem here is that under a heavy CPU workload the SQL Server will be using 20 real CPUs and 20 virtual CPUs. It would be much better to have access to all the physical CPU cores.
The solution here was quite simple once we realized what the problem was. Disable hyper threading. Now SQL Server can still only see 40 logical processors, but it’s getting all 40 physical cores on the server. This means that cross NUMA node memory access should be mostly gone and we’ve got all the CPU power that we paid for available to us.
As our servers get larger and larger we’ll have more and more cases of older versions of SQL Server not being able to see all the CPU power, and this is why. The number of logical processors that SQL Server could see really depends on the physical server config and how many cores each physical processor has. The basic idea behind the problem is that not all the cores are showing to the SQL Server.
Blackberry was back in the news yesterday and today with the news that they have called off the search for a buyer and are instead looking for a new CEO. According to the news reports that I’ve read there were talks of other companies thinking about buying Blackberry.
My big question to this is simple.
Buying Blackberry as best as I can figure out comes with a massive cash requirement and not much to show for it. If you buy the entire Blackberry company you get the joy of paying to maintain the blackberry service which every Blackberry out there requires so that they can still use the data network. This is because for any Blackberry to send and receive data the phone has to be able to talk to the Blackberry servers which are run out of Blackberry’s data center in Canada. Running these servers will also require staff, and IT staff isn’t free.
Of course you’ll also get the Blackberry inventory of phones that haven’t been sold, most of which probably won’t be because the large majority of Blackberry users are corporate customers, and companies don’t upgrade their employee’s cell phones all that often. And when they do, odds are they won’t be purchasing new Blackberry phones for them.
So what would a company get if they were to purchase Blackberry? They’d get some pretty smart engineers, some good software and hardware patents, and that’s probably about it. Everything else that they’d get is a massive cost with little to no return.
In the article I linked to above it says that the companies largest shareholder is going to dump $1B into Blackberry to try and reboot the brand. Frankly I fail to see why. Blackberry was a major player for a long time, but they made the ultimate mistake, they failed to keep up with market trends. And because of that their loyal following of customers have mostly left. Hell I myself was a loyal Blackberry user for over 10 years. But when they released their crappy 1st and 2nd attempt at a touch screen and their was no change to the slow startup speed of their phones I dropped them for an Android phone and I’ve never looked back. And frankly I’m glad that I did.
If Blackberry (or RIM or whatever name you want to refer to them by) had made some different decisions a few years ago they would still be a major force to be reckoned with. But today they are a joke in the consumer market, and they are loosing more and more of the corporate market every month.
I’m sure that this dumping of cash into Blackberry by their investor is an attempt to try and control the situation so that they can get some return on their prior investment. But I’ve got bad news for them. At this point I’m pretty sure that they are just throwing away money. Their best bet at this point is probably to inform the users that they are going to shutter the company in X months giving the users time to get off the Blackberry phones. Then auction off the patents to the highest bidder (these guys were doing two way devices way before anyone else was, I’m sure there’s some good patents in there), and sell the rest for scrap. This wouldn’t be pretty or popular, but I’m pretty sure that it’s the best business decision that they could possible make.
Sadly I’m pretty sure that they won’t take my advise, so we’ll have to see if I’m right or wrong.
Recently I had an interesting problem where the SQL Server 2008 R2 instance would randomly in the middle of the morning start having latch timeouts on various tempdb database pages. The first assumption was that these pages were GAM pages and that more tempdb database files would solve this problem. However looking at these pages, these weren’t GAM pages but instead were normal data pages.
2013-07-27 08:29:31.50 spid745 Time out occurred while waiting for buffer latch — type 2, bp 00000000F6FBA400, page 103:8, stat 0xc0000b, database id: 2, allocation unit id: 281474978938880/2228224, task 0x00000000056D5708 : 0, waittime 300, flags 0x3a, owning task 0x0000000027EEE748. Continuing to wait.
2013-07-27 08:29:31.54 spid1107 Time out occurred while waiting for buffer latch — type 2, bp 00000000F6FBA400, page 103:8, stat 0xc0000b, database id: 2, allocation unit id: 281474978938880/2228224, task 0x0000000026EA8988 : 0, waittime 300, flags 0x3a, owning task 0x0000000027EEE748. Continuing to wait.
2013-07-27 08:29:31.56 spid672 Time out occurred while waiting for buffer latch — type 2, bp 00000000F6FBA400, page 103:8, stat 0xc0000b, database id: 2, allocation unit id: 281474978938880/2228224, task 0x00000000272154C8 : 0, waittime 300, flags 0x3a, owning task 0x0000000027EEE748. Continuing to wait.
2013-07-27 08:29:31.65 spid1919 Time out occurred while waiting for buffer latch — type 2, bp 00000000F6FBA400, page 103:8, stat 0xc0000b, database id: 2, allocation unit id: 281474978938880/2228224, task 0x000000265E681048 : 0, waittime 300, flags 0x3a, owning task 0x0000000027EEE748. Continuing to wait.
2013-07-27 08:29:32.05 spid819 Time out occurred while waiting for buffer latch — type 2, bp 00000000F6FBA400, page 103:8, stat 0xc0000b, database id: 2, allocation unit id: 281474978938880/2228224, task 0×0000000027509048 : 0, waittime 300, flags 0x3a, owning task 0x0000000027EEE748. Continuing to wait.
Looking at the server’s memory usage, in this case via Spotlight for SQL Server, we could see that the SQL Server was allocating huge amounts of memory to the SQL Server process, but it wasn’t actually using this memory for anything it was just allocating it.
To make things more interesting, this problem first started happening after we upgraded the RAM in the server from 256 Gigs of RAM to 1 TB of RAM. While trying to figure out what was happening we could simply reduce the maximum amount of RAM that SQL Server could access to below 256 Gigs of RAM and the problem would just go away.
To make things worse management wouldn’t allow the server to remain broken long enough for any sort of proper diagnosis to be done. So basically we could try a change, and if the problem came back all we could do was set the memory back down to 256 Gigs and wait for the next window to try the next fix.
After a bit of trial and error of different traceflags and settings we found the right set of settings. We turned on traceflag 834 which turns on large page allocations. This traceflag requires that the lock pages in memory setting is enabled, so that was turned on as well. We also turned on AWE within the SQL Server based on this blog post from Microsoft.
After making these changes and setting the max server memory on the server back to 900+ Gigs of RAM and everything began working as expected without the above page latch timeout errors.
Probably the best advise that I could someone entering the field of Database Administration would be to keep learning. If you think that you know everything that their is about this product that we deal with day in and day out called Microsoft SQL Server, you are wrong. There are so many little pieces to learn about how the engine works with data, how statistics work, how memory is managed, how data is read and written, and most importantly how all of these pieces fit together just so to make a SQL Server that runs fast.
Just to make our lives supporting this software called Microsoft SQL Server that much harder Microsoft has decided that they are going to release a new version every 2 years or so. So instead of just having to manage one or two versions like we did back in the SQL 7 and SQL 2000 timeframe, we now have to support 4 or 5 versions (I’ve got clients with SQL 2000 up through SQL 2012, and some will move to SQL 2014 right when it comes out).
Just because SQL Server does the same thing in the new versions (stores data) doesn’t mean that things are different in the new versions. This is especially true in SQL Server 2014. There are a bunch of new features and changes to existing features that will change how some very low level pieces of the database engine, so we are back to reading and learning more so that we can keep up with the changes to the platform.
P.S. This post is part of a series of posts being written by people from all parts of the SQL Server community and was coordinated by John Sansom who will be gathering up all the posts and making them available via a download which I’ll link to when I’ve got the URL.
Today was the 2nd day of the SQL PASS conference, and as always there was a keynote. Todays keynote was by Douglas McDowall who is the VP of Finance, Thomas LaRock who is the VP of Marketing, followed by Dr. David Dewitt from Microsoft.
Douglas talked to the members at large about the financial health of the PASS organization. PASS is proud to now have a $1M rainy day fund so that we can survive though rough years like the years that we had a few years ago.
Bill Grazino took the stage for a few minutes in order to thank out going board members Rushabh Mehta (Immediate Past President), Douglas McDowall (VP of Finance) and Rob Farley (Board Member).
When Tom LaRock took the stage he started by introducing the new members to the board of directors which included Jen Stirrup, Tim Ford and Amy Lewis. For the PASS BA Conference we are moving to San Jose during May 7-9th and for the PASS Summit we will be back in Seattle from November 4-7. Early Bird registration will only be open until mid-December, so if you plan to sign up for the early bird rate, be sure to sign up soon at sqlpass.org.
Dr. David Dewitt took the stage and started his keynote, which this year is all about Hekaton (aka. In Memory OLTP Tables). Hekaton is a memory-optimized yet fully durable high performance OLTP engine which has been integrated into SQL Server 2014. For Hekaton Microsoft did a major rewrite of the core engine of SQL Server to fit this new engine into the existing database engine as they didn’t want to introduce a new Windows service to run this new In Memory database engine.
On of the primary reason that Hekaton is so fast compared to the traditional database engine is because of the lack of latching and locking as they don’t exist when using Hekaton tables. This is the core reason that Hekaton is able to get very close to being 100 times faster than the normal database engine.
Dr. Dewitt explained how latches work for those who aren’t used to looking at latches. Latches are used as a special kind of lock which are used to ensure that a process that is attempting to query for a page which is in the process of being read from the disk has to wait until the page read from the disk has been completed by the storage engine of the database engine.
One question that Dr. Dewitt gets asked a lot, and that I’ve been asked several times as well, is “is Hekaton the new pinned tables”? The answer here is no. Hekaton is a totally different way of processing data. By simply putting tables into memory and pinning the table there you wouldn’t even be able to get a 10X performance improvement which wasn’t enough to justify the time and money which Microsoft wanted to spend while making this product. Hekaton is actually the third query engine in the SQL Server database product. There’s the normal relational engine which we’ve been using for years, the ColumnStore engine (project Apollo) and not the Hekaton engine for these ultrafast database engine.
When building a Hekaton engine you MUST have a PRIMARY KEY defined on the table which is nonclustered and must be HASH or RANGE defined. For v1 there are no blobs and no XML data types available for Hekaton tables, which will hopefully be fixed in the release after SQL Server 2014.
Some of the big changes in the way that Hekaton handles data changes without locking is to change the way that updates are done. The big change here is that when an UPDATE is run the row isn’t actually change. A new row is added to the table and the old row is “expired”. This expiration is done by marking a timestamp of sorts (it’s closer to a transaction ID than a timestamp) as the last transaction which the old version of the row is valid for and when the new version of the row is written it is marked as having a beginning timestamp of the transaction time when the transaction was started. When select statements are accessing rows they use this timestamp to figure out which version of the row they should be looking at. This is a VERY different way of managing changes in the database tables from what we have had until now.
When are older row versions deleted? There’s a garbage collector which runs in the background watching for when there are rows which are expired with an earlier timestamp than any of the current transactions. At that point the old version of the rows can be deleted. This means that there’s no blocking that can possibly happen (if there was blocking in Hekaton tables) while this is happening as there are no transactions which could possibly read these older rows. Based on this is someone was to begin a new transaction and never commit the transaction none of the older rows would be able to be deleted.
I hope that you enjoyed the keynote from Dr. Dewitt as much as I did, and hopefully I’ll see you at the summit next year.
Every once in a while you run across a problem that shouldn’t be causing people as much problems as it is, yet there it is causing all sorts of problems for long enough that being broken is considered normal.
In the case of this one client that I’ve been working with that problem was massive delays in replication between two SQL Server 2008 R2 machines. And by massive delays I’m not talking about a few minutes, but hours, sometimes up to 10-12 hours of delay and it would be like that for days or weeks.
The symptoms in this case were that when you inserted a tracer into the replication publication the bulk of the delay was between the publisher and the distributor. That meant that odds are it was one of three thinks.
- The disk on the distribution database is crap
- There is something very wrong with the transaction log on the published database
- There is a network problem between the publisher and the distributor
Well in this case #3 doesn’t apply as the publisher and the distributor are on the same machine. Looking at the IO the response times were fine, between 0.001 and 0.015 seconds according to perfmon. There weren’t any major waits problems showing up when looking at the waits. That basically removes #1. So by process of elimination that leaves us with #2, something is very wrong with the transaction log.
Upon first look there’s nothing funky going on. Database is in simple recovery. The transaction log is huge, but with all these problems that’s to be expected. Running DBCC LOGINFO showed me some very useful information, when it finely finished (which was my next clue this was the problem). When it finished there was over 24,000 Virtual Log Files (VLFs) within the transaction log file.
Ladies and gentleman we have a winner!
Now fixing this on a machine which has replication which is as pissed off as this machine does isn’t as simple as just shrinking the log file. Because the replication is keeping a massive amount of transaction log space locked up this required some careful planning. The first step is to give the transaction log some nice big VLFs to write data to, but we need to do this somewhere that isn’t going to impact the cleaning up of the existing VLFs. The option here, add another transaction log file to the database. Grow it large enough to handle all the traffic for a day or two and grow it out in 8000 Meg chunks. In this case I grow it out to 56000 Megs in size. Then I left the system to sit there for a day.
When I came back to look at the system the next day replication performance was still awful, but now when I looked at DBCC LOGINFO all the VLFs which were in use were in the new data file. Every VLF in file #2 (the default transaction log file) was marked as status 0. A quick DBCC SHRINKFILE statement then growing the default transaction log file back out to it’s needed size of 56000 Megs and suddenly we were down to ~250 VLFs give or take.
Within about 2 hours replication had completely caught up and written all the transactions not only to the distribution database but had also written all those transactions to the subscriber as well.
Once replication was done I shrank the new transaction log file which I had created as small as it would go so that the next VLF that would be written to was in the default transaction log file. Once all the VLFs in the new transaction log file were back to status 0 I simply deleted the new file.
Now replication is all happy. I’m no longer seeing messages like “The Log Reader Agent is scanning the transaction log for commands to be replicated. Approximately 500000 log records have been scanned in pass # 3, 0 of which were marked for replication, elapsed time n (ms). When I started all this these messages were being thrown several times a second. Now they aren’t appearing at all which is very good.
Replication is happy, the client is happy, and I’m happily working on their next problem.
P.S. While this may not be your problem if you are seeing this sort of issue, it’s VERY much worth checking out.
Recently I had to move the SCCM database for a client of mine. I used this walkthrough to go through the process. For the most part everything worked pretty well. There was one thing which was missing from this. The process requires that the SQL Server 2008 R2 SP1 Feature Pack (or the feature pack for which ever version of SQL Server you had installed before) in the same location that the MSI packages were located in when SCCM was first installed. This means downloading and saving the MSIs in the correct location as needed.
Hopefully if you have to move your SCCM database you won’t spin your wheels like I did.
So I was doing a little performance tuning of a query for a company recently and we were getting hammered on the IO for the query. The query was in a stored procedure which wasn’t all that complex (ignore the crap code for now).
ALTER PROCEDURE [dbo].[iu]
[TimeStamp] = @t
(@p IS NULL OR p = @p)
(@as IS NULL OR [I_ID] IN (
select [I_ID] from [iq]
where [qt] in (select [tn] from [lqt] where [as] = @as)
and not exists (select * from [ia] where [Q_ID] = [iq].[Q_IQ])))
At first glance the plan looked fine. There was a scan on lqt, but that’s OK it’s a really small table. IQ was showing 77315 logical IO with a scan count of 15. So something is very wrong here. Looking at the plan a little deeper we see that while there is an index seek, we are seeking against that index over 6700 times each time the stored procedure is run. That sure isn’t a good thing. The index that we were seeking against was built on the I_ID column which while producing seeks with a ton of other columns included, was producing WAY to many seeks.
Our solution in this case was to build an index on the QT column and include Q_ID and I_ID columns. Once we built this new index the IO dropped from 77315 IO down to just 4368 which is basically the size of the table. The new index is being scanned, but it’s only being scanned once with SQL getting everything that it needs from that one scan.
So just because you are getting seeks in your execution plan that doesn’t mean that you are done tuning the query. You need to look at how many times each operator is being used, and fix any operators that are being executed to many times.
Getting the query a little better took a little code research in the application with the developer, but we were able to get things even better with just a slight code change. Apparently the “@as IS NULL OR ” section isn’t possible any more as the @as parameter will always have a value assigned so this fork of code is in there for no reason. Removing this from the query got the logical reads for that table down to just 30, reduced the worktable IO and lowered the IO on the ia table as well. All in all not bad for a few minutes of tuning and for putting an index scan in place.
This video is sponsored by Denny Cherry & Associates Consulting
Just a reminder that TODAY is my SQL PASS Summit 1st Times Webcast for the 2013 PASS Summit. The webcast is at 1pm PST / 4pm EST. I’ll be using GoToMeeting this year, and the URL for the webcast is https://attendee.gotowebinar.com/register/4906797491661003009 (THIS URL HAS BEEN CHANGED). Just sign in at the meeting time (or just before) and you’ll be all set.
If you’ve never been to the PASS Summit before then this webcast is for you. During this webcast I’ll be covering things like getting from the airport to the hotels, which hotels are nearest the convention center, where things will be inside the convention center, where the good food options are, where the good drinking options are, as well as some special announcements.
If you have been to the PASS Summit before then this webcast is also for you. This year the PASS Summit is in a brand new city, so there’s going to be all sorts of good information in this session about our host city Charlotte, NC.
See you at 1pm.