It’s amazing just how much difference a non-clustered index on the child table of a foreign key can make when the foreign key has cascading deletes turned on. In the example that I’m thinking of, a new table was added when the application was upgraded the week before. Then Monday morning there were all sorts of blocking and deadlock problems when trying to delete data from a large table with 2.4B rows in it. The problem table wasn’t the big table, but instead the new table, which had all of 455k rows in it when I looked at it.
Looking at the execution plan for the stored procedure which deletes data from the large table, the problem became painfully clear. There was a clustered index scan on the new table for each row which was deleted. That clustered index scan was part of the deletion transaction so anyone trying to insert into that little table was being blocked. So basically a nice chain reaction was happening.
This was easy to see in the CPU workload as well, shown below. You can see the CPU workload on the system spike to between 40% and almost 50%, which is a lot on a system which has 80 logical CPUs and normally runs at about 11% CPU workload.
In the graphic you can see exactly where I added the indexes, as the workload drops right at 11:50AM.
This just goes to show that if you plan on using foreign keys to handle data deletion between tables you need to ensure that the column which is the child table’s foreign key has an index on it.
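To make the scan-versus-seek difference concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in, since it runs anywhere; it is not SQL Server and the plan output looks nothing like a SQL Server execution plan. The table names parent and child and the index name ix_child_parent_id are made up for the illustration. The point is only that the lookup a cascading delete needs ("find the child rows for this parent") goes from a full scan to an index search once the foreign key column is indexed.

```python
import sqlite3


def child_lookup_plan(conn):
    """Return SQLite's plan text for finding the child rows of one parent."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM child WHERE parent_id = ?", (1,)
    ).fetchall()
    # The fourth column of each EXPLAIN QUERY PLAN row is the readable detail.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked

# Hypothetical schema: child rows are removed when their parent is deleted.
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE child ("
    " id INTEGER PRIMARY KEY,"
    " parent_id INTEGER NOT NULL REFERENCES parent(id) ON DELETE CASCADE)"
)
conn.executemany("INSERT INTO parent (id) VALUES (?)", [(i,) for i in range(1, 101)])
conn.executemany(
    "INSERT INTO child (parent_id) VALUES (?)", [(i % 100 + 1,) for i in range(1000)]
)

# Without an index on the FK column, the child lookup reads the whole table.
plan_without_index = child_lookup_plan(conn)

# With the index in place, the same lookup becomes an index search.
conn.execute("CREATE INDEX ix_child_parent_id ON child (parent_id)")
plan_with_index = child_lookup_plan(conn)

# The cascading delete itself: removing one parent removes its child rows.
conn.execute("DELETE FROM parent WHERE id = 1")
remaining_children = conn.execute(
    "SELECT COUNT(*) FROM child WHERE parent_id = 1"
).fetchone()[0]

print("before index:", plan_without_index)
print("after index: ", plan_with_index)
print("child rows left for parent 1:", remaining_children)
```

On SQL Server the equivalent fix is a non-clustered index on the child table's foreign key column, which is what I added in the case above.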
We have a variety of options when it comes to compression and encryption in SQL Server. When using both compression and encryption you have to understand how each of them works, and when they will and won’t work together, to make using both technologies worthwhile.
The trick to making compression and encryption work together is to ensure that the compression is done first and the data encryption is done second. This is most easily done by using TDE for encryption and page level compression for data compression. This is because when using these two technologies, no matter in which order you have enabled them, SQL Server will compress the data first and encrypt the data second. This even happens if you have a database which is encrypted with TDE and you then enable data compression on the tables. This is because when the data is compressed it is rewritten as compressed data and then encrypted post compression.
With application level encryption you can still compress data using the native data compression feature of SQL Server, however the amount of data compression that you will typically get in this situation will be much less than by using TDE and data compression. The same applies if you use TDE and then back up the database using native (or third party) backup compression. This is because when backups of a TDE encrypted database are taken the database pages are not decrypted when backed up. They are backed up in the same encrypted state that they are normally in, then compressed. By its nature encrypted data looks random, with almost no repeating patterns, so data compression doesn’t do much good against encrypted data.
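The effect of the ordering is easy to show with a small sketch. This is Python with only the standard library, and it has nothing to do with how SQL Server implements TDE or page compression. The toy_encrypt function is a made-up XOR keystream that I'm using only because its output looks random the way real ciphertext does; it is NOT real encryption and should never be used as such. The sample rows are invented as well.

```python
import hashlib
import zlib


def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream.

    Toy stand-in for a cipher (an assumption for this demo, not real
    encryption). Applying it twice with the same key returns the input.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        keystream = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, keystream))
    return bytes(out)


key = b"demo-key"
# Highly repetitive "table data", the kind that compresses very well.
rows = b"2013-07-27,OrderShipped,WA,USA\n" * 4000

# Compress first, encrypt second: the compressor sees the repeating patterns.
compress_then_encrypt = toy_encrypt(zlib.compress(rows), key)

# Encrypt first, compress second: the compressor sees random-looking bytes.
encrypt_then_compress = zlib.compress(toy_encrypt(rows, key))

print("original size:        ", len(rows))
print("compress then encrypt:", len(compress_then_encrypt))
print("encrypt then compress:", len(encrypt_then_compress))
```

The first ordering ends up a tiny fraction of the original size, while the second ends up about as large as the original, which is the same reason backup compression gets so little out of a TDE encrypted database.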
Tomorrow (December 12th) at noon Pacific Time I’m presenting a session with SIOS Technologies on using their SANLess Clustering Technology to build a clustered SQL Server for High Availability within Amazon’s EC2 cloud. This session is open to the public and is priced just right … FREE!
So get signed up and learn more about how to set up a traditional Windows Cluster in the Amazon EC2 cloud to provide high availability for your cloud based applications.
I’m very pleased to announce that I will be presenting a precon at the Albuquerque SQL Saturday (SQL Saturday 271) on January 24th, 2014. At this precon I’ll be presenting a session titled “SQL Performance Tuning and Optimization”. This is a full day session where we will look at the various ways to find and troubleshoot performance tuning problems in SQL Server today. New content in this session will include knowing when to look at the new SQL Server 2014 features such as Hekaton and ColumnStore (Apollo).
Recently I was building a new Windows 2012 cluster that was going to hold a SQL 2012 instance. So far so good. Something that I’ve done dozens of times. However this time something strange happened. As the SQL instance was being installed I got an error message saying that the cluster couldn’t be brought online. Looking at the cluster manager I saw that the network name for the SQL instance had failed. Now the big difference between this cluster and all the other ones that I’ve installed was that the domain was still a Windows 2003 domain and the forest level was still Windows 2003 as well.
Wanting to make sure that it wasn’t a SQL Server problem I tried creating a new role on the cluster and giving it a client access point. When that tried to come online it failed as well, so we now know that we don’t have a SQL problem but are having a Windows problem, so we can ignore the SQL Installer for now.
Looking at the cluster log, which I exported via the Get-ClusterLog PowerShell cmdlet, wasn’t much help either. It did tell me that we were getting error 1326 when we tried to authenticate against the domain, which is a username and password failure.
That’s a little odd given that Windows creates these passwords for me when the access point is created. You can attempt to fix this by right clicking on the failed object in Failover Cluster Manager, selecting “More Actions” and selecting “Repair”. In this case that didn’t work either.
After bouncing around the web for a bit and talking to some people at Microsoft in the Clustering product team we found hotfix 2838043 which is titled “Can’t access a resource that is hosted on a Windows Server 2012-based failover cluster“. This fixed the ability for the cluster name that I manually created to work correctly, however the SQL Installer still failed.
To get the SQL installer to work I had to cancel out of the installer, manually delete the Active Directory account for the computer, and manually delete the DNS entry. Then I rebooted the cluster node and ran through the installer again. At this point it worked without issue.
So if you are planning on installing a Windows 2012 cluster on a Windows 2003 domain, make sure that you’ve got this hotfix installed on the Windows 2012 cluster nodes before you begin.
A question came up during my 24 Hours of PASS presentation a while back. The question was “Is performance of SQL Server 2012 is better in Virtual environment than the Physical?”. Now I actually get questions like this all the time when giving presentations.
Thankfully for me, this is one of those times where the answer is very straightforward. Performance on a virtual machine won’t ever be better than on equivalent physical hardware. The reason for this is that the hypervisor will add a small amount of performance overhead to the virtual machine. In modern hypervisors this overhead is typically very small, usually 1-2%, but that does mean that the physical server will run a little bit better.
A perfectly normal follow up question would be, why should we bother setting up SQL Servers as virtual machines? This one isn’t as easy to answer. If your SQL Server, and more specifically the application and your users, can live with the slight performance hit that you get by being within a virtual machine then keep virtualizing those SQL Servers. However if you have an application which is very sensitive to performance problems, virtualizing the database for that application probably isn’t the best of ideas.
Hopefully this helps dispel some myths.
Recently a friend was working on one of his client’s SQL Servers and he ran into an interesting problem. The hardware in question was an HP DL 580 with four chips, each with 10 cores, with hyper threading enabled. This should have presented to SQL Server as 80 total logical cores. The problem was that SQL Server was only seeing 40 cores. The server in this case was Windows Server 2008 R2 and the SQL Server was SQL Server 2008.
If you are familiar with SQL Server 2012 you may be thinking that this is a licensing limitation, but you would be wrong. The problem here is a NUMA problem.
The reason that the problem comes up (which I’ll cover before giving you the solution) is the way NUMA works on large systems. We all know that we have NUMA nodes, which on today’s servers are basically one physical CPU socket per NUMA node. NUMA nodes are put into groups (Windows itself calls these processor groups) when there are lots of logical processors. The thing is that a single NUMA group can only contain 64 logical processors. Looking back at our HP DL 580, we have 40 cores with hyper threading, which is 80 logical processors. That means that we need to have two NUMA groups.
This is no problem on Windows 2008 R2 as it supports NUMA groups (Windows 2008 and below do not). However the problem is that SQL Server 2008 doesn’t support NUMA groups, so it can only see the logical processors that are in NUMA group 0 (you can have up to 4 NUMA groups which are numbered from 0 to 3).
Because of this the SQL Server was only able to see 40 cores, and those 40 cores were the physical and logical cores from CPUs 0 and 1. We could see this in the errorlog file because it only showed CPUs 0 and 1. Why doesn’t SQL Server just use the physical cores from all the processors and ignore the hyperthreaded cores? Well that’s because it has no idea that CPUs 2 and 3 exist because it can’t see them over in NUMA Group 1.
When running the workload on this machine, 1/2 of the physical CPU power just isn’t being used.
Why do we care about this? Well the 1st problem is that accessing the memory that is attached to CPUs 2 and 3 is going to be expensive as that memory is in another NUMA node than the CPU that’s doing the work. That’ll slow things down with all the cross NUMA node requests. The other problem here is that under a heavy CPU workload the SQL Server will be using 20 real CPUs and 20 virtual CPUs. It would be much better to have access to all the physical CPU cores.
The solution here was quite simple once we realized what the problem was. Disable hyper threading. Now SQL Server can still only see 40 logical processors, but it’s getting all 40 physical cores on the server. This means that cross NUMA node memory access should be mostly gone and we’ve got all the CPU power that we paid for available to us.
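The arithmetic behind this can be sketched in a few lines of Python. This is a simplified model and not how Windows actually assigns processors: the function name is made up, and it assumes that whole sockets are spread evenly across the groups and that a version of SQL Server which doesn't understand groups sees only group 0. The 64 logical processor limit is the one described above.

```python
import math

MAX_LOGICAL_PER_GROUP = 64  # the per-group limit described in the post


def visible_to_group_unaware_sql(sockets, cores_per_socket, hyper_threading):
    """Estimate what a group-unaware SQL Server instance can see.

    Simplifying assumptions (mine, for illustration): sockets are split
    evenly across groups, and only group 0 is visible to the instance.
    """
    threads_per_core = 2 if hyper_threading else 1
    logical_per_socket = cores_per_socket * threads_per_core
    total_logical = sockets * logical_per_socket

    groups = math.ceil(total_logical / MAX_LOGICAL_PER_GROUP)
    sockets_in_group0 = math.ceil(sockets / groups)

    return {
        "total_logical": total_logical,
        "groups": groups,
        "logical_seen": sockets_in_group0 * logical_per_socket,
        "physical_cores_seen": sockets_in_group0 * cores_per_socket,
    }


# The DL 580 from this post: four sockets with ten cores each.
with_ht = visible_to_group_unaware_sql(4, 10, hyper_threading=True)
without_ht = visible_to_group_unaware_sql(4, 10, hyper_threading=False)

print("hyper threading on: ", with_ht)
print("hyper threading off:", without_ht)
```

With hyper threading on, the model gives 80 logical processors in two groups, of which the instance sees 40 logical processors backed by only 20 physical cores. With it off, everything fits in one group and the instance sees all 40 physical cores, which matches what we saw on the server.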
As our servers get larger and larger we’ll have more and more cases of older versions of SQL Server not being able to see all the CPU power, and this is why. The number of logical processors that SQL Server could see really depends on the physical server config and how many cores each physical processor has. The basic idea behind the problem is that not all the cores are showing to the SQL Server.
Blackberry was back in the news yesterday and today with the news that they have called off the search for a buyer and are instead looking for a new CEO. According to the news reports that I’ve read there were talks of other companies thinking about buying Blackberry.
My big question to this is simple.
Buying Blackberry as best as I can figure out comes with a massive cash requirement and not much to show for it. If you buy the entire Blackberry company you get the joy of paying to maintain the blackberry service which every Blackberry out there requires so that they can still use the data network. This is because for any Blackberry to send and receive data the phone has to be able to talk to the Blackberry servers which are run out of Blackberry’s data center in Canada. Running these servers will also require staff, and IT staff isn’t free.
Of course you’ll also get the Blackberry inventory of phones that haven’t been sold, most of which probably won’t be because the large majority of Blackberry users are corporate customers, and companies don’t upgrade their employees’ cell phones all that often. And when they do, odds are they won’t be purchasing new Blackberry phones for them.
So what would a company get if they were to purchase Blackberry? They’d get some pretty smart engineers, some good software and hardware patents, and that’s probably about it. Everything else that they’d get is a massive cost with little to no return.
In the article I linked to above it says that the company’s largest shareholder is going to dump $1B into Blackberry to try and reboot the brand. Frankly I fail to see why. Blackberry was a major player for a long time, but they made the ultimate mistake: they failed to keep up with market trends. And because of that their loyal following of customers has mostly left. Hell I myself was a loyal Blackberry user for over 10 years. But when they released their crappy 1st and 2nd attempts at a touch screen and there was no change to the slow startup speed of their phones I dropped them for an Android phone and I’ve never looked back. And frankly I’m glad that I did.
If Blackberry (or RIM or whatever name you want to refer to them by) had made some different decisions a few years ago they would still be a major force to be reckoned with. But today they are a joke in the consumer market, and they are losing more and more of the corporate market every month.
I’m sure that this dumping of cash into Blackberry by their investor is an attempt to try and control the situation so that they can get some return on their prior investment. But I’ve got bad news for them. At this point I’m pretty sure that they are just throwing away money. Their best bet at this point is probably to inform the users that they are going to shutter the company in X months, giving the users time to get off the Blackberry phones. Then auction off the patents to the highest bidder (these guys were doing two way devices way before anyone else was, I’m sure there are some good patents in there), and sell the rest for scrap. This wouldn’t be pretty or popular, but I’m pretty sure that it’s the best business decision that they could possibly make.
Sadly I’m pretty sure that they won’t take my advice, so we’ll have to see if I’m right or wrong.
Recently I had an interesting problem where the SQL Server 2008 R2 instance would randomly in the middle of the morning start having latch timeouts on various tempdb database pages. The first assumption was that these pages were GAM pages and that more tempdb database files would solve this problem. However looking at these pages, these weren’t GAM pages but instead were normal data pages.
2013-07-27 08:29:31.50 spid745 Time out occurred while waiting for buffer latch -- type 2, bp 00000000F6FBA400, page 103:8, stat 0xc0000b, database id: 2, allocation unit id: 281474978938880/2228224, task 0x00000000056D5708 : 0, waittime 300, flags 0x3a, owning task 0x0000000027EEE748. Continuing to wait.
2013-07-27 08:29:31.54 spid1107 Time out occurred while waiting for buffer latch -- type 2, bp 00000000F6FBA400, page 103:8, stat 0xc0000b, database id: 2, allocation unit id: 281474978938880/2228224, task 0x0000000026EA8988 : 0, waittime 300, flags 0x3a, owning task 0x0000000027EEE748. Continuing to wait.
2013-07-27 08:29:31.56 spid672 Time out occurred while waiting for buffer latch -- type 2, bp 00000000F6FBA400, page 103:8, stat 0xc0000b, database id: 2, allocation unit id: 281474978938880/2228224, task 0x00000000272154C8 : 0, waittime 300, flags 0x3a, owning task 0x0000000027EEE748. Continuing to wait.
2013-07-27 08:29:31.65 spid1919 Time out occurred while waiting for buffer latch -- type 2, bp 00000000F6FBA400, page 103:8, stat 0xc0000b, database id: 2, allocation unit id: 281474978938880/2228224, task 0x000000265E681048 : 0, waittime 300, flags 0x3a, owning task 0x0000000027EEE748. Continuing to wait.
2013-07-27 08:29:32.05 spid819 Time out occurred while waiting for buffer latch -- type 2, bp 00000000F6FBA400, page 103:8, stat 0xc0000b, database id: 2, allocation unit id: 281474978938880/2228224, task 0x0000000027509048 : 0, waittime 300, flags 0x3a, owning task 0x0000000027EEE748. Continuing to wait.
Looking at the server’s memory usage, in this case via Spotlight for SQL Server, we could see that huge amounts of memory were being allocated to the SQL Server process, but it wasn’t actually using this memory for anything; it was just allocating it.
To make things more interesting, this problem first started happening after we upgraded the RAM in the server from 256 Gigs of RAM to 1 TB of RAM. While trying to figure out what was happening we could simply reduce the maximum amount of RAM that SQL Server could access to below 256 Gigs of RAM and the problem would just go away.
To make things worse management wouldn’t allow the server to remain broken long enough for any sort of proper diagnosis to be done. So basically we could try a change, and if the problem came back all we could do was set the memory back down to 256 Gigs and wait for the next window to try the next fix.
After a bit of trial and error of different traceflags and settings we found the right set of settings. We turned on traceflag 834 which turns on large page allocations. This traceflag requires that the lock pages in memory setting is enabled, so that was turned on as well. We also turned on AWE within the SQL Server based on this blog post from Microsoft.
After making these changes and setting the max server memory on the server back to 900+ Gigs of RAM, everything began working as expected without the above page latch timeout errors.
Probably the best advice that I could give someone entering the field of Database Administration would be to keep learning. If you think that you know everything that there is to know about this product that we deal with day in and day out called Microsoft SQL Server, you are wrong. There are so many little pieces to learn about how the engine works with data, how statistics work, how memory is managed, how data is read and written, and most importantly how all of these pieces fit together just so to make a SQL Server that runs fast.
Just to make our lives supporting this software called Microsoft SQL Server that much harder Microsoft has decided that they are going to release a new version every 2 years or so. So instead of just having to manage one or two versions like we did back in the SQL 7 and SQL 2000 timeframe, we now have to support 4 or 5 versions (I’ve got clients with SQL 2000 up through SQL 2012, and some will move to SQL 2014 right when it comes out).
Just because SQL Server does the same thing in the new versions (stores data) doesn’t mean that things aren’t different in the new versions. This is especially true in SQL Server 2014. There are a bunch of new features and changes to existing features that will change how some very low level pieces of the database engine work, so we are back to reading and learning more so that we can keep up with the changes to the platform.
P.S. This post is part of a series of posts being written by people from all parts of the SQL Server community and was coordinated by John Sansom who will be gathering up all the posts and making them available via a download which I’ll link to when I’ve got the URL.