I’ve submitted a request to Microsoft (via the Connect site) to have them improve the error reporting which the Service Broker does to the error log. If everyone could vote for that submission (https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=304785) it would go a long way towards getting it included in a future build of SQL Server.
While waiting for an index to be created I ran across this little beauty over at http://xkcd.com. I just had to share it.
Many DBAs so not have the knowledge needed in the storage area to be able to ask the right questions to there storage engineers, and often storage engineers do not know the right questions to ask a DBA to get the answers that they need. Because of this database storage ends up being configured correctly for a file server, but not for a database. This can lead to slow response time within the database, which can lead to blocking and eventually timeouts on the front end.
Correctly configured and sized storage is essential to running a database at peek performance. Unfortunately when configuring storage even the smallest misconfiguration can be detrimental to the entire database and it’s front end application.
As some initial reading please take a look at this tip I wrote for SearchSQLServer.com called Optimize disk configuration in SQL Server. There is also an excellent tip by Hilary Cotter called SAN considerations for your SQL Server environment.
Today I’ll be focusing on databases which use SAN storage, but some of it will still apply to databases using local storage.
When configuring your storage be sure to look beyond just the storage capacity of the drives. You also need to consider the Input/Output operations (IOs) that each drive and array can support. If you have enough space, but not enough IOs your database will not work at the speed you need. With drive sizes growing as quickly as they are, it is very easy to get into trouble and not have enough IO capacity. If you have a database which needs 500 Gigs of Storage, you could fit that database onto a single 750 Gig drive. Would that hard drive have enough speed to handle the IO requirements of your application, probably not.
Unfortunately there is no good tool available to help you figure out how many IOs you will need in order to support your database. I recommend planning high, but not crazy high. If you are expecting 10000 transactions per hour you do not need to be able to support 10 million transactions per hour, however the number of transactions and IOs which are needed will change over time as the business grows. Because of this it is recommended to routinely evaluate the IO requirements for the database and ensure that you can meet them. If not, it may be time for an upgrade. Before you go requesting budget for an upgrade do not forget to tune your indexes. As tables grow index and statistic usage can change and may need to be reviewed before making changes to the storage.
While there is a correlation between transactions and IOs, there is no direct formula from one to the other. Some transactions may work completely from the buffer cache and cause no transactions, while some may cause tens of thousands of reads while exporting data for loading into your data warehouse.
When working in a SAN environment be sure to take into account not just the requirements of your database, but also of the other LUNs which will be sharing your RAID group. A classic example is where there is a RAID 0+1 RAID group made up of 4 disks, each with the potential of handling 120 IOs per second for a total of 240 IOs per second. There are three databases which need storage on this RAID Group. Each has a requirement of 100 IOs per second. But because they are created at different times the IO requirements aren’t being tracked very well. After the second LUN is created everything should still continue to work fine. When the SAN administrator gets in his request for a third LUN to be created he looks at his RAID group and sees that there is plenty of space on the RAID Group. With no IO information from the DBA he doesn’t know that this LUN will overload the RAID Group and cause a slowdown of all three LUNs. Because the DBA doesn’t know that the three databases are all on the same RAID group he doesn’t know that there will be a problem either. But as soon as the third database goes into production and starts getting a load all three applications start to feel the slowdown.
Because the proper questions were not asked by either the DBA or the SAN administrator we now have applications which are not behaving correctly.
One question that I am often asked is “How many IOs can a disk actually handle?”. Unfortunately there is no good answer to that question. Each drive vendor will be slightly different. Something which will also effect the answer to this question is what is called the IO profile of the drive. Sequential IO operations (reading Sector 1, then Sector 2, then Sector 3, etc) is much faster than random IO operations (reading Sector 158, then Sector 21, then Sector 851, etc). Unfortunately most database traffic is random by nature. OLTP applications involve systems accessing small bits of the database all over the physical file. SQL will try to correct for this by caching as much data as it can into memory so that it doesn’t have to go to the disk very often, but most disk access will still be very random in nature. Operations such as OLAP data loads and OLTP table scans are typically sequential, providing that your tables are laid out correctly in separate file groups.
There are other things which can effect the randomness of your disk access such as table scans, low buffer cache hit ratios, data file location, page file location, other applications using SQL Servers disks, etc.