So I was doing a little performance tuning of a query for a company recently and we were getting hammered on the IO for the query. The query was in a stored procedure which wasn’t all that complex (ignore the crap code for now).
ALTER PROCEDURE [dbo].[iu]
...
WHERE [TimeStamp] = @t
  AND (@p IS NULL OR [p] = @p)
  AND (@as IS NULL OR [I_ID] IN (
      SELECT [I_ID] FROM [iq]
      WHERE [qt] IN (SELECT [tn] FROM [lqt] WHERE [as] = @as)
        AND NOT EXISTS (SELECT * FROM [ia] WHERE [Q_ID] = [iq].[Q_ID])))
At first glance the plan looked fine. There was a scan on lqt, but that's OK as it's a really small table. IQ, however, was showing 77315 logical reads with a scan count of 15, so something was very wrong here. Looking at the plan a little deeper we see that while there is an index seek, we are seeking against that index over 6700 times each time the stored procedure is run. That sure isn't a good thing. The index we were seeking against was built on the I_ID column with a ton of other columns included; while it was producing seeks, it was producing WAY too many of them.
Our solution in this case was to build an index on the QT column and include the Q_ID and I_ID columns. Once we built this new index the IO dropped from 77315 logical reads down to just 4368, which is basically the size of the table. The new index is being scanned, but it's only scanned once, with SQL Server getting everything it needs from that one pass.
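The fix can be sketched like this (using the anonymized table and column names from the snippet above; the index name is made up):

```sql
-- Sketch of the covering index we built (names anonymized, as above).
-- Keyed on QT so the subquery's predicate is satisfied by one narrow
-- scan, with Q_ID and I_ID included so no lookups are needed.
CREATE NONCLUSTERED INDEX IX_iq_qt
ON [dbo].[iq] ([qt])
INCLUDE ([Q_ID], [I_ID]);
```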
So just because you are getting seeks in your execution plan doesn't mean that you are done tuning the query. You need to look at how many times each operator is being used, and fix any operators that are being executed too many times.
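One way to see both numbers is to turn on per-session IO statistics and capture the actual execution plan; this is a general-purpose sketch, not the exact commands from the engagement:

```sql
-- Report per-table logical reads on the Messages tab for every
-- statement run in this session.
SET STATISTICS IO ON;

-- Run the query, then look at output lines such as:
--   Table 'iq'. Scan count 15, logical reads 77315, ...
-- In the actual execution plan, check each seek or scan operator's
-- "Number of Executions" property; a cheap seek executed thousands
-- of times can cost far more than one well-placed scan.
```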
Making the query even better took a little code research in the application with the developer, but we were able to improve things further with just a slight code change. Apparently the "@as IS NULL OR " case isn't possible any more, as the @as parameter will always have a value assigned, so this branch of code was in there for no reason. Removing it from the query got the logical reads for that table down to just 30, reduced the worktable IO, and lowered the IO on the ia table as well. All in all, not bad for a few minutes of tuning and for putting an index scan in place.
This video is sponsored by Denny Cherry & Associates Consulting
Just a reminder that TODAY is my SQL PASS Summit 1st Times Webcast for the 2013 PASS Summit. The webcast is at 1pm PST / 4pm EST. I’ll be using GoToMeeting this year, and the URL for the webcast is https://attendee.gotowebinar.com/register/4906797491661003009 (THIS URL HAS BEEN CHANGED). Just sign in at the meeting time (or just before) and you’ll be all set.
If you’ve never been to the PASS Summit before then this webcast is for you. During this webcast I’ll be covering things like getting from the airport to the hotels, which hotels are nearest the convention center, where things will be inside the convention center, where the good food options are, where the good drinking options are, as well as some special announcements.
If you have been to the PASS Summit before then this webcast is also for you. This year the PASS Summit is in a brand new city, so there’s going to be all sorts of good information in this session about our host city Charlotte, NC.
See you at 1pm.
I end up VPNing into a lot of different companies' networks, and the thing that always kills me is that when I VPN in I'm given an SSL URL to connect to in order to get connected, but before the site appears I get a security warning that the SSL certificate isn't trusted for that URL.
The entire point of an SSL VPN is that it makes a secure, trusted, SSL-encrypted connection between you and the company network. If the SSL certificate isn't trusted then there's no way to know that the certificate was actually issued to protect this network. In other words, there's no way to be sure that there isn't a man-in-the-middle attack going on where all the network traffic to the VPN is being sniffed.
So I beg, I implore you… if you are going to be using some sort of SSL VPN, spend the couple hundred bucks and get a proper SSL certificate for it so that you know you can trust the connection.
It’s a little late in coming, but I’m happy to announce that I’ll be presenting my SQL PASS 1st Timers webcast for the 3rd year in a row. The webcast will be via GoToMeeting this year. The webcast will be on September 25th, 2013 at 1PM PST / 4PM EST. Just fire up the old web browser at that date and time and go to https://global.gotomeeting.com/meeting/join/437424021 to get into the meeting.
I’m sure that someone will ask, so I’ll throw it out there now. The session will be recorded in case you can’t make it.
During the session some of the stuff that we will be going over includes:
- Where things will be laid out in the convention center
- Where the food in the area will be
- Where the drinks in the area will be
- Getting to/from the airport
- Some tourist stuff to do in town if you’ve got the time
- SQL Karaoke announcements (like always)
- Some other great stuff that’s happening this year
Now if you’ve been to the SQL PASS Summit before that doesn’t mean you can skip this session this year. We are in a new city for the first time in years, so it’s a whole new conference for us this year.
I’ll see everyone on the 25th.
Recently I had the privilege of traveling to Poland for SQL Day 2013. As always the people running the conference threw a great event. For this trip I had everything booked through the conference's travel agent, including the flights. This is a mistake that I will not be making again. When the travel agent booked the flight it was booked via the Polish airline LOT. They handed it off to Lufthansa, who purchased tickets on United for me. Now in theory this should have all gone just fine as they are all Star Alliance partners. However, everything completely fell apart.
As we were about to leave from Los Angeles, CA to head to Europe (via Chicago) all of my wife’s Lufthansa tickets were canceled so United couldn’t issue tickets. We were then told to contact the travel agent and have them look into it. Slight problem with this plan, the travel agent is in Poland.
So we called the United 1K desk (for people who fly way too much like I do) and had them look at everything. According to them everything looked fine, except that by now we had missed our first flight. United told us to go to Lufthansa and have them issue new flights. Thankfully Lufthansa flies out of Los Angeles, so we simply walked over to Lufthansa in the International terminal and talked to them. They said that everything was still screwed up and that we needed to talk to United. United then said to call the travel agent. We got ahold of the travel agent, who said that everything looked correct.
Eventually between United and Lufthansa we got tickets issued for a flight 12 hours after we were supposed to leave. This only took 4-5 hours on the phone with United (and at least two United employees) and the help of two Lufthansa employees at the Lufthansa desk at LAX (and several Lufthansa employees that were called by United).
So we make our way to Poland and enjoy our time at the conference.
Friday afternoon we checked our flights (we flew out on Sunday) and everything was screwed up even further. I was on different flights than I had been booked on before, and Kris only had a flight from Poland to Germany and no flight home to the US from Germany. Needless to say this was a problem.
So I called United and was told that because it was booked through a travel agent I needed to go through them. So we called the travel agent again and were told that everything was fine.
I called United back, again, and was again told to call the travel agent. At this point I was pretty pissed (is there a polite way to say that I lost my shit?). I told the poor soul at United that I'd spent close to 10 hours on the phone with United for just this trip, and that if he couldn't fix it he needed to find me someone who could. He passed me off to a supervisor who reviewed everything (this was the second supervisor I'd spoken with for this trip, the first being when getting the mess in Los Angeles fixed). She saw all sorts of problems and immediately began making phone calls to Lufthansa and LOT to get it fixed. And thankfully she didn't take no for an answer, but instead explained the situation to the staff at those airlines and was able to get us booked on flights home, the same flights home at that.
All in all if this hadn’t been booked through a travel agent I would have ended up saving myself probably a good couple of hours of running around trying to call different people.
Personally I've never seen the point of using a travel agent to book flights and hotels before. I can easily get access to all the information for all the airlines and hotels just by looking at their websites and making all the needed reservations online. I get that in the past a travel agent would have been handy, as there was no Internet to do all this with, but these days I can do things faster and easier by myself than with the "help" of any travel agent. And in this case, if I had gone with what the travel agent had said, I never would have left Los Angeles, and if I had, my wife Kris never would have made it back home.
Let my nightmare be a lesson to you. Don’t use a travel agent. Book directly with the airline and be done with it.
It isn’t like this is the first time that a travel agency has screwed me either.
That’s right, next month I’m giving a post con at the SQL/Dev Connections conference. The post con is titled “SQL Server 2012 in a Highly Available World”. In this full day session we will look at all the high availability options which are available to you with SQL Server 2012 (and below) as well as some of the changes coming in SQL Server 2014.
If you haven't signed up for SQL Connections yet I'd really recommend it. There's a great group of speakers who will all be presenting top tier sessions, and you can quickly add on both a pre-con and a post-con to get the maximum amount of training possible in one trip.
I look forward to seeing you in Vegas in September.
SQL Server has three different storage terms which you need to know when dealing with physical file storage: files, file groups and disks. While each one of these is totally different, they work together to give the SQL Server database engine a very flexible way to define where the data is actually stored.
Disks are the physical storage within the server. If you look at the front of the server they will be the little devices which are inserted into the front of the server and typically have lights on them that flash. In larger environments where there are lots of servers you may have a storage array (also called a SAN). In its simplest terms a storage array is a bunch of disks in one large box. The space on these disks is put together in various ways, and parts of this space are allocated to specific servers. While the storage array may have anywhere from a couple of terabytes to thousands of terabytes, a specific server will only have access to a very small percentage of this space. In either case, whether using local disks in the front of the server or a large storage array, the concept is the same as far as SQL Server is concerned: the storage that is connected to the server is presented as a drive letter in Windows.
If you were to log onto the console of the server you would see a computer which would look very familiar to you. This is because the computer running SQL Server in the server room or data center is running Windows, just like the computer which is on your desk and just like the computer at your home (assuming that you aren't using Linux or a Mac). If you opened My Computer on a server you would see hard drives just like you do on your home computer. The server probably has more hard drives than your workstation or home computer, and the drives on the server probably have a lot more space than the ones on your workstation, but the idea is still very much the same. Any files that are needed are written to the disks and then accessed by users or by programs, with Microsoft SQL Server simply being a large complex program.
The files which the SQL Server database uses are called MDF, NDF and LDF files (for the most part). These are physical files which sit on the disks (see above). A database will have at least two files: the MDF file which holds all of the data, and the LDF file which holds the transaction log. In its simplest form the transaction log is a complete record of all the changes which have happened to the database since the database was first created. We manage the size of the transaction log through backups (which I'll talk about in a later post), so for now we'll just say that the transaction log holds the record of everything that has changed within the MDF data file. If a row is added, that change is written to the transaction log first. If a row is deleted, that change is written to the transaction log before the row is actually deleted from the database.
The NDF database files are simply additional data files. While the database must have one MDF data file and one LDF log file, the use of NDF data files is optional. There is no performance benefit to using NDF data files instead of just an MDF data file (well, this is only true 99.9% of the time, but that's good enough for this post). The only time you'll get a performance benefit from using NDF data files is if the MDF data file is on one physical hard drive and the NDF is on a different physical hard drive.
Each specific file exists on one and only one hard drive. If you need to store the data on multiple hard drives then you’ll need multiple data files.
File groups are simply logical groupings of files. We use file groups so that specific tables can be larger than any one specific hard drive. If we didn't have file groups, then when we created a table and told SQL Server where to store it, we would have to tell it which physical data file to store the data in. This would mean that tables would be limited by the size of the largest disk available. Because we want to be able to spread our data across multiple physical files, we have file groups. A file group allows multiple files to be put into a single group. SQL Server will then create the table within that file group, putting portions of the table on all the files within the file group. This allows us to have tables which are very large, as we are only limited by the number of disks we can attach to the server and the size of those disks.
When we create a new table (or index, or Service Broker queue, or any other object which physically holds data) we specify which file group we want to create the object in. This allows us to leverage all the files which are members of the file group. If there are multiple files, the object will be spread over all the physical files. If there is just one file today when we create the table, and we add another file to the file group later, there's nothing that we need to do; SQL Server will automatically begin using the new file as well as the old one. When you add a new file to a file group SQL Server doesn't rebalance the existing data across the database files (you have to do that manually by rebuilding all the indexes), but new rows will be written to the new file.
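As a sketch, here's what adding a file group with multiple files and creating a table on it looks like. The database, file group, file, path, and table names here are all made up for illustration:

```sql
-- Create a new file group and add two files on different disks
-- (all names, paths and sizes here are hypothetical).
ALTER DATABASE [MyDatabase] ADD FILEGROUP [Data2];
ALTER DATABASE [MyDatabase]
    ADD FILE (NAME = 'Data2_File1', FILENAME = 'E:\MSSQL\Data2_File1.ndf', SIZE = 10GB)
    TO FILEGROUP [Data2];
ALTER DATABASE [MyDatabase]
    ADD FILE (NAME = 'Data2_File2', FILENAME = 'F:\MSSQL\Data2_File2.ndf', SIZE = 10GB)
    TO FILEGROUP [Data2];

-- Create the table on that file group; its rows will be spread
-- across both files using proportional fill.
CREATE TABLE [dbo].[Orders]
(
    OrderId INT NOT NULL,
    OrderDate DATETIME NOT NULL
) ON [Data2];
```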
When SQL Server is writing data to the files within a file group it uses something called proportional fill. This means that SQL Server will do its best to keep the amount of free space within the data files the same. While it isn't perfect at doing this, as there are a lot of things that can affect its ability to do so, SQL Server will do its best.
Hopefully this helped explain these three complementary yet different concepts.
The recovery model tells SQL Server how much transaction log data to keep within the transaction log, and how much ability you have to recover data. SQL Server has three different recovery models: SIMPLE, BULK_LOGGED and FULL.
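The recovery model is set per database; a quick sketch (the database name here is hypothetical):

```sql
-- Switch a database between the three recovery models.
ALTER DATABASE [MyDatabase] SET RECOVERY SIMPLE;
ALTER DATABASE [MyDatabase] SET RECOVERY BULK_LOGGED;
ALTER DATABASE [MyDatabase] SET RECOVERY FULL;

-- Check the current recovery model of every database.
SELECT name, recovery_model_desc
FROM sys.databases;
```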
Simple Recovery Model
Simple recovery model is the most basic recovery model and it provides the least amount of protection. The simple recovery model supports only full and differential backups, meaning that you can only restore the database to the state that the database was in when a full backup or differential backup was taken. Typically the simple recovery model is only used for databases like data warehouses, or other databases where the data can easily be reloaded from another source. The reason that this is acceptable for data warehouses is that data warehouses are not the production system of record for the data; they are loaded from the production system of record on some schedule, either nightly, weekly, hourly, etc.
Many people assume that using the simple recovery model means that the transaction log isn't kept. This is not the case. For most transactions the normal transaction log information is written to the transaction log. The difference between the simple recovery model and the other recovery models is that with the simple recovery model the SQL Server database doesn't wait for the transaction log to be backed up before removing the data from the transaction log. Instead it waits for the data pages in the MDF and NDF files to be fully written to disk. Once that happens SQL Server will mark the rows within the transaction log as available for reuse.
Full Recovery Model
In the full recovery model the database engine keeps all the transactions within the transaction log file (the LDF file) until they are backed up by a transaction log backup. With the full recovery model, because we are taking transaction log backups, we are able to restore to any moment in time, so long as that moment is after the database was created, is before now, and we still have the backups for that period of time.
If for example we take full backups on Monday morning at midnight and transaction log backups every hour at 10 minutes past the hour, in the event of a database crash we can restore the SQL Server database to any point in time, provided that we still have the full backup from the Monday before and all the transaction log backups taken between that full backup and the first transaction log backup taken after the problem happened. With this example, if we wanted to restore the database to Tuesday afternoon at 1:45:00pm we could do that by restoring the full backup from Monday, then restoring the transaction log backups starting at ten minutes after midnight on Monday up through the 2:10:00pm backup on Tuesday. While restoring the final transaction log backup we would simply tell the restore database command that we wanted to stop the restore process at 1:45:00pm on that Tuesday, and the restore would stop at that point.
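The restore sequence above can be sketched like this (database name, backup file names, and the exact date are all hypothetical; only the final log restore needs the stop time):

```sql
-- Restore the Monday full backup, leaving the database ready
-- for further restores.
RESTORE DATABASE [MyDatabase]
FROM DISK = 'E:\Backups\MyDatabase_Full_Monday.bak'
WITH NORECOVERY;

-- Restore each hourly log backup in sequence WITH NORECOVERY,
-- then stop the final one (Tuesday's 2:10pm backup) at 1:45pm.
RESTORE LOG [MyDatabase]
FROM DISK = 'E:\Backups\MyDatabase_Log_Tue_1410.trn'
WITH NORECOVERY, STOPAT = '2013-10-01 13:45:00';

-- Bring the database online.
RESTORE DATABASE [MyDatabase] WITH RECOVERY;
```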
Bulk Logged Recovery Model
The bulk logged recovery model is very similar to the full recovery model. Most transactions are fully logged within the database's transaction log. There are a few commands which aren't, such as bulk inserts using the BCP command line tool or the BULK INSERT T-SQL statement, as well as a few others. All normal INSERT, UPDATE and DELETE statements are fully logged just like in the full recovery model. The bulk logged recovery model allows for point in time restores just like the full recovery model does. The difference here is that when a bulk operation is minimally logged under the bulk logged recovery model, not all the data changes are logged; only the allocation of space within the data files is logged. This presents a problem when the transaction log backup is being taken, as SQL Server needs to back up those records. Because we don't have the full transaction log information for these commands, the SQL Server database engine simply copies the data pages which were allocated into the transaction log backup. Because of this, the specific times that the database can be restored to are limited to times when no bulk operations were running. So if a bulk operation runs from midnight until 1am every night, there wouldn't be any way to restore to a specific point in time during that one hour window. If you need the ability to restore to specific times within that window, and have the data be perfect, the full recovery model must be used.
Statistics are magical little objects within the database engine that have the ability to make your queries run fast or painfully slow. The reason that statistics are so important is because they tell the database engine what data exists within the database table and how much data exists. The problem with statistics comes from how often they are updated. By default in all versions and editions of SQL Server the statistics are updated when 20% + 500 rows within the table change. So if a table has 10000 rows in it we need to change 2500 rows (2000 rows is 20%, plus an additional 500 rows) for the statistics to be updated automatically. With smaller tables like this, having out of date statistics usually doesn't cause too many problems. The problems really come into play with larger tables. For example, if there are 50,000,000 rows in a table, for the statistics to be automatically updated we would need to change 10,000,500 rows. Odds are it is going to take quite a while to change that number of rows. To fix this we can manually tell SQL Server to update the statistics by using the UPDATE STATISTICS command.
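A quick sketch of the manual update (the table name here is hypothetical):

```sql
-- Update all statistics on one table, reading every row rather
-- than a sample.
UPDATE STATISTICS [dbo].[Orders] WITH FULLSCAN;

-- Or update every statistic in the database using the default
-- sampling rate.
EXEC sp_updatestats;
```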
Within a statistic there are up to 200 values which are sampled from the column. A statistic contains a few different columns: it holds a series of values sampled from the column which the statistic is built on, and it also holds the count of the number of rows between each sampled value and the next. From this information SQL Server is able to build the execution plan which is used to access the data. When the data within the statistic is out of date, SQL Server doesn't make the correct assumptions about how much data there is and what the best way to access that data is. When the statistic gets updated, SQL Server is able to make better assumptions, so the execution plan becomes better and SQL Server is able to get to the data faster.
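You can look inside a statistic yourself with DBCC SHOW_STATISTICS; the table and statistic names here are hypothetical:

```sql
-- Show the header, density vector and histogram for one statistic.
DBCC SHOW_STATISTICS ('dbo.Orders', 'IX_Orders_OrderDate');
-- The third result set is the histogram: up to 200 sampled values
-- (RANGE_HI_KEY), the number of rows equal to each sampled value
-- (EQ_ROWS), and the number of rows between that value and the
-- next sampled value (RANGE_ROWS).
```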