So today is #TSQL2sDay, and this month's topic is Disk IO. Storage is something I love working with, so I figured why not, I'll post something.
Hopefully everyone knows that your storage solution is the most critical piece when it comes to keeping your database up and running day after day. If your disks are too slow, then your databases will be slow, and there just isn't anything you can do about that besides adding more disks to the storage system.
However, figuring out that the problem is actually slow disks can be a challenge in itself, especially if you work in a large company and don't have access to the servers themselves or the storage array that is hosting the databases.
Within SQL Server you have a couple of places you can look. The easiest is the sys.dm_io_pending_io_requests DMV. If it returns a lot of rows showing IO as pending, then you may have a problem, as SQL Server may be trying to push more IO than your storage solution can handle.
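A quick sketch of what that check might look like, joining out to the file stats function so you can see which database file each pending request belongs to:

```sql
-- Pending IO requests, with the database file each one belongs to.
-- io_pending_ms_ticks tells you how long each request has been waiting.
SELECT ipir.io_type,
       ipir.io_pending,
       ipir.io_pending_ms_ticks,
       DB_NAME(mf.database_id) AS database_name,
       mf.physical_name
FROM sys.dm_io_pending_io_requests ipir
INNER JOIN sys.dm_io_virtual_file_stats(NULL, NULL) vfs
    ON ipir.io_handle = vfs.file_handle
INNER JOIN sys.master_files mf
    ON vfs.database_id = mf.database_id
   AND vfs.file_id = mf.file_id;
```

On a healthy system this will usually return few or no rows; a steady stream of rows with large io_pending_ms_ticks values is the warning sign.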
Another DMV you can look at is sys.dm_io_virtual_file_stats. This will tell you how many IO requests have been processed since the instance was started, as well as how long those requests were stalled. Using these numbers requires doing some math to see how you're doing over the current runtime of the instance, and the longer the instance has been up, the harder these cumulative numbers are to make sense of.
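The math in question is just dividing the total stall time by the number of requests to get an average latency per file. A minimal sketch:

```sql
-- Average stall (latency) per IO for each database file since the
-- instance started. These counters are cumulative, so the longer the
-- uptime, the more any recent problem gets averaged away.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.num_of_writes,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_stall_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_stall_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) vfs
INNER JOIN sys.master_files mf
    ON vfs.database_id = mf.database_id
   AND vfs.file_id = mf.file_id
ORDER BY avg_read_stall_ms DESC;
```

If you want numbers for a specific window rather than since startup, snapshot the results into a table, wait, and diff the two snapshots.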
Within Windows you'll be using our good friend Performance Monitor to see what's going on. There are a few counters which are really critical to look at: the reads and writes per second, the seconds per read and write, and the queuing counters.
The reads and writes per second will tell you how many requests are going between the server and the disks each second. If these numbers are very high and stay there, then you are pushing the disks very hard. If this is the case, stop here and make sure that you don't have any missing indexes that need to be created, and that all your statistics are up to date.
The seconds per read and write are very critical numbers. These tell us how fast the disks are processing each read and write request. These numbers should be very, very low, somewhere in the .00n range, with smaller being better. If you are seeing numbers higher than .010 then you may be pushing your disks too hard. Anything over 1 second and your SQL Server is probably dying for more disks.
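If you'd rather capture these counters from the command line than click through the Performance Monitor GUI, typeperf can sample the same counters (a sketch; substitute the specific disk instance for _Total if you want per-disk numbers):

```
:: Sample the key PhysicalDisk counters every 5 seconds.
typeperf -si 5 "\PhysicalDisk(_Total)\Disk Reads/sec" ^
              "\PhysicalDisk(_Total)\Disk Writes/sec" ^
              "\PhysicalDisk(_Total)\Avg. Disk sec/Read" ^
              "\PhysicalDisk(_Total)\Avg. Disk sec/Write" ^
              "\PhysicalDisk(_Total)\Current Disk Queue Length"
```

Add -o somefile.csv if you want to log the samples to a file for later review instead of watching them scroll by.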
The disk queuing numbers are also very important. These will tell you how many commands are backing up while the disks process the commands already given to them. The general rule of thumb is that the queue shouldn't ever exceed two times the number of disks which are actively serving the data. So if you have a 10 disk RAID 10 array, the queue should go no higher than 10, as only 5 disks are serving the data; but those same 10 disks in a RAID 5 array are OK with a disk queue of about 18, as there are 9 disks actively serving the data.
Now this doesn't mean that you should always have a disk queue. Disks work best when data is sent to or read from them in bursts, not in a constant massive stream. This means that you want to aim for an average queue length of 0, with occasional spikes up.
On the array
If you are working with your basic local disk array, then there isn’t much you’ll be able to look at past the server unless your RAID card has metrics which it can expose to you through the diagnostic tools which come with it.
However, if you are working with a SAN solution, your SAN will have some diagnostics available to your SAN admin. These diagnostic numbers will give you the full story, as you'll be able to compare what Windows sees from the server with what the array itself is seeing.
When looking at the array itself you can see not only the performance of the LUN which is presented to Windows, but also the performance of each specific disk underneath the LUN. This will allow you to see, for example, if a specific spindle under the LUN is causing the slowdown, perhaps because it is failing.
Getting the full picture is very important when it comes to looking at storage performance issues. This means looking at the performance numbers from all sides so that you can get a full understanding of exactly where the performance problem may be coming from.
I forgot to put in the link to Mike’s post about T-SQL Tuesday, so here it is.
Every once in a while you have to kill a SPID in SQL Server. And on a rare occasion the SPID will go into rollback, but won't ever finish rolling back and go away. While this is annoying, there isn't actually anything bad going on. The SQL Server is running just fine; however, you won't be able to get rid of this SPID without restarting the SQL instance.
Typically when I've seen this, the client application has been disconnected from the SQL Server. From what I understand, what happens is:
- The SPID is killed
- The SQL Server rolls back the transaction
- The client is informed of the rollback
- The client acknowledges that the rollback is complete
- SQL terminates the SPID
Every time that I've seen this on my servers the client has already disconnected, due to a reboot, network drop, client crash, etc., which stops the SQL Server from telling the client that the rollback is complete. This breaks something between steps 3 and 4, leaving the process sitting there.
The upside to this problem is that the rollback is complete and the transaction has been completely rolled back and closed, so it isn't holding any locks. The downside is that you'll need to restart the SQL instance in order to get rid of the process. Killing the process won't do anything for you, as it will only tell you that there are 0 seconds remaining and that the rollback is at 0%.
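You can check the reported state of one of these processes without issuing another kill by using the STATUSONLY option (53 here is just a hypothetical SPID; use the one that's stuck on your server):

```sql
-- Report rollback progress for a SPID without actually killing it again.
-- For one of these stuck processes it reports 0% complete with
-- 0 seconds remaining, no matter how long you wait.
KILL 53 WITH STATUSONLY;
```

If the estimate never moves off 0% and the client is long gone, you're looking at the situation described above.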
If you have one of these processes show up on you and you have to leave it for a day or two until you can restart the instance, there shouldn't be any harm in this, as the process is idle. It is using up a small amount of memory, but once the rollback has completed it isn't using any CPU. It won't add any time to the instance restart either; the transaction has already been rolled back, so the instance will come back online quickly.
I've just published a new article on SearchSQLServer.com where I talk about how to set up the new Multi-Server Management feature of SQL Server 2008 R2, complete with some screenshots in case you haven't had a chance to look at it yet.
I logged into my WordPress admin site and saw something that I never thought that I’d see. Somehow I’ve managed to publish just over 400 blog posts.
I’m really not sure how that happened, but I’d just like to thank everyone who reads my ramblings either so that you can get some new info, or for some sort of entertainment value, or whatever. Just thanks for reading and I hope that you’ll keep reading my blog.
I know that the title of my blog is "SQL Server with MrDenny" and that I don't always post about SQL Server, but there is so much more to my job than SQL Server, so I pretty much have been writing about everything that comes up.
Again, thanks for reading.
David Stein (@Made2Mentor) started a neat new post, My McGuyver Moment, and tagged Brent Ozar (@BrentO), who tagged me. In David and Brent's posts they talked about building something from nothing that the company ended up needing for several years.
I too have had my share of shoestring budget projects that I've had to put together. The one that I'm probably the most proud of was when I worked at EarthLink. We had a rather large call center; at the time it was a single office, but over time it grew to 7 offices and 3 outsourced call centers. But I'm skipping ahead here.
One of the things our call center supervisors were supposed to do was grab the call center queue stats every 4 hours and email them to management's pagers. Needless to say, having a person sum up these numbers in Excel and then send out an email seemed like a rather big waste of time.
So I built a quick VB app which would log into the PBX, gather the required numbers, and send out the email. This was cool and all, and took about a day to put together. After this I decided to put the data out on a web server so that everyone else who didn't have access to the rather expensive Lucent PBX software could see it. Once everyone thought that was awesome, we took it to the next logical step, started exporting all the call center data, and created web pages which would allow anyone to see the data.
Now these weren't fancy dynamic pages. They were just static HTML files that refreshed themselves every few seconds and were rewritten by my app just as often. On the server side of things, the PBX server that I had to connect to was a closed system, meaning that my only option to connect to it was via the management tool. At the time there was no scripting support in that tool, so I had to write an app which would send keystrokes to the client app, export the data to flat files, then process the files into the needed HTML files.
This lasted for a couple of weeks, until we started adding more groups into the mix and it ended up taking several minutes to process everything. I then started loading the data into an Access database and using classic ASP to allow the users to select the data that they wanted to see. After a couple of weeks of loading data into Access every few seconds, Access blew up, so we upgraded to SQL Server.
As we added more and more call centers to the company, we added more and more computers to do the processing. When all this started we were running on old P2 400Mhz workstations, each with something like 128 Megs of RAM. By the time we were done we had a few computers, all crappy workstation class machines except for one rack mount server (see the picture of my desk below).
By the time I left EarthLink (more specifically, I was tossed out kicking and screaming as all our jobs were off-shored) we had equipment in 5 offices in addition to all the machines running under my desk. There were somewhere around 4000 internal users accessing the web app, and we were actually feeding hold time information to the customer-facing website so that customers could see what the wait time would be in real time. By the time everything was said and done we had saved hundreds of thousands of dollars in Lucent/Avaya licensing fees, as for each new Lucent PBX we bought only the server and only needed the default 5-10 licenses that came with the PBX system.
Needless to say this was probably the biggest spit and chewing gum project that I've ever worked on. The total cost to the company was only a few thousand dollars (the rack mount server). Our SQL Server license was covered under our Enterprise Agreement, and all the workstations were older machines which weren't in use any more by Tech Support (they liked it when we stole the old machines; it gave them an excuse to order new ones).
Now, as for the people that I’m going to tag I’m going to go with some people that don’t always get pulled into these.
Jessica and Goeff haven't been doing a lot of blogging recently. Hopefully this will help them start writing more often. Tom blogs like crazy, and one more post just isn't going to hurt him. I picked these three because I'm curious to see what they have had to band-aid together with duct tape over their careers.
I’m not that kind of sick, any hangover would be long gone by now. But the lack of sleep has caught up with me. I’ve got exactly 5 days to get over it before my next trip, which will hopefully be much more tame.
So last week was the MVP Summit (#mvp10 for those of you on Twitter). I can't really go into what was covered at the sessions because it is all covered under NDA.
What I can tell you is that we had some great sessions (and some not so great sessions) and gave the product groups some great feedback (hopefully they think that the feedback was great as well).
We also partied, and did we ever party. Over the four days, the highest number of parties that I heard of one person attending was 12. Now I don't care who you are, that's a lot of events to make it to in one week, especially when you remember that we were in session from 9-6 each day, and you have to sleep at some point.
In about a week I've got the pleasure of attending my first SQL Saturday. Of course I've submitted some sessions to present at the event, because, well, I'm a sucker for a speaking event where I can meet SQL Server professionals and catch up with some friends from the East Coast.
Most of my SQL Server friends on the East Coast I'm only able to see once or twice a year: MVPs I get to see at the MVP Summit and at PASS, while the non-MVPs I only get to see once a year at PASS. This gives me a chance to meet up with a ton of friends and make some new ones.
So if you are going to be in the Charlotte, NC area on March 6th, 2010 I’d highly recommend coming to the SQL Saturday event.
If you see me on Saturday you’ll probably see me with coffee or soda in my hand. If I don’t have something with me, I’ll probably be on my way to getting coffee or soda. The reason for this is that the week before I’ll be in Vegas (I know, poor me) at EMC Training for the entire week. The training class ends at 5pm on Friday, then I’ve got to hop a red-eye from Vegas to Charlotte, get a few hours of sleep (if I’m lucky) then off to the event.
There are tons of laptops out there which you can buy. HP and Dell are the biggest sellers in laptops these days. However, for the last couple of laptops that I've purchased I've stayed away from the major brands.
My Acer laptop was a tank. Hell, my wife accidentally ran it over with a car, and the only problem the laptop had was that the screen was cracked. I was able to plug in a monitor and copy all the data off of it over the next couple of days without issue. The screen was replaced thanks to Best Buy's Accidental Damage replacement policy.
But with a new year comes new toys, and it was time for an upgrade to a 64bit machine since the Acer only had a 32bit processor.
The Asus is a totally new direction for me. This time I went with an ultra-light laptop. It weighs in at just 4.2 lbs, is 1″ thick, and I've seen it run for 5 hours on a single charge. But there is a heck of a lot of performance in this little package. It comes with 4 Gigs of RAM, which I've already upgraded to 6 Gigs, and I'll be going to 8 Gigs later on. It has a 64bit processor, and I've got the 64bit version of Windows 7 installed along with 64bit Office 2010, SQL Server 2008, VPN software, and all the Twitter and IM software that one could ever want. I've also got Windows Live Sync, Carbonite, Interguard and Laptop Cop all running in the background (Interguard and Laptop Cop are my company's products, so they are installed for testing purposes, usually as beta versions).
Now since there isn't actually anything wrong with the Acer, and the Asus has a slower processor in it (a 1.4 Ghz Centrino 2), I've re-purposed the Acer as a portable VMware server. It has been reformatted with Ubuntu, has VMware Server installed, and is now my portable VMware machine so that I can do demos, presentations, etc. without slowing down my new laptop. I tried putting VMware ESX 3.5 on it, but it couldn't find the NIC, and Hyper-V 2.0 only supports x64.
Depending on how things look next year, and depending on how much use the VMware laptop gets (having the two laptops did give me an excuse to buy a new laptop bag), I may buy a new laptop next year just to be a VMware laptop (this gives me a free trial run at it). My hope there is that I can put vSphere on it (or Hyper-V, but I'd prefer vSphere since I know it better) so that I don't have to run Windows and then run the VMs inside of that. Running it under vSphere would also give me the ability to start and stop the VMs from my main laptop without needing to log into the VMware machine's console, as I already have all the vSphere tools installed thanks to my normal job. It is going to be tough getting the guys at Best Buy to let me drop in the vSphere DVD to see if the installer can find the NIC and hard drive of their machines. If anyone has gotten ESX or vSphere to run on a laptop, please let me know what model in the comments (or via Twitter if you don't have an account here and don't want to create one).
But I’ve veered off topic here.
I moved away from the big laptop companies for a couple reasons.
1. Cost: The name brand laptops are expensive.
2. Finding drivers: Like all good geeks I like to format my laptop and get rid of the crap the vendor installs, since I know what I want better than they do. Getting all the drivers from HP and Dell can be a pain. Acer didn't make it all that easy, but Asus included 2 DVDs with the laptop. The first was the restore DVD to make the laptop all factory fresh, and the second was the driver and utility install disk which had all the drivers and extra software. I just checked the boxes for what I wanted installed and clicked install. It rebooted a few times during the process, but installed everything automatically.
I know that in the past the non-big-brand laptops were not up to par with the big name laptops. However, I can say without a doubt that these Acer and Asus laptops are up to just about any job that you give them (or at least any that I've given them so far).
Now my last laptop was an HP, and it was kind of a crappy laptop. When I bought it, it was a brand new model, and I ended up with one of the first ones off the assembly line. It was a 17″ desktop replacement laptop and weighed in at like 100 lbs or so (probably only 10 or so, but it was still insanely heavy). The laptop used so much power that it would only run for like 30 minutes on its battery, and when the CPU was running at full speed it would actually use more power than the power supply could output, so the battery would drain even while plugged in. This was pretty much the last straw for the big name laptops for me, especially with how happy I've been with the smaller companies' laptops recently.
Now if Asus or Acer (or anyone else for that matter) has a laptop that they'd like me to put through the paces, let me know. So far I'm only able to review the equipment that I buy, and I'm not exactly made of money.
Recently I had the opportunity to upgrade my desktop computer from 2 Gigs of RAM to 8 Gigs of RAM. This required that I reinstall Windows 7 to replace my 32bit OS with a 64bit OS.
This presented me with a problem, as I work from home full time and have to VPN in. You see, we use a Cisco VPN router as our VPN server, so I can't use the standard Windows VPN client, which allows you to VPN in from the login screen. With the Cisco client you have to log into Windows before you can start the VPN.
So I could log in to my computer with a local account, VPN in, then add my computer to the domain. But when it came time to log in with my domain account, I couldn't access the domain. I could log in with my local account and start a VPN tunnel, but if I logged out then the VPN tunnel would be closed, preventing me from logging in with my domain account.
The solution that I came up with was to log in with the local account which was created when Windows 7 was installed. I then granted my domain account admin rights to my workstation (so that I could install software). Next I used Remote Desktop to connect to the workstation from my laptop and logged in with my domain credentials. This caused my local account to log out, but not before the domain account was authenticated. I then rebooted my desktop and was able to log into my newly formatted desktop using my cached domain credentials.
Windows then built my new profile just like it normally would. After the desktop appeared I was able to VPN back into the office and begin installing the needed software.