If you were on twitter in the last couple of days you may have seen some links to a blog post titled “MySpace’s death spiral: insiders say it’s due to bets on Los Angeles and Microsoft” in which the author tries show that because MySpace used the Microsoft stack (ASP.NET, IIS, Windows, and SQL Server) that this was a major cause of MySpace’s demise.
As someone who used to work for MySpace’s infrastructure team (specifically I was one of the Senior level Database Administrators) that worked there, I can say that it was not the Microsoft technology stack which was MySpace’s problem. What caused MySpace’s demise was a piss poor UI (User Interface) and because of this poor UI design the slow page load times. It really didn’t matter what platform that the site was built on, crap code is crap code.
No before you assume that I’m a Microsoft Fan Boy, I’m not. You should flip through my blog and you’ll see that I’m actually quite critical of Microsoft and specifically the SQL Server product for which I’m a Microsoft MVP. I have a reputation within the Microsoft SQL Server product group of telling them the truth about their products, both the good and the bad.
The Los Angeles Area
The author also claims that they failed because there weren’t enough .NET programmers in the Los Angeles area which could work in the .NET startup mode. This is just crap. I’ve spent most of my career working at startup shops, all of which were .NET shops and there were always enough .NET programmers who could work at startup speed. What MySpace needed was a management team that understood that programming new features requires more than a few hours to do it correctly, and requires a solid scalable test environment to test in. The developers at MySpace weren’t given either one of these to work in. While the production database environment (for example) had several hundred database servers, the testing environment had only a single database server. Rolling changes wasn’t really tested before they went into production, and there was NO load testing that I saw at all. The basic excuse was that they couldn’t generate a production work load, so they didn’t load test it.
A major failure of the MySpace management was that they couldn’t retain the talent which they had. They didn’t appear to value the employees which they had. They were constantly hiring .NET developers and they were losing .NET developers just as fast if not faster than they could hire them. In the 6 or 7 months that I worked there I was the 3rd or 4th DBA that left the company. I’ve never seen DBAs come and go as frequently as they did there. During my tenure at MySpace I personally never felt like anything other than another cog in the wheel, with the exception of my last few days when they begged me to stay on board.
One of the points in the article which is valid (sort of) is that Los Angeles is a large area, so commuting to the office can suck (it would take me at least an hour to get to work, longer if I took mass transit). There is an easy way around this, allow your employees to work from home, and this problem goes away. However MySpace (more specifically the parent company Fox Interactive Media) has a strict no working from home policy. Killing this one policy would have made working there a lot easier as the employees could have lived anywhere in the area, only coming to the office when it was absolutely needed.
Were there problems at MySpace? yes.
Was the problem Microsoft? No
Was the problem being in Los Angeles? No
Was the problem the management at MySpace? Yes
Could these have been overcome? Yes
Will they be, allowing MySpace to make an amazing comeback? My guess would be probably not. At least not without a major house cleaning.
Apparently someone at Microsoft has completely lost their mind. They have signed me up as a speaker for Tech Ed NA 2011. I’ll be presenting a session titled ‘What’s New in Manageability for Microsoft SQL Server Code-Named “Denali”‘.
This is a 300 level session so it will be full of all sorts of “Denali” goodness. Exactly what I’m keeping a secret for now, you’ll just have to come to the session to find out. That or hang out with me at Dev Connections, Rally, etc. and I’m sure I’ll let it slip. 🙂
This will be my first time speaking at Tech Ed, so needless to say I’m very thrilled to be given this opportunity. This just goes to show, that if you are loud, obnoxious, rude, etc. you can still get the KICK ASS speaking gigs. My session is currently listed as TBD still, but you can find my session in the Tech Ed Catalog. As of when I’m writing this (Thursday night) it has me listed as “Denham Cherry”, but hopefully they’ll get that fixed to Denny like it should be.
See you in Atlanta in May.
If you have a modern tape backup solution you probably have some sort of dedup process in your backup environment. What these deduplication processes do normally is look at the data which is being written to the backup platform and remove duplicate values in order to save space on the tape backup system.
However if you are using backup compression, either the native backup compression or a 3rd party backup compression tool, this will pretty much completely kill the effectiveness of the deduplication application.
How the deduplication systems work
There are two different kinds of deduplication systems out there (which come as an appliance or as a software package), which do fixed length deduplication or variable length deduplication. The fixed width deduplication systems use a fixed width process to look for duplicate data. They take the file and break it down into fixed width strings (different vendors use different length string, but for the sake of argument we’ll assume a 1k fixed width string) of 1k in size. Then everywhere the string is duplicated a pointer is put in which points to the original 1k length string.
The variable length systems do basically the same thing, but they don’t use the fixed length string. Because they use variable length strings they have a better change of removing smaller duplicate values, possibly as small as every word if needed.
For example, if you have a table with employee names in it and you are using a deduplication system which has a fixed width string size of 1k, you probably won’t get any deduplication. However if you have a system which has a variable length string size, if there are duplicate names (such as within the last name column) then you will get some deduplication with the amount of deduplication simply depending on the data within the database file.
Why doesn’t this work with compression?
The reason this doesn’t work with a compressed database backup (either the fixed or variable length deduplication) is because every value will be unique. When the data is compressed every row which has the last name “Smith” in it will already have the duplicates removed. Besides that databases (at least OLTP databases) already have the bulk of the duplicate data removed via the process of normalization.
A couple of weeks ago, I took a compressed database backup (some of the data within the backup was encrypted) to Quantum’s lab here in Southern California so we could see just how well it would dedupe a compressed backup. We achieved what the guys at Quantum considered to be the impossible, we got 0% dedupe of our compressed backup file.
As the DBA, why do I care?
As the DBA we care about this, because we want to play nice with the rest of the Enterprise environment and not waste resources if we don’t need to. Because of this you will want to work with your storage and backup teams to come up with the best plan for your systems. That may mean that you are taking backups which are not compressed so that the deduplication process can deduplicate the file as depending on your data set that may be more efficient. On the SQL Server side this will make your backups take a little longer, but as long as the backups still fit within your backup window that may be ok. You’ll only be able to find this out after working with your backup and/or storage admins to find out what will work best in your specific environment.
Now I’ll be going back to Quantum in the next couple of days to try and dedup an uncompressed backup file to see how well that works compared to the traditional SQL Server compression. I’ll be sure to post an update after we run this second set of tests to get some example numbers comparing the two.
Friday morning I’ll be joining Keith Combs on the “Talk TechNet” at 9:00am Friday March 25th, 2011. You should register now (sorry about the late notice, I thought I posted about this already).
So here’s an answer to the great myth of SAN, it isn’t always going to be faster than local disk (or DAS, JBOD, etc.). The reason for this is actually pretty straight forward.
Local disk is cheap. That’s the reason in a nut shell. Let me see if I can’t explain in a little more detail.
Because local disk is so cheap, I can buy a 300 Gig SAS drive from Dell for $469, we can easily throw a lot of them at the problem, getting really, really fast storage really, really cheaply (at least compared to a SAN). Throwing 10 or 20 disks at a server is only ~$4600 or ~$9200 respectively which in the grand scheme of things isn’t all that much.
Those same 300 gig disks in an EMC array as an example will retail for ~$2500 each (~$25,000 for 10 or ~$50,000 for 20). So why would I purchase SAN storage, instead of buying a ton of local disks? The local disk is faster, and cheaper so where is the benefit?
The benefit from the SAN storage comes in a few places.
1. Centralized Administration
2. Better use of resources
3. Lower power utilization
4. Lower TCO
Lets look at each of these one at a time (and yes there is some overlap).
Instead of having to get bad disk alerts from all the servers that the company owns I get them all from one place. Instead of having to connect to each server to manage its storage configuration I have a single point that I can do this from.
Better use of resources
When using local disk I have to purchase storage for each server as a one off purchase. If a server I bought 6 months ago doesn’t need anywhere near the amount of storage that I purchased for it I’m stuck. That storage will sit there eating power doing nothing while I go out and purchase new storage for my next server.
When using a storage array, each server only has what it needs. If a server needs more storage later, that can be easily assigned. If a server has more storage than it needs you can shrink the LUN (only a couple of vendors can do this so far, the rest will catch up eventually) and that storage can be easily reallocated to another server in the data center for use. If a server needs faster storage, or is on storage which is just to fast and the faster storage could be better utilized somewhere else these changes can be made on the array, with no chance for loss of data on the array, and with no impact to the system.
Lower Power Utilization
This goes back to the better use of resources point above. When you have shelves of disks sitting around doing nothing, or next to nothing, those disks need to be powered. Power costs money which effects the bottom line. When you can re-utilize the disks the over all power costs are lower, especially when multiple servers are all sharing the spindles.
This goes back to the Power Utilization above. When you are using more power, you are generating more heat. The more heat you generate the more cooling that you need to keep everything up and running. Along with this, and tied into the better use of resources, when you need 50 Gigs of storage you use 50 Gigs of storage. When you need 1 TB of storage you use 1 TB of storage, no more no less. So while you have to purchase a bit more up front (which is always recommended so that you can get the best possible prices), when you use the storage, you’ll only need to use the storage that you actually need. If you do charge backs this will be very important.
Storage arrays also provide all sorts of extra goodies that they can do. The array it self can help with your backup and recovery process. It can help present full data sets to your Dev/QA/Test/Staging systems without using up full sets of data via the built in snapshot technologies. When migrating or upgrading from one server to another, the storage array can make this very easy.
Migrating between servers is just a matter of disconnecting the LUN(s) from the old server, and attaching them to the new server.
Upgrading SQL Server? That’s no problem. Disconnect the LUNs from the old server, and take a snapshot of the LUNs. Then attach the LUNs to the new server. You can then fire up the database engine, and in the event of a failure to upgrade the databases, just roll back the snapshot and attach the LUNs back to the original system, or take another snapshot and try attaching the databases again.
Want to keep a copy of every database that you have in the company, no matter the version of SQL Server at your DR site? Storage based replication can replicate the data for any application, it doesn’t matter if that application supports replication or not, from one array to another. Every time a new or changed block is written to the array, the array will grab that block and sent it over the wire to the remote array. This can be done in real time (synchronously) or on a delay as specified by the admin (asynchronously).
Hopefully this opened up the array a little to you, and gave you some insight into how the magic box works.
In the server world we have three different kinds of storage available. Today, only two of these can be used with your SQL Server (as long as you want to keep the SQL Server in a supportable configuration). Your three options are Direct Attached Storage (DAS), Network Attached Storage (NAS) and Storage Area Network (SAN).
Network Attached Storage is the configuration that you shouldn’t be using with your SQL Server. NAS can be used with SQL Server if you drop in a trace flag and run your server in an unsupported configuration. However if there is a problem with the SQL Server Microsoft PSS (CSS, whatever they are called this month) may not be willing to help you as officially SQL Server does not support Network Attached Storage. NAS devices are specialized devices, typically running some flavor of Linux which present network shares which Windows can recognize to the network so that people or services can access the storage over the network. NAS devices can also be running Windows sort of like a traditional file server where you can access the files over the IP network.
Direct Attached Storage also called JBOD (Just a Bunch Of Disks) or local storage is storage which is directly attached to the back of the server. There will probably be a couple of disks which sit within the server, and when these are outgrown you’ll get an external shelf of storage which has more disks in it. These will be connected via SCSI, SAS or fibre channel cable to a card within the server. For SAS or fibre channel DAS units the controller which does the RAID will probably be within the shelf which holds the disks. For older SCSI units the card which handles the RAID will probably be within the server (there are SCSI shelves which have the controller within the shelf). Direct Attached Storage is usually faster than SAN storage as the disks within the DAS units are dedicated to the servers. What you gain in speed you loose in flexibility, and manageability.
Storage Area Network storage (SAN) is very flexible storage which has a big management overhead. But that management overhead gives you lots of options, and makes it very easy to reconfigure the storage on the fly without any changes to the servers. You can easily extend the storage, reduce the storage (if you have Windows 2008 and a storage array which supports making the volumes smaller), move the volume to faster storage, or slower storage all with no outage to the server. You don’t even need to tell the server. Because there is a lot of management involved with storage arrays (the device which actually holds the disks, also called arrays among other things) correctly configuring storage arrays for maximum performance is quite difficult and usually isn’t done to maximize the performance of the array. Storage arrays don’t ship in the best possible config, then need to be tweaked and tuned to fit the workload that your specific environment will be putting on them. All to often people that are managing the storage array don’t understand what all the nobs within the array management software do, so they don’t touch them (which is probably good) or the tweak them incorrectly (which is very, very bad).
I’ve talked to people that have deployed storage array’s from EMC that have gotten less than 1/2 the performance than I’ve been able to get from similar storage arrays. What makes me think that this is a storage array configuration problem is that they were using high end 15k RPM disks, while I was using 10k RPM disks. Obviously the workloads were different, but their workload was better suited to high speed storage than mine was. They were doing sequential reads (where the blocks which are being read and right next to each other on the spindles) on a RAID 10 array while I was doing VERY random reads and writes (where the block which are being read and written are all over the array) against a RAID 5 array.
Hopefully this helps shed the light on some of the terms which you may hear flying around your office.
In a perfect work, you would use RAID 10 for everything. However we don’t work in the perfect world, we have budgets to deal with. And these budgets mean that we have to make sacrifices at times so we don’t get what we want.
RAID 10 is very expensive to implement, much more so per gig than RAID 5 or RAID 6, especially if there are a lot of disks in the RAID array. If the database doesn’t specifically need more performance than the RAID 5 array can provide, then using a RAID 10 array is just a waste of money. And that money is money that could be used for other projects that the company is trying to complete.
In the real world that we all work in (or at least most of us) performance comes at a cost, and those costs have to be controlled. If you need RAID 10, and you actually need it, and you’ve got the budget for it, then use it. Otherwise use something less expensive. If you aren’t sure if you need RAID 10 or not, start with a lower level such as RAID 5 or RAID 6, and if needed switch up to RAID 10.
Welcome to the first day of storage week of SQL University. Over the next week, we’ll be covering some different topics, at different levels, but hopefully the information is useful to everyone.
One of the big reasons that companies pay for storage arrays is that when it comes to expanding storage (LUNs) you can do it quickly and easily with no downtime. This begs the question, how do you expand your storage when you are using local disk, with little to no downtime?
So assume that you’ve got 4 72 Gig spindles in your server, and you want to upgrade these to 400 Gig spindles, you’ve got two options. The first requires a lot of downtime, the second requires basically none.
With a bit outage
If your RAID card doesn’t support online expansion (the documentation for the RAID card will tell you) then you’ll have to take a decent outage to do this. You’ll need to connect another hard drive like a large USB drive, down all the services, and copy all the data off of the disk onto the USB drive (backing everything up to tape will work as well). Then down the server, remove the old drives, put in the new drives and create a new RAID array. Boot Windows back up, create the partition (don’t forget to align the partition) copy the data back to it and you are up and running.
With a really small outage
If you have a RAID card that supports online expansion (the documentation for the RAID card will tell you) then you can do this with little to no downtime. First replace each disk one at a time, leaving enough time to rebuild the RAID array so that you don’t loose any data. Once all 4 disks have been replaced (this will take 2-3 days to get to this point) either open the RAID cards management tools, or reboot into the RAID card’s BIOS and have the RAID card expand the virtual disk (or what ever the manufacture calls it). Once that process is done bring Windows back up if you had to reboot into the BIOS, and open diskpart and use diskpart to extend the volume.
This is done by opening diskpart and typeing in the following:
select disk n
You use “list disk” to find the disk number, then use “select disk n” to select the correct disk where n is the number of the list that shows from the output of list disk. The “extend” command extends the volume to fill the disk.
Something which was talked about at the last SoCal Code Camp was running a bus from SoCal to Phoenix to help our two communities get to know each other better, and to give speakers and attendees easier access to more events.
So along those lines a bus has been setup taking people from SoCal to the next Desert Code Camp and registration is open to get a seat on the bus. Now if you plan on taking the bus don’t wait until its to late to sign up. If enough people don’t sign up for the bus, it’ll be canceled.
If you plan on attending be sure to get signed up soon, and we’ll see you in Phoenix.
Microsoft is putting on a great free three day online class for VMware professionals that need to learn more about Hyper-V.
Day 1 will focus on “Platform” (Hyper-V, virtualization architecture, high availability & clustering)
Day 2 will focus on “Management” (System Center Suite, SCVMM 2012 Beta, Opalis, Private Cloud solutions)
Day 3 will focus on “VDI” (VDI Infrastructure/architecture, v-Alliance, application delivery via VDI)
The class is being taught by two top notch presenters Microsoft Technical Evangelist Symon Perriman (a good friend of mine, who knows his stuff) and leading Hyper-V, VMware, and XEN infrastructure consultant, Corey Hynes.
Each of the days is a separate event so you need to sign up for each one separately. I’ve made each of the days a separate link to make your life easier. Just click through each link and register. The only downside to the amazing training opportunity is that it is happening during the Dev Connections conference, so if you are attending Dev Connections in April, 2011 you won’t be able to take advantage of this amazing FREE (did I mention FREE) training event. As I’ll be at Dev Connections I won’t be able to make it, which is something which I really bummed about.