So if you follow me on twitter you might have seen this tweet a little while back.
Since there aren’t many people out there that get the chance to buy and build a brand new data center from scratch, I figured that I’d go over the process with you. This is the first of who knows how many blog posts on the topic.
The first step in buying colo space and moving into it involves getting completely fed up with your current hosting company. Currently we are with a large managed hosting provider named RackSpace I probably shouldn’t name them, and have become totally fed up with them. The costs are to high and we get almost nothing from their support team but grief. They have actually unplugged a firewall’s power cable in the middle of the day by accident. We actually have to have paper signs taped to the racks with the equipment which says to not touch anything in these racks between 6am eastern and midnight eastern without manager approval (or something to that effect) because it has happened so many times.
The first step to moving into your own CoLo (this process has taken about a year at this point) is to figure out how much processing power and storage you need to purchase. This doesn’t need to be an exact figure, but a rough estimate. This will eliminate some hardware options for you.
You also need to know what features you are looking for. Here are some questions that can help you figure these things out.
- Are you going to virtualizing servers?
- A few large VM hosts?
- Lots of little VM hosts?
- Will you need storage level replication to another data center later on for DR?
- If you will be virtualizing servers, will you need to be able to setup a Windows cluster as a VM?
- How long do you need to keep backups for?
- How much data growth is expected?
- Over one year?
- Over two years?
- Over three years?
- How IO rates need to be supported?
- How much IO throughput needs to be supported?
So lets break these questions down a little bit.
Are you going to virtualizing servers?
This one is pretty much a give in. Most every company should be virtualizing at least some of their servers. If nothing else things like domain controllers, and other infrastructure servers should be virtualized. It just doesn’t pay to have physical servers sitting around using 1% of the CPU all day. Other servers like web servers and app servers are also usually a no brainer when it comes to virtualizing them. The big questions come down to your mission critical servers, SQL Server, Oracle, Exchange (yeah I know, it’s not mission critical but just wait for Exchange or mail to go down them tell me it isn’t mission critical), SAP, etc. These machines may or may not be able to be virtualized.
It’s OK to have some machines by virtual and others to be physical. In the case of this project everything is virtual except for the SQL Server cluster (to large to be a VM) the vCenter management server (cause I’m old school and want it physical), monitoring (it’ll run on the vCenter server for the most part), and some appliances which are physical appliances which have to be racked. All the web, file, and infrastructure servers will be VMs.
In our case we are going with a few larger hosts instead of a bunch of smaller hosts. As we got through the hardware review process we landed on Cisco UCS blades and servers. For the VMware hosts we are running on several of the dual socket, 8 core per socket blades with something like 96 or 128 Gigs (might be even more at this point) of RAM per blade.
For the SQL Server cluster we are also using blades as they ended up being less expensive than their physical counter parts. The SQL Server blades are quad socket, 8 core per socket blades with 256 Gigs of RAM per blade. We didn’t pick these blades for the VMware hosts because it was actually cheaper to have the dual socket blades over the quad socket blades, and nothing that will be a VM will be getting more than 4 or 6 vCPUs so having the smaller blades isn’t an issue.
Will you need storage level replication to another data center later on for DR?
If you are planning on building a DR site at some point in the future this is important to know now. It would really suck to buy a storage solution that doesn’t support this when you will need it in the future. Just because you will need it doesn’t mean you need to buy the replication software now, or setup the second DR site now. But you need to plan ahead correctly for the project to ensure that everything that you want to do with the hardware is supported. Nothing sucks more than having to go to management in the middle of the DR build and tell them that all that storage that you’ve purchased will be useless and needs to be replaced, not only at the DR site but also at the primary site. Issues like this can delay DR build out projects for months or years as you now have to pause the DR build out (probably while still paying for the DR site and equiptment), buy and install new storage, migrate to that storage, then restart the DR project and start up the replication.
In the case of this project management said that yes we will want to spin up a DR site probably within a couple of years so this limited our search for equipment to storage platforms which fully supported storage level replication. This includes having consistency groups so that sets of LUNs are kept in sync together (kind of important for databases, Exchange, etc), integration with Windows VSS provider, supporting of snapshots, etc.
Now if your storage doesn’t support replication, or you want to have a nice expensive storage array at the primary site and a much less expensive storage solution at the DR site, you can look into EMC’s Recover Point appliance. It supports replication between two storage array’s and doesn’t even require that they be the same brand of array. It isn’t a cheap solution, but if you’ve got a million dollar solution in one site and a $100k solution in another site Recovery Point might be a good fit.
If you will be virtualizing servers, will you need to be able to setup a Windows cluster as a VM?
The reason that this question needs to be asked is to ensure that the storage array supports iSCSI. The only way to build a Windows cluster as a VM is to use iSCSI to attach the VMs to the storage directly. Most every storage array supports iSCSI these days, but there are some that don’t so this is important to know.
How long do you need to keep backups for?
As much as we all hate dealing with backups, backups are extremely important. And keeping backups for a period of time will save you some headaches in the event that a backup becomes corrupt. Also there might be regulations on how long backups are kept around for. Your SOX auditor might have a requirement, as might you HIPAA auditor and your PCI auditor. You just never know what these guys might through at you.
Then there’s the question of off site backups. Having backups is great, but you need to get those backups off site in case something happens to the building that the backup system is in. You’ve got a couple of different options here.
- Go old school and have iron mountain or someone pull the tapes and store them somewhere.
- Get a virtual tape library (VTL) and backup to that. Then get a second VTL and put it in an office or another CoLo and replicate between the two.
- Put your backups on a LUN and replicate that LUN to another facility
- Out source the backups to the CoLo
Option 1 is the way that it’s always been done. It’s reliable, slow and can be pretty costly. Option 2 is a pretty new concept, probably just a few years old now. It can work, if your backups are small enough and if you’ve got enough bandwidth. Storing a monthly worth of backups can take a LOT of space. Option 3 probably isn’t the greatest unless the only backups to worry about are the SQL Server backups as SQL can handle the purging of backups it self. Option 4 is worth looking at. Depending on the amount of space needed and what your CoLo charges it might be worth it to have the CoLo handle this for you.
In the case of this project we went with a combination of options #1 and #2. We have a VLT to backup to so that the backups run very fast (a VTL is basically just a separate storage array that is only used by the tape backup software and includes compression and deduplication to reduce the size of the backups). So we will backup to the VTL then copy the backups to tape. Then iron mountain will take the tapes off site for us. The VTL will hold about 2 weeks worth of backups on site, which we’ll have a second copy of on tape. Once we have the DR site we’ll get another VTL and replicate that, probably increasing it’s storage to 4-6 weeks and dump the need for the tape and offsite backups as everything will be backed up in two different CoLo’s in two different cities.
How much data growth is expected?
Knowing how much space you need today is important. Knowing how much space you need in 3 years is more important. Just because a storage array supports your data size today doesn’t mean that it will support it in 3 years. We use three years for a couple of reasons. First that’s typically how long the maintenance contract on the hardware is. Second that’s typically how long the financing term is for these kinds of purchases.
If you have 20 TB of space needed today, but in 3 years you’ll need 80 TB of space that’ll drastically change the kind of equipment that you can purchase.
How IO rates need to be supported?
How much IO throughput needs to be supported?
The next two questions go right along with the prior one. How much IO needs to be supported and high much throughput needs to be supported. These numbers will tell you if you need an array which supports flash drives, and how many drives need to be supported. Without these metrics you are totally shooting in the dark about what you actually need.
Once you’ve gotten all these questions answered you’d think that it’s time to start looking at hardware, and you’d be wrong. It’s time to go to management and get this thing approved to move forward. Join me next time as we look at that process.
P.S. This series will be at least half a dozen posts long. I’ll be tagging all of them with the tag “Building a new CoLo” to make it easier to follow just these posts via RSS if you aren’t interested in the rest of my stuff.
My SQL 2012 class is just a few short weeks away, but there are still seats available for the class. Take the time and get signed up now for this great four day class where we will be diving into SQL Server 2012 with loads of hands on labs to really get you ready to deploy SQL Server 2012 as soon as it is released.
As this class is all about the hands on part of the class, we won’t just be going through four days of lecture but we will instead be doing a great combination of lecture and lab so that you know not just the theory from slides but will have actual hands on experience of using the product to work through real life like scenarios.
Hopefully you’ll get signed up today and I’ll see you at the class next month.
So while installing a new Cisco UCS system with the one of the newer builds of the firmware which was 2.0(1s) we were getting an error with both chassis which said they were in an unsupported-connectivity state. The config that we setup was pretty basic and straight forward. We have two UCS Fabric Interconnects, with two chassis, and two cables from each blade in each chassis going to the Fabric Interconnects. There’s a diagram over there of what the system looks like for one of the two chassis (I love the fact that the management app makes nice diagrams like this).
When you configure a Cisco UCS system you tell the system how many cables the Fabric Interconnects should use to discover each chassis. At least that is how the screen is worded. What the setting actually means is how many cables should the Fabric Interconnects expect to see between each Interconnect and each blade in the chassis. We configured the setting for “Platform Max” in case we decided to add more cables later even though we only had two cables now, and only planned on having 2 cables (as shown in the diagram) for the time being. When we were set this way we had this strange unsupported-connectivity state error showing up for each of the chassis.
To fix this problem we had to change the discovery policy from “Platform Max” to 2 as shown in the below screenshot.
To make this change in the UCS Manager and select the equipment tab in the left menu. On the right select Policies tab. In the new lower tab menu select “Global Policies”. From there you can change the Chassis Discovery Policy which you can see in the screenshot below.
Hopefully this helps you if you run into this problem.
SQL Server is a damn good product, but it sure isn’t perfect. Like any good product out there people have come up with things that can be bolted onto the core to make SQL Server even better. Without these bolt on parts SQL Server looks a little dull. But these bolt on parts may not make the engine run better, but they make it look a lot better and that makes us want to make the SQL engine run better.
Some of my favorites (in no particular order) include:
The SSMS Tool Pack is a great add on for SQL Server Management Studio. It’ll save you if SSMS crashes by auto saving all those unsaved SQL Scripts for you. It’s got a great feature to help you read execution plans, a way to easily run a script against multiple databases, various templates, and much more.
Michelle (aka SQL Fool) has written a great Index rebuild and defrag stored procedure that anyone who is walking into a shop which isn’t going maintenance can take and throw onto the servers and happily know that the SQL Server will have some good maintenance being done automatically. The script will do rebuilds online when possible, offline when it must and figures out the order that things should be done in.
Adam was written sp_whoisactive and this is probably the gold standard is looking at what is causing SQL statements to wait, getting their execution plans, and a lot more. I’m pretty sure that there is a switch in there somewhere that will tell sp_whoisactive to make me breakfast. Adam has included loads of ways to filter the output so you can quickly and easily filter out all the spids that you don’t care about and get into the ones that you want. You can even control the formatting of the output in a variety of ways so that it fits your needs.
I’m cheating a little on this one, as I’m the one that wrote sp_who3, but it’s my list and I’m allowed to do that. sp_who3 will normally show the same output as it’s mild mannered cousin sp_who2. But when you call sp_who3 and pass it a spid that you are looking for a massive dump of information about that spid is returned. This dump includes the current statement which is being processed, the entire batch which is being processed, all the information formatted like the old sysprocesses table about all the threads for the SPID (very useful when seeing CXPACKET waits) and a ton of locking information. While the output isn’t very pretty, its functional. Personally I use sp_who3 to dig into parallel queries after I’ve done the initial identification of the problem using sp_whoisactive. (While the site only says SQL 2005 as the newest version that version works just fine on anything newer than SQL 2005.
Now go download and install these bolt-ons to your SQL Servers. I’ll wait…
Now that you’ve got all these bolt on parts installed, can’t you see how much nicer it is to work on the SQL engine. It’s easier to get at the information that you need. It’s easier to keep the system up and running. And you want to work on the system more now that it’s prettier and easier to work on. Much like my motorcycle is much prettier now that it has all those shiny parts bolted onto it.
So I think it’s been kept pretty quiet so far, as I’ve been insanely busy the last few weeks, but I am so thrilled to say that I’ve been picked to present a pre-con at the SQL PASS Rally in Dallas, Texas either on May 8th or May 9th (I haven’t been told which day yet). Either way I can’t wait to come down and talk about storage and virtualization with you, yes you, for the whole day. If that sounds like something you’d be interested in then get signed up (as soon as registration opens) and we’ll have some fun learning and making fun of your favorite storage admin.
If you are looking for some great SQL Server 2012 training so that you are up to speed on the new version I’ve got some good news for you. There are still seats available in my SQL Server 2012 class in Los Angeles March 19th-22nd, 2012. During this class we will cover a variety of subjects including installing/upgrading, deployment automation, AlwaysOn, the changes to T/SQL, planning a migration, the new enhancements to the BI presentation layer, as well as partitioning and data security.
Now don’t worry, this class isn’t just lecturer for four days. It’ll be a very interactive class with labs to do through out the week to ensure that you’ve got a solid understanding of the concepts that we are talking about. By the time you are done with this four day class you should be ready and able to begin upgrading SQL Servers to SQL Server 2012 successfully as well as be able to bring back the other portions of the class to share with the other departments within your company.
If you are headed to Microsoft’s Tech Ed North America or Tech Ed Europe you should really look at signing up for a SQL Server Pre-Con. The pre-cons are full day sessions focusing on a specific topic, led by an industry expert. In the case of the SQL Server pre-con “Microsoft SQL Server Performance Tuning and Optimization” you’ve got an all day session with SQL Server MVPs Denny Cherry (@mrdenny | site) and Thomas LaRock (@sqlrockstar | blog).
If you haven’t signed up for Tech Ed yet it appears that there are even some early bird discounts which are available for people that sign up before the end of February for Tech Ed North America and before the end of March for Tech Ed Europe. This makes it an even better deal because who doesn’t like saving money.
Hopefully you’ll join us in Orlando for Tech Ed North America or in Amsterdam for Tech Ed Europe for a great day of SQL Server training.
The other day I had a VERY unpleasant situation happen. I managed to dump a basically full 64oz (1.8 liter) cup of iced tea all over my desk. In theory this shouldn’t be a problem but it landed right on top of one of my laptops. Thankfully the laptop was off but it was plugged in when this happened. Now I’ve dumped water on computer components before, and I’ve had a pretty good track record with getting things dried out pretty easily, so I figured that I’d share my little trick that I use with everyone. It’s actually quite an easy process, and works with more than just electronics. In fact I’ve even used it to dry out the inside of our car. The method of choice?
Yep, that’s right good old fashioned white dinner rice, uncooked of course. The reason that rice works is because rice makes an amazing desiccant. For those who missed the episode of Numbers where they talked about desiccant’s, a desiccant is something that draws the moisture out of something else. In the cast of the TV show it was drawing the water out of a dead body, in this case out of my dead laptop (they didn’t use rice in the show). You know those little packets that come in just about everything that you buy, that say “Don’t eat” on them? That’s a desiccant.
In the case of the laptop, I quickly unplugged it and pulled out the battery. Then got a box a bit bigger than the laptop, and dumped a bunch of rice into the box, dropped in the laptop, which I then covered with more rice. Then the battery and an HP TouchPad that got drenched as well. Cover everything with a healthy layer of rice and let sit for several days. The nice thing about this technique is that there’s no head, no chemicals, no cloth or paper towels to touch anything in the laptop. Just the rice absorbing up any water in the laptop.
When doing this with a car, you can’t put the car in a box and fill it, so a slightly different approach is needed. In my case we had a car door opened just a crack which let in rain water (yes it really does rain in Southern California). Using a vacuum and towels we could only get so much water out, but we didn’t want to worry about the floor of the car rusting out. So we pored about 10 pounds of white rice into the floor board of the car a waited about 6-8 hours for the rice to absorb all the water. When we vacuumed out the rice everything underneath was nice and dry.
Back to the laptop. I left the laptop in the rice for about a week in this case, and thankfully the laptop came back in perfect working order.
To pull this off I used about 15-20 pounds of rice total. When I dumped the cup Kris went to Costco and picked up a 50 pound bag of rice and I used a large bowl to get it into the box.
Hopefully you never have to use this trick, but if you do there it is.
In case you missed it Twitter released v1 of TweetDeck, and it sucks.
So if you need to rebuild your computer and you want to get a version of TweetDeck that doesn’t suck you can download the old version from codelog.org and keep the usable version instead of the garbage which is TweetDeck 1.0. Hopefully Twitter will make TweetDeck 2.0 actually usable. Until then we have 0.38.2.
Here are my slide decks from SoCal Code Camp last weekend.
I had a great time as always, and I look forward to seeing everyone at the next event.