So a few years ago a new storage concept was introduced to the market. That platform is now known as the IBM XIV platform. What makes this system so different from every other storage platform on the market is that it doesn’t have any hardware-level RAID like a traditional storage array does. Instead, the system assigns 1 MB chunks of space in pairs across the disks throughout the system, so that there is always a redundant copy of the data.
The hardware that makes up the system is relatively standard. Each shelf of storage is actually a 3U server with up to two CPUs (depending on whether the shelf is a data module or an interface module; I’ll explain the differences in a little bit) and 8 GB of memory, used as both system memory and read/write cache. Because of this architecture, as you add more shelves you also add more CPUs and more cache to the system.
As I mentioned above, there are two kinds of shelves: interface modules and data modules. They are effectively the same equipment, with some slight additions for the interface modules. Each module is a single-socket quad-core server with 8 GB of RAM and four 1 Gig Ethernet ports for back-end connectivity (I have it on good authority that this will be increasing to a faster back end in the future). Each shelf contains twelve 1 TB or 2 TB SATA hard drives spinning at 7200 RPM. The interface modules add a second quad-core CPU, four 4 Gig Fibre Channel ports, and two 1 Gig iSCSI ports.
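To put those disk counts in perspective, here is some rough capacity arithmetic. It is only a sketch built from the figures above (12 disks per shelf, 1 TB or 2 TB disks, two copies of every chunk) and it deliberately ignores spare capacity and metadata, so the "usable" number is an upper bound, not what IBM quotes for a given configuration.

```python
# Back-of-the-envelope XIV capacity math using the figures from the post:
# 12 SATA disks per shelf, 1 TB or 2 TB each, and every 1 MB chunk stored twice.
# Spare capacity and metadata overhead are ignored, so "usable" is an upper bound.

DISKS_PER_SHELF = 12
COPIES = 2  # two copies of every chunk


def raw_capacity_tb(shelves: int, disk_tb: int) -> int:
    """Total raw disk capacity in TB across all shelves."""
    return shelves * DISKS_PER_SHELF * disk_tb


def usable_upper_bound_tb(shelves: int, disk_tb: int) -> float:
    """Raw capacity divided by the number of copies (no spares, no metadata)."""
    return raw_capacity_tb(shelves, disk_tb) / COPIES


if __name__ == "__main__":
    for shelves in (6, 9, 15):
        for disk_tb in (1, 2):
            print(f"{shelves} shelves x {disk_tb} TB disks: "
                  f"raw {raw_capacity_tb(shelves, disk_tb)} TB, "
                  f"usable <= {usable_upper_bound_tb(shelves, disk_tb):.0f} TB")
```

The real usable figure will be lower once spare space is reserved, which is why the function name says "upper bound".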
Now the system comes with a minimum of 6 shelves, which gives you 2 interface modules and 4 data modules. From there you can upgrade to a 9-shelf system, which gives you 4 interface modules and 5 data modules. Once you have the 9-shelf system you can upgrade to anywhere from 10 to 15 shelves, with new interface modules being added at 11 and 13 shelves. There’s a nice chart in this IBM PDF (down on page 2) which shows how many fibre and iSCSI ports you get with each configuration.
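The configuration rules in the previous paragraph can be captured in a small lookup, sketched below. The interface-module counts follow the post (2 at 6 shelves, 4 at 9, a fifth at 11, a sixth at 13); the port totals are an assumption that simply multiplies by the per-module ports listed earlier (4 FC + 2 iSCSI per interface module), so IBM’s chart remains the authoritative source.

```python
# Sketch of the shelf configurations described in the post. Port totals assume
# every interface module contributes 4 FC and 2 iSCSI ports (an assumption;
# check IBM's chart for the real numbers).

FC_PORTS_PER_INTERFACE = 4
ISCSI_PORTS_PER_INTERFACE = 2
VALID_SHELF_COUNTS = (6, 9, 10, 11, 12, 13, 14, 15)


def interface_modules(shelves: int) -> int:
    """Number of interface modules for a given shelf count, per the post."""
    if shelves not in VALID_SHELF_COUNTS:
        raise ValueError(f"{shelves} shelves is not an orderable configuration")
    if shelves == 6:
        return 2
    if shelves < 11:   # 9 and 10 shelves
        return 4
    if shelves < 13:   # 11 and 12 shelves
        return 5
    return 6           # 13 to 15 shelves


def configuration(shelves: int) -> dict:
    iface = interface_modules(shelves)
    return {
        "shelves": shelves,
        "interface_modules": iface,
        "data_modules": shelves - iface,
        "fc_ports": iface * FC_PORTS_PER_INTERFACE,
        "iscsi_ports": iface * ISCSI_PORTS_PER_INTERFACE,
    }


if __name__ == "__main__":
    for n in VALID_SHELF_COUNTS:
        print(configuration(n))
```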
All these modules are tied together through two redundant 1 Gig network switches, which use iSCSI to talk back and forth between the shelves. My contacts at IBM tell me that they haven’t ever had a customer max out the iSCSI back plane, but personally I see the potential for a bottleneck. Given how distributed the system is I can see why that hasn’t happened, but if things don’t balance across the interface modules just right I can see a potential bottleneck here (my contacts tell me that the next hardware version of the product should have a faster back plane, so this is something they are addressing). There’s a nice picture in this IBM PDF which shows how the modules talk to each other, and how the servers talk to the storage modules.
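The bottleneck concern is easier to see with the raw numbers side by side. This sketch compares nominal link rates for a single interface module using the port counts from the hardware description (4 x 4 Gb FC and 2 x 1 Gb iSCSI in front, 4 x 1 Gb Ethernet to the back-end switches). It ignores encoding overhead and duplex, and treats the worst case where all host IO has to cross the back end, so read it as a ratio, not a throughput prediction.

```python
# Nominal link-rate comparison for one interface module, in Gb/s.
# Port counts come from the post; encoding overhead, duplex and protocol
# efficiency are ignored, so only the ratio is meaningful.

def front_end_gbps(fc_ports=4, fc_gbps=4, iscsi_ports=2, iscsi_gbps=1):
    """Host-facing bandwidth: Fibre Channel plus iSCSI ports."""
    return fc_ports * fc_gbps + iscsi_ports * iscsi_gbps


def back_end_gbps(eth_ports=4, eth_gbps=1):
    """Bandwidth from the module into the back-end switches."""
    return eth_ports * eth_gbps


if __name__ == "__main__":
    fe, be = front_end_gbps(), back_end_gbps()
    print(f"front end {fe} Gb/s vs back end {be} Gb/s "
          f"-> {fe / be:.1f}x if all host IO crosses the back end")
```

Because each LUN is spread over every module, most host IO does have to reach other shelves, which is why a faster back plane matters more here than on a traditional array.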
The really nice thing about this system is that as you grow it you add processing power and cache as well as fibre and iSCSI ports, which should really help eliminate bottlenecks. The downside that I see here is that the cost to get into the system is probably a little higher than for some of the competitors’ products, as you can’t get a system with fewer than 6 shelves.
How It Works
From what I’ve seen, this whole thing is pretty cool when you start throwing data at it. When you create a LUN and assign it to a host, the system doesn’t really do a whole lot. As write requests start coming in, it writes two copies of the data at all times to the disks in the array. As the data is written, one copy goes to an interface module and one copy goes to a data module. This way, even if an entire interface module were to fail, there would be no loss of data. That’s right: looking back at the hardware config, we can lose all 12 disks in a shelf and not lose any data, because that data is duplicated to a data module. So if we had the largest 15-shelf system with all 6 interface modules, we could lose 5 of those 6 interface modules and not lose any data on the system. Now if we had a heavily loaded system we might start to see performance problems as we max out the fibre on the front-end ports, or the 1 Gig back-end interconnect ports, until those interface modules are replaced, but that’s probably an acceptable problem to have as long as the data is intact.
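The claim about losing 5 of 6 interface modules follows directly from the placement rule, and a small simulation makes that concrete. This is a hypothetical model, not IBM’s placement code: each chunk gets one copy on a randomly chosen interface module and one on a randomly chosen data module, and we count chunks whose copies all sit on failed modules. It also shows the flip side of the same rule: losing one interface module and one data module at the same time takes out any chunk paired across those two.

```python
import random

# Hypothetical model of the placement rule described in the post:
# one copy of each 1 MB chunk on an interface module, one on a data module.


def build_modules(interface: int, data: int):
    """Return (interface module names, data module names)."""
    return ([f"iface{i}" for i in range(1, interface + 1)],
            [f"data{i}" for i in range(1, data + 1)])


def place_chunks(num_chunks: int, iface, data, seed: int = 42):
    """One (interface_module, data_module) pair per chunk, chosen at random."""
    rng = random.Random(seed)
    return [(rng.choice(iface), rng.choice(data)) for _ in range(num_chunks)]


def chunks_lost(placement, failed_modules) -> int:
    """Count chunks whose two copies are both on failed modules."""
    failed = set(failed_modules)
    return sum(1 for a, b in placement if a in failed and b in failed)


if __name__ == "__main__":
    iface, data = build_modules(interface=6, data=9)  # 15-shelf system
    placement = place_chunks(100_000, iface, data)
    print("lose 5 of 6 interface modules:", chunks_lost(placement, iface[:5]))
    print("lose 1 interface + 1 data module:",
          chunks_lost(placement, [iface[0], data[0]]))
```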
Because there is no RAID, there’s no parity overhead to deal with, which keeps everything nice and fast. Because the disks aren’t paired up 1 to 1 like they would be in a series of RAID 1 arrays, if a disk fails the rebuild is much quicker, since the data is coming from lots of different source disks. Each of those disks only has to supply a small share of the rebuild, so the odds of a performance problem during that rebuild operation are next to nothing.
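A simple model shows why spreading the rebuild helps. In this sketch the failed disk’s data is read from `source_disks` disks and written to `target_disks` disks, each contributing a fixed rebuild bandwidth; the slower side sets the pace. The 50 MB/s per-disk figure and the 500 GB of data are placeholders, not measured XIV numbers, and the model assumes the rebuilt copies are also written across many disks. If everything had to land on a single spare, the many-source advantage would disappear, which the last line illustrates.

```python
# Simplified rebuild-time model. Throughput and data size are placeholder
# values, not XIV measurements; only the scaling behavior is the point.


def rebuild_hours(data_gb: float, source_disks: int, target_disks: int,
                  per_disk_mb_s: float = 50.0) -> float:
    """Hours to re-copy data_gb when reads are spread over source_disks and
    writes over target_disks; the slower side limits the aggregate rate."""
    aggregate_mb_s = per_disk_mb_s * min(source_disks, target_disks)
    return data_gb * 1024 / aggregate_mb_s / 3600


if __name__ == "__main__":
    data_gb = 500  # assumed amount of data on the failed disk
    print(f"RAID 1 mirror (1 -> 1 disk):       {rebuild_hours(data_gb, 1, 1):.2f} h")
    print(f"distributed (100 -> 100 disks):    {rebuild_hours(data_gb, 100, 100):.3f} h")
    print(f"distributed reads, single target:  {rebuild_hours(data_gb, 100, 1):.2f} h")
```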
The system is able to keep everything running very quickly because every LUN is evenly distributed across every disk. When you create a LUN within the management tool, it adjusts the size of the LUN for you for maximum performance. While this will cost you a few gigs of space here or there, the performance benefits greatly outweigh the lost storage space, especially when you remember that these are SATA disks, so the cost per gig is already very low.
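One plausible way to picture that size adjustment is rounding the requested size up to a whole number of fixed allocation units, so every LUN divides evenly across the disks. The sketch below is hypothetical: `ALLOCATION_UNIT_GB` is a placeholder value, not taken from the post, so check IBM’s documentation for the real increment. It also reports the "few gigs" lost to rounding.

```python
import math

# Hypothetical LUN-size rounding. ALLOCATION_UNIT_GB is a placeholder,
# not a documented XIV value.
ALLOCATION_UNIT_GB = 17


def rounded_lun_size_gb(requested_gb: float,
                        unit_gb: float = ALLOCATION_UNIT_GB) -> float:
    """Round the requested size up to a whole number of allocation units."""
    return math.ceil(requested_gb / unit_gb) * unit_gb


def wasted_gb(requested_gb: float, unit_gb: float = ALLOCATION_UNIT_GB) -> float:
    """Space allocated beyond what was asked for."""
    return rounded_lun_size_gb(requested_gb, unit_gb) - requested_gb


if __name__ == "__main__":
    for size in (100, 250, 1000):
        print(f"requested {size} GB -> allocated {rounded_lun_size_gb(size)} GB "
              f"({wasted_gb(size)} GB extra)")
```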
My Thoughts on the System
Now I’ve only had a couple of hours to work with one of these units. I’d really like to get access to one for a week or two to pound on it, beat the crap out of it, and see what I can really make the system do (hint, hint IBM).
The potential IO that this system can serve up to a host server, such as a SQL Server, is massive. Even once you load up a few high-IO servers against it, the system should be able to handle the load pretty well. The odds of getting a physical hot spot on one disk are pretty low, since the LUNs aren’t laid out in the same manner on each disk (in other words, the first meg of each LUN isn’t on disk 1, the second meg of each LUN isn’t on disk 2, etc.).
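Here is a sketch of how a layout can avoid "chunk N of every LUN lives on disk N". It hashes each (LUN, chunk) pair to a disk, a generic technique chosen for illustration rather than IBM’s documented placement algorithm, and it only places one copy. The first print shows that chunk 0 of different LUNs lands on different disks; the second shows that the total load spreads fairly evenly over 180 disks (15 shelves x 12 disks).

```python
import hashlib
from collections import Counter

# Illustrative hash-based layout (not IBM's algorithm): each (LUN, chunk)
# pair maps to a pseudo-random but repeatable disk.


def disk_for_chunk(lun_id: int, chunk_index: int, num_disks: int) -> int:
    """Deterministically map a LUN's chunk to a disk index."""
    key = f"{lun_id}:{chunk_index}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_disks


def disk_load(num_luns: int, chunks_per_lun: int, num_disks: int) -> Counter:
    """How many chunks end up on each disk."""
    return Counter(disk_for_chunk(lun, c, num_disks)
                   for lun in range(num_luns) for c in range(chunks_per_lun))


if __name__ == "__main__":
    disks = 180  # 15 shelves x 12 disks
    print("first chunk of LUNs 0-4:",
          [disk_for_chunk(lun, 0, disks) for lun in range(5)])
    load = disk_load(num_luns=20, chunks_per_lun=9_000, num_disks=disks)
    print("chunks per disk: min", min(load.values()), "max", max(load.values()))
```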
The management tools for the XIV system are pretty cool. They make data replication between two arrays very easy: it’s just a quick wizard and the data is moving between the systems. One place where the management tools put this system a step above other arrays is that the XIV is aware of how much space has been used in each LUN. This makes disk management much easier at companies where the storage admins don’t have server access and the server admins don’t have access to the storage array, because each team can monitor free space from its own side. That gives a better chance of someone spotting a full disk quickly and doing something about it before it becomes a problem.
Just like with any RAID system, if you lose the right two disks you’ll have some data loss. With a standard RAID 5 array, if you lose any 2 disks in the array then you lose all the data on the array. With a RAID 10 array, if you lose a matching pair of disks then you lose everything on the array. With the XIV system, if you lose two disks you’ll probably be OK, as the odds that you lose two disks holding the same 1 MB block of data are very slim; but if you did lose those two disks before the system was able to rebuild, you could lose the data on that LUN, or at least some of the data on the LUN. Now IBM’s docs say that the system rebuilds from a failed disk to a hot spare within 40 minutes or less (page 4), but I’d want to see this happening under a massive load before I would put my stamp on it.
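The length of the rebuild window matters because the risk described above only exists until the rebuild finishes. This sketch estimates the chance that any other disk fails inside that window, assuming independent failures at a constant rate derived from an annual failure rate (AFR). The 3% AFR and the 180-disk count are placeholders, not figures from IBM. The result is an upper bound on the data-loss risk, since a second failure only loses data if that disk holds the partner copy of one of the failed disk’s chunks.

```python
import math

# Chance of a second disk failure during a rebuild window, assuming
# independent failures with a constant (exponential) failure rate.


def second_failure_probability(num_disks: int, rebuild_minutes: float,
                               annual_failure_rate: float) -> float:
    """Probability that at least one of the other num_disks - 1 disks fails
    before a rebuild of rebuild_minutes completes."""
    hourly_rate = -math.log(1.0 - annual_failure_rate) / (365 * 24)
    exposure = (num_disks - 1) * hourly_rate * (rebuild_minutes / 60.0)
    return 1.0 - math.exp(-exposure)


if __name__ == "__main__":
    afr = 0.03  # placeholder 3% annual failure rate
    for minutes in (40, 8 * 60, 24 * 60):
        p = second_failure_probability(180, minutes, afr)
        print(f"{minutes:>5} min rebuild: {p:.4%} chance of another failure")
```

The comparison is the useful part: the shorter the rebuild, the smaller the exposure, which is exactly why the 40-minute figure is worth verifying under heavy load.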
Overall I would say that the XIV platform looks pretty stable. From what I’ve heard about the next generation of the hardware, most if not all of the issues that I see with the platform appear to be resolved. The one thing I’d really like to see would be three copies of each block of data throughout the system, as the odds of losing three disks all containing the same 1 MB block of data would be next to zero. Maybe this will be a configuration option with the 2 TB disks, or maybe when the 3 TB disks come out (whenever that happens). But then again, I’m a DBA, so I love multiple copies of everything.
Now I’m sure that some of the other storage vendors have some opinions about the XIV platform, so bring it on, folks.
He’s got a great idea for a project, so now it’s on us to make sure that he builds it. Here’s what he submitted as his idea.
Talk about a step up in difficulty from finding out your email address! haha
I would leverage an MSDN Ultimate license to attempt to build a kick-ass SQL Server DBA repository. I’m not talking about a single table holding a list of all the SQL Server instances you manage, either. The ultimate goal would be an automated process that gathers information about all of the instances in the environment daily. This information could be viewed on demand from Reporting Services or web pages. There would be configurable alerting rules that would email the DBA distribution list; an example would be a rule for databases without a backup in x days. The features would be selectable, so you could get a basic amount of information without making any changes on the production instances.
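The "no backup in x days" rule is simple enough to sketch. Everything here is hypothetical: `DatabaseBackupInfo` and `stale_backups` are illustrative names, and the records are plain objects rather than live query results. In a real collector the last-backup times could come from each instance’s `msdb.dbo.backupset` table (its `backup_finish_date` column).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Hypothetical alert rule: flag databases with no backup within max_age_days.


@dataclass
class DatabaseBackupInfo:
    instance: str
    database: str
    last_backup: Optional[datetime]  # None means never backed up


def stale_backups(databases: List[DatabaseBackupInfo], max_age_days: int,
                  now: datetime) -> List[DatabaseBackupInfo]:
    """Return databases whose last backup is missing or older than the cutoff."""
    cutoff = now - timedelta(days=max_age_days)
    return [db for db in databases
            if db.last_backup is None or db.last_backup < cutoff]


if __name__ == "__main__":
    now = datetime(2010, 7, 1)
    inventory = [
        DatabaseBackupInfo("SQL01", "Sales", datetime(2010, 6, 30)),
        DatabaseBackupInfo("SQL01", "HR", datetime(2010, 6, 20)),
        DatabaseBackupInfo("SQL02", "Scratch", None),
    ]
    for db in stale_backups(inventory, max_age_days=3, now=now):
        print(f"ALERT: {db.instance}.{db.database} last backup: {db.last_backup}")
```

Keeping the rule as a pure function over collected data fits the "minimal footprint" goal: nothing has to be installed on the production instances to evaluate it.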
There are similar programs/scripts that get you comparable information, but I have found most of them either only get some of the information DBAs need or require configuration on each instance. I’m hoping to set up something that provides all the information with a minimal footprint. This would be a great tool for troubleshooting issues, as you could easily identify any login changes, sudden database file growth, or schema changes regardless of the SQL Server instance version.
Darn, forgot to add that my app and source code would be freely available to the SQL Server community. Hopefully people would contribute and make the application even more useful for everyone!
Later this week I’ll announce what I’m doing with the third license.
The first MSDN giveaway that I did today was just too easy. Time for something that takes a little more work.
Put together a quick blog post (or feel free to put it in a comment here) about what really kick-ass software you think you could develop with this free MSDN license. (If you do a blog post, be sure to ping back to this post so I can find it.)
I’m not going to hold you to it, but hopefully you’ll actually make the software.
This is for a full-blown MSDN Ultimate license. It comes with everything that a paid MSDN license comes with except: no MSDN Magazine, no support calls, and no free Office 2010 license. You get the rest of the Microsoft software suite for development and testing.
I’ll take the names of the people who respond, put them into a hat, and pick one at random.
All comments and blog posts need to be posted by 6pm Pacific time today (the winner will be announced shortly after that on my blog and Twitter). Be sure that I know how to get a hold of you, or that your contact info is on the about page of your blog or something.
PS. If you’ve already won a license from me, no, you can’t win a second one.
A while back I was asked to pick up a chapter in a Windows 7 book. It took a while to get my copy of the book, but it has finally shown up. The book is titled “Microsoft Windows 7 Administrator’s Reference: Upgrading, Deploying, Managing and Securing Windows 7“. So far the book has 2 reviews on Amazon, and they are both very positive.
Hopefully some more reviews will be posted.
I’ve just had Amazon add the book to my Author page as well.
This will be a two-part session focusing on two of the biggest topics in the DBA field: how to properly design your storage solutions and your virtualization solutions. Storage can be one of the biggest bottlenecks when it comes to database performance. It’s also one of the hardest places to troubleshoot performance issues, because storage engineers and database administrators often do not get along. We’ll be digging into LUNs, HBAs, and the fabric, as well as the storage itself. In the second half of the day we’ll be looking into the pros and cons of moving SQL Servers into a virtual environment, specifically when it’s a good idea and when it’s probably not. Like everything in the database world, there are no hard-set answers as to whether virtualization is a good idea or not. We’ll also look into how to tie the virtual platforms to the storage array so that you can maximize storage performance for your SQL Servers and the virtual environment.
To register for my pre-con (or any of the fantastic pre- and post-con sessions), simply register for the PASS Summit; on the third page or so you’ll be given a list of the available Pre-Conference and Post-Conference sessions.
Hopefully you’ll join me on Monday November 8th, 2010 for 7 awesome hours of “Storage and Virtualization For The DBA”.
I’ve just posted the slide decks for my sessions from this week’s SoCal Code Camp. I’d like to thank everyone who gave me great feedback on how to improve the sessions.
For those of you at the storage sessions, watch this blog for my announcement about the longer storage presentation that I’ll be doing up in Irvine.
So apparently I need to actually read ALL the emails from PASS instead of letting my ADD kick in. I’ve been selected for a Pre-Con on Monday November 8th, 2010. You see, PASS sends you a few emails when you are selected. The first tells you which pre-con and spotlight sessions have been accepted. The second has the speaker contract, and apparently tells you when your pre-con will be.
I read the first, saw the second and simply opened the attachment. That’ll teach me.
I’ll hopefully be seeing everyone bright and early on Monday the 8th for my Pre-Con session.
In case you aren’t on Twitter, at about 5pm (Pacific time) yesterday PASS sent out the emails to the people who have had their pre/post con sessions selected, and apparently Tim Ford (Blog | Twitter) was drinking heavily, because I got my pre/post con approved. Allen White (Blog | Twitter) also had one approved. Over the next day or so other people should pop up saying that they were approved as well.
Now in case you haven’t heard of a pre/post con, I’ll give you the skinny.
It’s a 7-hour presentation that people pay to attend.
Since I’ve had mine selected, for the low, low price of $395 you can come and catch a deep dive (or as deep as we can go in 7 hours) into storage and virtualization in my “Storage and Virtualization for the DBA” session. Now I don’t know yet if this will be a pre-con or a post-con, as they haven’t announced the schedule.
I’ve also got a spotlight session picked up. The spotlight sessions are 15 minutes longer than the regular sessions (so 90 minutes), which will give us lots of time to talk about the what and how of SQL Server Service Broker in my “Getting SQL Service Broker up and Running” session.
I hope to see you at both sessions, but if not I better see you at PASS.
As soon as PASS has the marketing materials and the abstracts up, I’ll be sure to post a link to them.
This time around I’ve got four sessions: three that I’ve put together, and one that is a group panel.
“Storage for the DBA” is a session I’ve given a few times at SoCal Code Camp where we go into the basics of storage, and how it relates to SQL Server.
“There’s more to know about storage?” is a follow-up to the “Storage for the DBA” session where we will go into the design techniques that the various storage vendors have used to create their respective platforms. From there we’ll move into some of the more advanced features you can use with storage arrays, which turn them from the large JBOD that most people think of into advanced storage devices worth every penny that they cost.
In the DACPAC session we’ll look at the new feature called Data-tier Applications to see how they work, when they should be used, and when they shouldn’t be used.
The panel discussion will have a large number of SQL Server professionals from the Southern California area including myself, Andrew Karcher (Blog | Twitter), Lynn Langit (Blog | Twitter), Bret Stateham (Blog | Twitter), and several more all there to answer your questions about SQL Server.
I hope to see you there at Code Camp.
It occurred to me that I haven’t ever posted the slide decks for the Portland SQL Saturday. Sorry about that.