Question: Is the cloud really hardware agnostic?
The wonderful thing about cloud architectures is that they are designed to be cost effective at massive scale. The major cloud providers are profitable not only because they can aggregate customers and use the available equipment more efficiently, but also because they can leverage their considerable market muscle to purchase truckloads of components at steep discounts. As Google discovered and published in Failure Trends in a Large Disk Drive Population, the brand and cost of a hard drive had little to do with its reliability. Another paper delivered at the same 2007 USENIX conference, Disk Failures in the Real World: What Does an MTTF of 1,000,000 Hours Mean to You?, came to similar conclusions. The key to building reliability in the cloud is not quality components; it is building a hardware architecture that assumes the components will fail and plans for that failure. Since the individual components are essentially interchangeable, it stands to reason that a good cloud architecture should be completely hardware agnostic.
The fallacy in that kind of thinking is the assumption that failure rate is the only criterion for choosing a given component. Hardware is a moving target; new and better hardware is always coming around the next corner. Any good storage engineer knows that enterprise customers do not pay the EMC or NetApp premium just because they feel more comfortable buying from a known brand. They are typically paying for the better tools, faster performance, or bigger capacity that they need for their high-performance applications.
It turns out that this applies to cloud hardware architectures as well. Hardware does in fact matter if a cloud is going to run at peak efficiency. Which hardware components are chosen can make a significant difference under stress conditions. When the objective is to optimize the environment, the ideal cloud should be running at close to peak capacity, essentially under some stress, most of the time. For example, in a storage array, the two constraints are always going to be network bandwidth and disk I/O, i.e. how fast the disks can move the data around. By specifying a faster disk controller and tuning the configuration for throughput, for example by enabling write-back caching on the disks, the entire system will run that much more efficiently. Yes, in this case you will be reducing reliability at the individual disk level, but since the architecture already provides resiliency against disk failure in other ways, that risk can be tolerated in exchange for the faster throughput.
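To make the bottleneck argument concrete, here is a minimal sketch of a storage array whose sustained throughput is capped by whichever is slower, the network link or the combined disk I/O. The function names and every figure in it are hypothetical, chosen only for illustration, and the model ignores real-world effects such as controller overhead and uneven load.

```python
def array_throughput(network_mb_s, disk_mb_s_each, disk_count):
    """Sustained throughput in MB/s: the slower of network and aggregate disk I/O."""
    aggregate_disk = disk_mb_s_each * disk_count
    return min(network_mb_s, aggregate_disk)


def bottleneck(network_mb_s, disk_mb_s_each, disk_count):
    """Name the constraint that currently limits the array."""
    aggregate_disk = disk_mb_s_each * disk_count
    return "network" if network_mb_s < aggregate_disk else "disk"


# Hypothetical array: a 1200 MB/s network link in front of 12 disks.
baseline = array_throughput(1200, 80, 12)   # disks deliver 80 MB/s each
tuned = array_throughput(1200, 110, 12)     # faster controller and tuned caching

print(baseline, bottleneck(1200, 80, 12))   # 960 disk
print(tuned, bottleneck(1200, 110, 12))     # 1200 network
```

In this made-up case the faster disk path lifts throughput from 960 to 1200 MB/s, at which point the network becomes the limit and further disk tuning would buy nothing.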
In conclusion, at the proof-of-concept and small-system level, cloud hardware agnosticism works just fine. For massive cloud installations that want to run at peak efficiency, however, specifying the right hardware components to eliminate throughput bottlenecks has the potential to boost overall performance significantly. The trick is determining whether the hardware cost differential is worth the increased performance. Of course, at truly Amazonian scales, that cost differential essentially disappears. At more modest enterprise scales, in my opinion, the TCO business case for the better hardware will prevail in most cases.
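One rough way to frame the cost question is to compare total cost per unit of delivered throughput for the two configurations. The sketch below assumes a deliberately simple TCO model (hardware cost plus operating cost over the same period); the function names and all the figures are hypothetical and not taken from any real deployment.

```python
def cost_per_mb_s(hardware_cost, operating_cost, throughput_mb_s):
    """Total cost of ownership divided by sustained throughput (simplified model)."""
    return (hardware_cost + operating_cost) / throughput_mb_s


def upgrade_is_worth_it(baseline_cost, upgraded_cost):
    """The upgrade pays off if it lowers the cost per unit of throughput."""
    return upgraded_cost < baseline_cost


# Hypothetical figures: the better hardware costs 15% more up front,
# operating costs are unchanged, and throughput rises from 960 to 1200 MB/s.
baseline = cost_per_mb_s(100_000, 50_000, 960)
upgraded = cost_per_mb_s(115_000, 50_000, 1200)

print(baseline)                                 # 156.25
print(upgraded)                                 # 137.5
print(upgrade_is_worth_it(baseline, upgraded))  # True
```

With these invented numbers the premium hardware wins on cost per MB/s. A smaller throughput gain or a larger price premium would reverse the result, and that is the judgment call described above.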
About the Author
Beth Cohen, Cloud Technology Partners, Inc. Moving companies’ IT services into the cloud the right way, the first time!