Last week I wrote an article on IT environments that chose Network File System (NFS) for their shared VMware storage, and at least one large IT shop corroborates my story. An IT administrator at a well-known investment management firm writes that he runs 45 VMware ESX 3.0.2 hosts that run more than 1,000 VMs entirely on Network Appliance network-attached storage (NAS) 3070 boxes — and with great success.
“We haven’t seen any issue with speed, that is for sure,” he writes.
Before switching to NetApp, the firm ran its environment on EMC and Hitachi storage area networks (SANs). The admin described the latter as “a pain,” “expensive,” and suffering from SCSI lock, manageability and host bus adapter (HBA) issues.
By moving to NetApp NAS, the firm has also realized another benefit: improved data protection. “We also love the fact that we save a lot of money on the backup solution. We just use snaps to another NetApp — no agents, no tapes, no overpaid workers, no maintenance contracts on over 1,000 servers.”
Bless its heart, NetApp also chimed in on the article, taking umbrage at statements made by Fairway Consulting Group CEO James Price. About a year ago, NetApp began testing NFS for VMware at the behest of customers looking for more manageable storage but who were worried about the ability of NAS and NFS to scale. What they found is that “NFS is robust enough to run production environments,” said Vaughn Stewart, virtualization evangelist with the company. In the coming months, NetApp plans to publish results of tests performed in conjunction with VMware.
At the same time, NetApp is working with VMware to get the company on board with NetApp’s “NFS is good” message. As it stands, “VMware is inconsistent throughout its documentation about the role of NFS,” said Phil Brotherton, NetApp senior director of enterprise solutions. It may be a tall order, as the storage community has a strong bias in favor of Fibre Channel SANs.
But in Brotherton’s view, some of that preference is a bit self-serving. “A lot of people are trained on a technology, and that’s a good reason to be biased toward it,” he said, adding that many shops have sunk a lot of money into existing Fibre Channel infrastructure. “But I also see a lot of people try to spin technical arguments to justify what is really a sunken cost argument. . . . I would love to see the discussion move past performance to why people are really using NFS; performance is not the issue.”
When it comes to displacing SAN with NFS, our nameless IT administrator echoed Brotherton’s opinion. “I can tell you that there are some old-school SAN guys (myself included) that are scared that they might not be needed as much as they think. It is becoming easier and easier to use NFS for most everything. There are certain cases where a SAN is needed, but it is not necessary for every case.”
A recent discussion on the OpenBSD mailing list led to the assertion that virtualization decreases security. For those interested, a summary of the discussion is available on Kernel Trap. But proponents on both sides of the argument have taken to throwing about emotionally driven comments rather than thinking objectively about the subject. Of course, because the original comment labeled all those as “stupid” and “deluded” who think virtualization somehow contributes to security weaknesses, who can really blame people for getting a bit emotional? All the flame-war commentary aside, the question remains: does virtualization weaken security? The original argument that virtualization can diminish security was based on two points:
- If software engineers cannot create an OS or application without bugs, what hope does a virtualization solution have to be bug-free?
- x86 hardware is ill-suited for virtualization.
The first point does two things: it lumps all software engineers, operating systems, and applications into one pool and assumes that it is possible to find bug-free code.
Addressing the sub-points in order, while it is true that software engineers are human (and we make mistakes) and that software in general has a track record of imperfections, it is also true that the world does not judge all software engineers or software to be the same. In fact, I would guess that a lot of members of the OpenBSD mailing list prefer OpenBSD to, let’s say, Windows. However, there are many readers of this blog that may prefer Windows to OpenBSD, or Linux, or OS X, etc. The same preference could be applied to office suites (OpenOffice, StarOffice, MS Office, KOffice, etc.). The fact of the matter is that we all have our own preferences: we do not judge software to be the same.
Secondly, the first point argues that the community should expect a bug-free hypervisor, and anything less contributes to the decrease of the overall security of a server platform. This is a very lofty expectation indeed! A very long time ago I wrote to Slashdot, heck, almost seven years ago now, and asked the question why it was not possible for developers to spend more time on projects and produce bug-free software. Commander Taco (the ring-leader of Slashdot) himself replied to me and said that it was a foolish expectation: software is 1) written by humans and 2) is far too complex today to be without errors. However, people still judge some operating systems to be more secure than others. The same for kernels. How can such a judgment possibly be made if all software has bugs? The answer is “easily.” We observe the rapidity with which bugs are discovered in software, the impact that they have on the IT infrastructure across the world, the speed at which operating system and independent software vendors (OSVs and ISVs) release patches, and how easily those patches are applied without affecting the rest of the server platform, and then we judge the security of a piece of software. Therefore we do not judge a piece of software to be secure because it contains no bugs, but rather by the history of its imperfections and how quickly blemishes are removed.
Notice that I did not say whether or not I agreed with Mr. Slashdot. I do not. I do believe that software designed to be general purpose, such as today’s OSes, is doomed to be bug-laden, simply because it lacks a specific purpose and too many conditions have to be accounted for. However, imagine if the same leeway were given to the software that runs our air-traffic control systems? Or military installations? Such software is held to a higher standard, and it can be in part because it is designed with a specific purpose. The same is true of hypervisors: they are designed specifically for one purpose. They do not yet have all the cruft and bloat sitting on top of them that today’s OSes do. Here’s hoping that the ISVs producing today’s leading virtualization solutions step up to the challenge.
In short, I believe that there is a reasonable expectation that hypervisors will be a lot more secure than general purpose applications written on top of general purpose OSes could ever hope to be.
Yes, the x86 instruction set was never designed to be virtualized, but to say that the instruction set has not grown well above and beyond its original intentions is to do an injustice to the original minds at Intel who took part in creating one of the most persevering pieces of technology to date. With the first set of virtualization extensions, those created to solve the problems of ring compression and ring aliasing, the x86 instruction set was given a breath of new life. And with the latest extensions, enabling live migrations of virtual machines across multiple generations of processor versions, the staying power of the x86 instruction set in a world with virtualization has been increased even further.
My point is simply that just because the x86 instruction set was not designed with virtualization in mind does not mean that it cannot work, and work securely. That is the beauty of x86: it can be extended to do what we need it to do. History has spoken.
With both of the original arguments shown to be false, is the conclusion of the original argument then reversed? Not completely. While virtualization does not decrease security, the potential for it to do so is there. Hypervisors are software, and although they are a lot less likely to have bugs than a general-purpose piece of software, bugs can still occur. However, to blanketly state that virtualization decreases security is far too general as there are many different implementations of virtualization. For example, if VMware ESX was found to have a memory sharing bug that allowed one virtual machine to read and write the memory of another, does this mean that XenSource is immediately compromised? Of course not. So even when a bug in a hypervisor is found it does not immediately mean that all virtualization is suddenly subject to the same problem.
As I stated earlier, the security of any software package is judged by the number of bugs historically observed, their impact (potential or real), and how quickly the parties responsible for said software fix the bug. While the potential is there, it is far too soon to observe whether or not virtualization decreases security. Only time will tell.
What do you do when you are giving a session on virtualization migrations, and no one shows up? You sit down and blog about how you are giving a session on virtualization migrations and no one showed up! In all fairness, TechTarget is giving away a Mercedes-Benz right now, so I think people are eagerly awaiting that drawing, and my session is right after lunch, AND it *is* a repeat of one I did earlier today. Yes, those are the things that I will tell myself tonight as I curl into a fetal position and sob in a corner 🙁
DCD 2007 has been a natural extension of last year’s conference. 2006 saw virtualization come into its own in the enterprise, and 2007 has seen virtualization mature into a ready-to-use, ready-to-integrate solution for many of today’s data center related problems. One of those problems has been disaster recovery, business continuity, and business resumption. This year’s DCD conference has had no shortage of innovative and instructional sessions on how to create cost-effective BC solutions using virtualization. Two technologies that are paving the way for these SMB to enterprise BC solutions are iSCSI and 10 GbE. Virtualization requires shared storage to enable any of the features used in DR and BC scenarios (LiveMigration, Resource Scheduling, Power Management), but for the longest time shared storage has meant fibre channel SANs, an expensive proposition for many. The continued success of iSCSI and the eventual commoditization of 10 GbE will enable enterprise-class shared storage at a fraction of the cost for SMBs and enterprise data centers alike. Inexpensive, enterprise-class shared storage will result in the availability of virtualization’s high end features that contribute to DR and BC solutions for all organizations — small, large, and everything in between.
Once again, DCD 2007 was a natural extension of 2006 — the data center has continued to evolve around virtualization. What will next year bring? My best guess? To steal from a colleague, we are going to start hearing about the highly-utilized data center, the dynamic data center, the data center that can be managed and monitored from a single console. And who is to say that the data center management of the future will require any human interaction at all?
Return on Investment… the holy grail of IT.
Simply put, ROI is defined as the “ratio of money gained or lost on an investment relative to the amount of money invested”. One formula used to determine ROI is “net income plus interest divided by the book value of assets equals Return On Investment.”
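For readers who like to see the arithmetic, here is a minimal sketch of the two definitions quoted above. The function names and the sample figures are my own illustration, not part of any accounting standard or tool.

```python
def simple_roi(gain_or_loss: float, amount_invested: float) -> float:
    """Ratio of money gained (positive) or lost (negative) to money invested."""
    if amount_invested == 0:
        raise ValueError("amount_invested must be non-zero")
    return gain_or_loss / amount_invested


def accounting_roi(net_income: float, interest: float,
                   book_value_of_assets: float) -> float:
    """(Net income + interest) divided by the book value of assets."""
    if book_value_of_assets == 0:
        raise ValueError("book_value_of_assets must be non-zero")
    return (net_income + interest) / book_value_of_assets


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
example_simple = simple_roi(25_000, 100_000)
example_accounting = accounting_roi(90_000, 10_000, 500_000)
```

With these made-up inputs, the first call works out to 25,000 / 100,000 and the second to (90,000 + 10,000) / 500,000.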
In real terms, when you invest in a technology for your business, it’s about more than that. IT-related ROI often needs to provide cost savings, rather than generate revenue. In the case of virtualization for consolidation, this is often a simple calculation made difficult by many variables.
Variable 1) Power
This is a hot topic in the virtualization world, and has been ever since energy costs spiked and data center electric bills started going through the roof. Tracking the ROI of energy savings requires discipline, but in a large environment the numbers can be significant. It’s important to get a good baseline before implementing server consolidation via virtualization, which means getting the bills from the previous few years and calculating average monthly and yearly energy costs. Then, after the project is complete the process must be repeated and the results compared. Lastly, as the elapsed time periods pre- and post-project are matched up, the calculation must be re-run.
As an example, if you have an average cost of $50,000 per year for power over five years pre-consolidation, you need to calculate each year out and then each month, so that in the first month post-project you can compare that same month the year before, and then in six months the cost of the same six months the year before, etc. etc. This shows how fluid ROI can be over time, but how important it can be to be disciplined in tracking numbers like that to show success and failure rates over the long-haul, and not just the last quarter. Whether to include market fluctuations in power costs into your calculations or not is one I’ll leave to the reader. I personally don’t, because there’s one thing I can count on: Costs go up. If your bill is paid by the company, and includes other sources such as cube farms and the cafeteria, the calculations can still be made, but good luck removing the non-IT variables if needed (like say, the cafeteria closing for a month for renovations… that will cut power use dramatically).
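The same-period comparison described above can be sketched in a few lines. This is only an illustration under assumptions: bills are keyed by (year, month), the function name is hypothetical, and the dollar amounts are invented.

```python
def next_month(year, month):
    """Return the (year, month) that follows the given one."""
    return (year + 1, 1) if month == 12 else (year, month + 1)


def same_period_savings(monthly_costs, first_post_month, months_elapsed):
    """Compare the first `months_elapsed` post-project months with the
    same calendar months one year earlier. Positive result = money saved."""
    year, month = first_post_month
    before = 0.0
    after = 0.0
    for _ in range(months_elapsed):
        after += monthly_costs[(year, month)]
        before += monthly_costs[(year - 1, month)]
        year, month = next_month(year, month)
    return before - after


# Invented bills: a flat pre-project year, then three post-project months.
bills = {(2006, m): 4200.0 for m in range(1, 13)}
bills.update({(2007, 7): 3000.0, (2007, 8): 3100.0, (2007, 9): 2900.0})
savings_after_three_months = same_period_savings(bills, (2007, 7), 3)
```

Re-running the function with a larger `months_elapsed` as more bills arrive is what keeps the comparison matched period for period.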
Variable 2) Long-Term Staffing and Consulting
The hardest calculation of them all. How much did it cost for you to pay those consultants? How much time did your staff invest in the project, and how much is that time worth as an overall portion of their salary and benefits? Do you even calculate benefits as a factor in ROI? How much was spent on training and other job-related benefits during the time? Did a server fall on somebody’s foot and cause a Worker’s Compensation claim? How much time are staff members going to spend on administration? How does this impact other processes, and what’s the cost to them? The short answer is that you will spend more on virtualization experts, but less on hardware technicians, because there will be less hardware to break. This teeter-totter of staffing will carry over into several types of team – including networking, storage, etc. Tally the fully-burdened costs and compare them to pre- and post-project figures. Nobody likes to think about laying people off because they aren’t necessary anymore, but retasking is good for the soul, and often for the career of the retasked. That means you need to calculate the training costs outside of virtualization as well.
Variable 3) Infrastructure Hardware and Software
The easiest calculation of them all. How much did it cost you to acquire all of your assets over how long a period of time? What is the average cost per year for an average growth rate? How much can you then expect to spend over an equivalent period of time in the future using that average, versus how much you project to spend using virtualization-based server consolidation? If you use chargebacks, what do you charge and how can that be reduced? If you reduce chargeback costs, should you be factoring in their lower costs to your ROI calculation on a separate line item?
Variable 4) Services Reduction
That’s right – fewer services. Less management of services too. Backup and DR come to mind as a prime service that can be reduced. A smart shop backs up as many virtual machines as they can using storage snapshots or virtual machine snapshots and then moves those snapshots to a remote location without the need for tape. That means no more tape pickups, which is a service reduction. Even for those shops who have systems where backups of the data in the guest machine still need to be completed, there’s a serious reduction in services because there’s a huge reduction in tapes used and stored. There are also faster restore times. For example, if a file server falls over due to an OS corruption caused by a conflicting set of patches – restore from the snapshot, and you’re in business. No call for tape, no waiting for delivery, and only minimal downtime. This is just one area where services are reduced, yet greater service is provided. Others include provisioning new servers, which in a large environment is time-consuming and costly. Replacing dozens of servers sitting cold in a DR facility with a few hefty virtualized systems can reduce physical storage costs just in terms of rack space and square footage. Needless to say, the calculations for this vary from shop to shop, and you will have to find your own service reduction ROI points. Some places to look:
- Reducing tapes
- Reducing tape and DR facility storage fees
- Decreasing time and personnel costs to prepare new infrastructure
- Decreasing hardware support / warranty contracts
Variable 5) Service Increases
Availability comes to mind here – no more worrying about hardware failures requiring a huge restore window means a huge bump in availability numbers. In the case of DR, there’s most likely a pre-determined cost per picosecond of business downtime – that figure is just ripe for plucking into an ROI calculation (albeit on a separate line), because with tools like VMware’s VMotion, HA, and DRS, the time-to-recover from failures is drastically reduced. This means that the company is losing less money due to an outage, and therefore each tracked outage can be tallied up and compared to the pre-virtualization outages, yielding a good source of ROI from loss-aversion.
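As a rough sketch of that loss-aversion tally, assuming you track outages in minutes over equal pre- and post-project periods and have a single agreed cost per minute (the function name and all numbers below are invented for illustration):

```python
def downtime_savings(cost_per_minute, pre_outage_minutes, post_outage_minutes):
    """Money no longer lost to outages, comparing two equal-length periods.

    Each list holds the duration in minutes of every tracked outage.
    """
    pre_cost = cost_per_minute * sum(pre_outage_minutes)
    post_cost = cost_per_minute * sum(post_outage_minutes)
    return pre_cost - post_cost


# Hypothetical outage logs for the same six months, before and after.
saved = downtime_savings(500.0, [240, 90, 360], [15, 10])
```

The result belongs on its own line of the ROI worksheet, as noted above, so it does not get blended into the hardware or staffing figures.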
That’s the positive part of ROI – remember that ROI comes with a built-in double-edged sword – some costs will go up. In the services arena, you will pay more for the increased networking required for good remote DR. In the training arena, you will pay more for virtualization training. In salaries, you will pay more for virtualization experts. The list goes on. The “trick” of ROI is in being complete – finding all of the increases and decreases in costs that virtualization brings. I’m willing to bet that any environment with more than ten servers will get positive ROI in less than a year. The long and short of doing an ROI analysis is this – it’s a long, involved process that won’t give real numbers worth a darn if you don’t take the time to analyze your entire business-technology environment for the correct numbers. Claiming a positive ROI by server consolidation alone is a great win, but not at the cost of missing other aspects of your business’ ROI. To sum up, look at the following for sources of ROI:
- Hardware Costs
- Software Costs
- Physical Storage Costs
- Downtime Costs (averaged w/ equal periods, pre-project)
- Consumables Costs (tapes)
- Chargeback Costs
- Salary and Benefits Costs
- Training Costs
- Consultant Costs
- Energy Costs
Put these into two main columns: what you spent on the project and in production post-project, and what you saved from pre-project expenditures. Adjust for inflation and print.
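Here is one way the two-column tally could be sketched. Everything in it is an assumption for illustration: the function name, the sample amounts, and in particular the inflation handling, which simply restates pre-project costs in today's money with a single rate.

```python
def two_column_roi(pre, post, project_costs, inflation=0.0):
    """Return (spent, saved, roi) from per-category costs over equal periods.

    pre / post: category -> cost before / after the project.
    project_costs: one-time costs of the project itself.
    inflation: single rate used to restate pre-project costs (an assumption).
    """
    spent = float(sum(project_costs.values()))
    saved = 0.0
    for category in set(pre) | set(post):
        baseline = pre.get(category, 0.0) * (1 + inflation)
        delta = baseline - post.get(category, 0.0)
        if delta >= 0:
            saved += delta      # this cost went down: "saved" column
        else:
            spent += -delta     # this cost went up: "spent" column
    roi = (saved - spent) / spent if spent else None
    return spent, saved, roi


# Invented figures for three of the categories listed above.
pre = {"Hardware": 200_000, "Energy": 50_000, "Training": 5_000}
post = {"Hardware": 80_000, "Energy": 30_000, "Training": 15_000}
project = {"Consultants": 40_000}
spent, saved, roi = two_column_roi(pre, post, project)
```

Categories whose costs rise, such as training in this made-up example, land in the "spent" column automatically, which is the completeness the analysis depends on.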
The VMware Certified Professional exam isn’t the easiest exam in the world, nor the hardest (many diverge on whether Red Hat, Citrix, or Cisco win that title for their premier certification levels), but it’s rapidly becoming one of the most sought-after enterprise certifications available. Since I’m still prepping for the test, and haven’t sat for it, I’m a little leery of making that statement, but I’ve sat through enough CBTs that are acclaimed for their similarity to the exam that I feel safe putting it out there, and I’ve taken the only class that VMware has geared towards the VCP exam. The question is why… Why is the VCP so valuable?
Because it’s a hot technology in the truest sense of the word, and one that isn’t likely to go away anytime soon. The story is slightly reminiscent of another darling of Wall Street, Citrix Systems, which appeared to fill in a void left by the need to remotely access applications and which has consistently improved its product lines and scope of business over the years. Interestingly enough, Citrix and VMware are about to go head to head in the market for virtualization customers following the Citrix / XenSource deal, and Citrix has always had a strong certification program (perhaps we’ll see some new certs, maybe CCXA – Citrix Certified Xen Admin or maybe CCVA – Citrix Certified Virtualization Admin). VMware, like Citrix, has made its mark with a single product (ESX), and then branched out to add more and more to the product line, always ensuring that there was a clear line of sight back from all of the added products to their main line (aka their “Core Competency” for those paradigm-shifting process re-engineering, value-adding self-empowered framework info-architects out there). This focus will make VMware an adopted technology almost everywhere, and another indicator of VMware’s “hotness” is its impressive number of customers. VMware claims 100% of the Fortune 100 and another 20,000 enterprise-level customers. That’s an impressive stock of customers in a relatively short lifespan.
Look no further than the IPO for proof that VMware is hot:
What do these numbers mean when you put them together? People are using the product at all levels of business, and business is expanding into new market spaces, new market locations, and new customers. This means jobs are open for qualified staff, contracts are available for qualified consultants, and in both cases, money is there to be made. Like the MCSE when Windows NT4 debuted, the CNE when Novell 4 debuted, and all of the other business-transforming certifications, having the VCP is a ticket to a higher salary. In fact, the VCP is one of the hottest salary items out there, judging from this comparison I made using the tools at indeed.com (in US dollars, covering the US market):
Looking at those numbers, I’m astonished… a single-test exam cert beats out all but one of the other infrastructure certs in that list, including the Microsoft Certified Systems Engineer (seven exams at minimum), the Red Hat Certified Engineer (three exams, one of which is an on-site, hands-on lab), and the CCEA (five exams, one of which is also a hands-on lab). This is the market in a nutshell – the growth of virtualization is fueling a need, one that exceeds the need for other certifications, and so demand is driving up the earnings potential of VCPs just as much as the expertise level of the cert, if not more. In the long run, it will fade back in with the pack, but by then VMware will have undoubtedly expanded their certification program to include the typical technician/sysadmin/architect tier that most certification tracks fit into. The long and short of why VMware’s certifications will remain higher than other certs is that all other non-hardware, IT-related products can run on top of VMware, inside guest machines. This in turn means that at the basic level, VMware will be more important than them all because it’s the root of the tree. Trained, certified people will command bigger salaries because VMware’s root-level position means it will also be the top of the food chain – hardware / virtualization / operating system / application.
This isn’t just a US trend; there are numbers from many countries as well. I qualified the chart above by its location because there’s also this graph from itjobswatch.co.uk that I’d like to share:
Want to travel the world? Get VMware certified and go to work for the right company!
The really great thing about the VCP is that it’s cross-disciplinary – you have to have (or learn) some good all-around IT knowledge to pass the exam. Much like Citrix’s exams, you need to know more than just the core product. For the Citrix CCA and up it takes Windows, networking, and Citrix knowledge. For the VCP exam you need to know a little bit about operating system administration, hardware configuration, shared storage, and networking in order to pass, because each of these plays a role inside the Virtual Infrastructure platform. Personally, I can’t wait to take the exam – it’s not going to have one little bit of impact on my current position or any probable future positions (I’m at the top of the food chain in my day job), but I want it for the same reason I hold a CCA – at heart I’m a generalist who loves to know how things work (the official line is “so that I can understand their impact on the business and their relevance to meeting the shared goals of our company’s mission”, but I have to admit, I also enjoy the learning and the tinkering for their own sakes).
It’s not just a certification, like the slew of them we can all put on our resumes to impress the next interviewer, it’s a career-enhancing move. Does the certification make you any more of an expert than you may or may not be? That’d be debatable on a case-by-case, person-by-person basis, but it’s definitely a mark that you know what’s hot, and aren’t “stale”. The hotness-factor of the VCP is high, the demand is there in the market, and the value a VCP can add to your career makes it worth the time and effort to earn.
How many of you believe that Apple’s 1.1.1 iPhone update accidentally bricked modded iPhones? Personally, I try to err on the side of optimism, but there are certainly many people out there that think Apple intentionally went after those individuals who took it upon themselves to jailbreak and unlock their shiny gadget-of-the-moment.
Here we are again, not even a month later, and the new Linux Kernel, 2.6.23, was released on 2007/10/09. The latest product of the world’s greatest hackers includes a bevy of new features, including increased support for Xen and KVM, two open source virtualization solutions. Users of those products are probably very happy today, eagerly awaiting the adoption of the new kernel by their favorite distribution in order to take advantage of the increased guest support that comes with it.
VMware Server users on the other hand are getting the proverbial shaft. Kernel 2.6.23 has one MAJOR change and one minor change that completely break VMware Server.
For purposes of dramatic effect, I will detail the minor change first. VMware Server inserts a driver module into the kernel called vmnet. It provides magical networking gnomes that help shuffle bits in and out of VMs to the wide world of webs. In one of its source files, driver.c on line 522, the vmnet driver makes a function call to “unregister_chrdev”, a function defined in the Kernel source file “fs/char_dev.c”. Prior to Kernel 2.6.23 the function “unregister_chrdev” returned an integer value; a return value that the vmnet driver keys on in order to determine whether or not to issue a warning. Kernel 2.6.23 changes the function signature of “unregister_chrdev” to return void instead of an integer. This really hoses the vmnet module source file since it expects an integer value to be returned, and thus the vmnet module will not compile when the “vmware-config.pl” script is run. Luckily there is an easy fix. It seems that the function “unregister_chrdev” has actually returned a value of “0” despite what transpires in the function as far back as 2.6.20, a Kernel that VMware Server runs fine on. Thus the easy fix is to just edit the vmnet driver.c source file and re-run the VMware Server configuration script.
That is the minor problem that the new Kernel creates.
The major problem is a bit more cumbersome, since the fix involves either redacting a change that Linus (Torvalds) has approved for the 2.6.23 Kernel or lying and declaring that the vmmon module is GPL licensed.
But I’m getting ahead of myself. Let’s start at the beginning. A memory structure called mm_struct is defined in a Linux Kernel header file “linux/sched.h”. Prior to 2.6.23 this structure included a field called “dumpable” that would determine how memory was dumped, securely or not. Kernel 2.6.23 removes this field and lets two functions defined in “fs/exec.c” take its place: set_dumpable and get_dumpable. VMware Server uses the dumpable property in its memory management module vmmon: in the file driver.c to be exact. Since the dumpable property is no longer in the 2.6.23 kernel the vmmon module will not compile.
One might think that a quick fix would be to simply edit the vmmon source file to use the new set_dumpable function. In fact, this action will result in a vmmon module that compiles; however, it will not insert into the Kernel, and an error will occur that says the module contains an unknown symbol. A quick check of dmesg reveals that the unknown symbol is indeed set_dumpable. “What, what, whattttt,” you say. But the set_dumpable symbol IS in the kernel. That is verifiable by peeking in /proc/kallsyms.
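If you would rather script that peek than eyeball it, a small sketch follows. It assumes the usual /proc/kallsyms layout of address, type letter, and symbol name per line; the helper name and the sample addresses are made up for illustration.

```python
def find_symbol(kallsyms_text, name):
    """Return (address, type) pairs for every line whose symbol is `name`.

    Expects text in the /proc/kallsyms layout:
        <address> <type> <symbol> [optional module tag]
    """
    matches = []
    for line in kallsyms_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == name:
            matches.append((fields[0], fields[1]))
    return matches


# Made-up sample lines; real addresses will differ on your machine.
sample = (
    "c0170a10 T set_dumpable\n"
    "c0170a40 T get_dumpable\n"
    "f8a1b000 t vmmon_helper\t[vmmon]\n"
)
found = find_symbol(sample, "set_dumpable")
```

On a running Linux box you would pass in the contents of /proc/kallsyms instead of the sample string, for example `find_symbol(open("/proc/kallsyms").read(), "set_dumpable")`.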
Heh, heh. Hold on to your seats. This is where it gets fun.
The function set_dumpable is exported in 2.6.23 with the new EXPORT_SYMBOL_GPL, meaning that only modules that are GPL licensed can use it. More can be read about this decision on the Kernel mailing list.
VMware Server’s vmmon module cannot use set_dumpable because it is not GPL licensed. There are two solutions to this problem. The first solution is to edit the Kernel source file “fs/exec.c” so that “set_dumpable” is exported with EXPORT_SYMBOL instead of EXPORT_SYMBOL_GPL and compile a custom Kernel. Then, the vmmon module source file “driver.c” still needs to be edited such that the “dumpable” property is no longer used in favor of “set_dumpable”. The second solution is to edit the vmmon module source file the same way as in the first solution, but also using the macro “MODULE_LICENSE” to indicate that the vmmon module is licensed under the GPL.
Neither solution is nice, because the first one involves maintaining a custom Kernel and custom vmmon module, and the second solution involves changing the vmmon module license without permission. A long-term solution is needed where either the Kernel developers change set_dumpable to be exported out from underneath the aegis of the GPL, or VMware could license the vmmon module under the GPL or create some type of GPL-compatible shim module that in turn calls the proprietary code in vmmon.
Perhaps most interesting of all is the timing. The same Kernel that provides extended support for Xen and KVM also breaks VMware Server. Coincidence? Like I said, I try to err on the side of optimism. How about you?
Last week, I wrote a story about Sun’s upcoming xVM virtualization offerings, and in that story, I quoted Sun director of Solaris marketing Dan Roberts as saying that Microsoft does not officially support Windows or its applications running as guests under VMware. “There is no official support,” he unequivocally said.
Not so, countered VMware, pointing to a couple of Microsoft Knowledge Base articles on Microsoft’s support site. For example, one such article says “For Microsoft customers who have a Premier-level support agreement, Microsoft will use commercially reasonable efforts to investigate potential issues with Microsoft software running in conjunction with non-Microsoft hardware virtualization software,” a VMware spokesperson pointed out.
However, the KB article itself also explicitly states that customers without Premier level support enjoy no such assurances. For those shops, “Microsoft will require the issue to be reproduced independently from the non-Microsoft hardware virtualization software. Where the issue is confirmed to be unrelated to the non-Microsoft hardware virtualization software, Microsoft will support its software in a manner that is consistent with support provided when that software is not running in conjunction with non-Microsoft hardware virtualization software.”
In other words, Microsoft supports its software running on third-party virtualization software, but not really. More to the point, whereas it may be making “commercially reasonable efforts” at support today, will it always? A “commercially reasonable effort” is a very subjective notion indeed — one which may change dramatically the closer we get to a shipping version of Microsoft’s own virtualization platform, Viridian. Or is that too cynical of me?
Personally, I’m curious to hear how support for Microsoft OSes and apps is playing out in your VMware shop. How has it changed over the months and years? Do you worry that Microsoft will use support as the stick to get people to switch to Viridian? If and when they do, what would it take for you to go along? Feel free to leave a comment, or if you’d rather respond in private, send me an email.
Some time back, before I was invited on as a blogger for SSV, I was interviewed by the always-fun-to-work-with Adam Trujillo about Virtualization in the Data Center, and, like all good writers, Adam left the best question for last:
“What about hardware decisions — should data center managers be considering scale-up instead of scale-out?”
My response was:
“I personally prefer a scaled-up approach because there is a reduction in ongoing costs, such as power, space, cooling, and physical maintenance. Also, the complexity factor is reduced when there is less hardware to manage. An exception to that would be data centers without existing centralized storage — the initial acquisition becomes more expensive in scale-up operations if a SAN infrastructure is not already in place.”
I’m guilty of being one of those people who say “Durnit, why didn’t I say this or that?” or “Dangit, why didn’t I quantify that a little more?” even well after the fact, making me perhaps my own worst critic. In this case, I really felt I left some stuff unsaid. One item that irks me about that answer is that I should have made more mention of blades. I hate blades in their current incarnation. I think they’re the worst idea in IT – they’re hot, cramped, delicate, with slower components and limited expansion ports – if you name something about a blade, I can find a reason to hate it. That said, I shouldn’t have left them out of my line of thought – a good IT Manager needs to consider uncomfortable things, difficult things, even distasteful things, when looking at something impactful. Or so says the wisdom of Frank Hayes, to whose articles I often find myself nodding in the affirmative while reading. So, here goes.
Blades are hot – they have limited cooling options built-in. That’s often a “value-add” (choke) of specialized rack systems and chassis systems provided by third-party vendors. Here are a few links to illustrate the point:
- Power and cooling woes undercut blade server benefits
- IBM feels the heat
- Heat relief for data centers using blades
- Concerns heat up over keeping blades cool
A rack of big-honkin’ boxes will make you feel toasty on the parts next to their fans. A rack of blades will cook you medium-well given enough time. To prevent the data equivalent of multiple mini-supernovas you need to install the correct cooling – the correct tonnage of AC, hot and cold rack aisles, proper ventilation, air temperature monitors, system heat monitors, etc. In many data centers, the cost of new construction (or re-construction) may very well exceed even long-term cost savings from server consolidation, and even if you can afford the construction and still come out with positive ROI, that cooling comes at a monthly utility cost – you must increase your power consumption to keep things cool.
That said, this is where virtualization has been proven out over the last decade as a way to decrease the number of servers and offload them to blades. That may mean that you can remove enough servers to use your existing heat management systems in a more focussed way and not have to break the bank. Even if it’s a five-to-one ratio of servers removed to virtualization-equipped blades added, you’re coming out ahead. Add in centralized storage systems to connect to the blades and the scales may well tip back in favor of Mr. Heat Miser again, but probably not. Getting a ten-to-one ratio means blades are a winner. This is assuming a large server consolidation via virtualization project. If it’s not a big percentage of your boxes being affected, you’ll be back in the hot seat, quite literally.
Ever need five or more NICs for a virtualization host? I have. If I had blades, I’d be using three blades to get that done, assuming dual NICs, and five or more on single-NIC blades. That means more blades, more virtualization software licenses I don’t need, more hardware to fail, and more physical boxes when what I want to do is REDUCE the number of physical boxes. Right now server blades are still too young – many vendors’ products have all the components included on the blade and are not modular enough. PC blade systems have it a little better – some limited peripheral connectivity at the user site (see this link for one manufacturer’s solution), but still, it’s an entire box in a chassis with all the difficulties of expanding that micro-sized PCs and laptops have.
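For anyone who wants to redo that blade math with their own numbers, here is a rough sketch in Python. The function name is made up for illustration, and it assumes the simplest possible case: every blade carries the same fixed number of NICs and you can’t add more to a blade.

```python
import math


def blades_needed(nics_required: int, nics_per_blade: int) -> int:
    """Blades required to reach a NIC count when each blade has a fixed number of NICs.

    Hypothetical helper for illustration only; it ignores chassis limits and
    whether one host can actually span several blades.
    """
    if nics_required < 0 or nics_per_blade <= 0:
        raise ValueError("NIC counts must be non-negative, NICs per blade positive")
    # Round up: a partly used blade still takes a slot (and a license).
    return math.ceil(nics_required / nics_per_blade)


print(blades_needed(5, 2))  # dual-NIC blades
print(blades_needed(5, 1))  # single-NIC blades
```

With five NICs required, this gives three dual-NIC blades or five single-NIC blades, which is the count I complained about above.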
So, I think it’s safe to say that I still hate traditional blades. But I think they’ll be the saviour of the data center soon, and then I will love them. Why? Because here’s my ideal blade system: a truly modular system that will change everything about blades. The best part: it’s available now from several of the larger vendors. The changes are part of a new design “paradigm” (please note my bias against that word) – the end result is a blade system where the blades can be NICs or other devices, as needed and plugged into the chassis, connected in either a physical layer with ye olde jumper or a software layer (in the chassis management software, perhaps). Let’s say I get a blade and I need to put ESX on it, but I need six NICs because of guest system network I/O requirements… OK, I get another blade with a quad-NIC on it, plug it into the chassis, and configure it – voila, a single computer with five or six NICs in two blade slots, using one license. Or perhaps I need ten USB connectors for some virtualized CAD desktops, which require USB key fobs in order to use the CAD software – I plug in a server blade and a USB blade, configure it, and voila, one server, ten USB ports, one license. Expand that out far enough, and you can have whatever you need in terms of peripherals in a blade chassis. If you go to IBM’s website, you get a whole panoply of choices – switchblades (that one always gives me a chuckle) and NIC blades are readily available for expanding your blade chassis out to do more than just host some servers. HP upstages them a bit and has a great product out now that provides PCI-X and PCI-e ports. This is from their website:
“Provides PCI-X or PCI-e expansion slots for c-Class blade server in an adjacent enclosure bay.
- Each PCI Expansion Blade can hold one or two PCI-X cards( 3.3V or universal) ; or one or two PCI-e cards(x1, x4, or x8)
- Installed PCI-X cards must use less than 25 watts per card. Installed PCIe cards must use less than 75 watts per PCIe slot, or a single PCIe card can use up to 150 watts, with a special power connector enabled on the PCI Expansion blade.
- Supports typical third-party (non-HP) PCI cards, such as SSL or XML accelerator cards, VOIP cards, special purpose telecommunications cards, and some graphic acceleration cards.”
This is interesting – a couple of PCI-e quad-NICs in one expansion unit and my NIC requirements are set. Or perhaps a couple of PCI-e USB add-in cards. Or a high-end PCI-X or PCI-e video card. OK, that gets troublesome when you need a lot of them – you can wind up with one blade and a chassis full of expansion slots containing video cards – the cost might not be worth it.
In any case, this dramatically changes my view on scaling up or out. Right now, I still stand for scaling up because blades don’t work in my environment – I have heat problems. I have space problems too, which blades could solve, but not with my heat problems. I prefer to buy larger-sized servers with lots of expandability (DL300 and 500 series, PowerEdge 2000 and 6000 series, etc.) and add in NICs as needed rather than buy blades or 1U boxes because I can do more with these larger-sized machines even though they take up more room. I fully expect that to change in the future – at some point I see myself stopping with the scaling up and starting with the scaling out – only I expect the “out” part of that will involve a lot less real estate and more options than currently available.
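If you’re planning a card mix for one of these expansion blades, the limits in the HP quote can be turned into a quick sanity check. What follows is only my reading of that quote, sketched in Python with a made-up function name. It assumes a blade holds either PCI-X cards or PCIe cards (not a mix), and it does not model the special power connector a single high-wattage PCIe card would need.

```python
from typing import Sequence


def fits_expansion_blade(bus: str, card_watts: Sequence[float]) -> bool:
    """Check a proposed card set against the limits in the quoted HP text.

    Hypothetical helper: 'bus' is "PCI-X" or "PCIe"; 'card_watts' lists the
    power draw of each card. Mixed-bus configurations are not modeled.
    """
    # The quote allows one or two cards per expansion blade.
    if not 1 <= len(card_watts) <= 2:
        return False
    if bus == "PCI-X":
        # Each PCI-X card must use less than 25 watts.
        return all(w < 25 for w in card_watts)
    if bus == "PCIe":
        if len(card_watts) == 1:
            # A single PCIe card may use up to 150 watts
            # (with the special power connector enabled).
            return card_watts[0] <= 150
        # Two PCIe cards: less than 75 watts per slot.
        return all(w < 75 for w in card_watts)
    return False


# Two quad-NIC PCIe cards at an assumed 20 W each.
print(fits_expansion_blade("PCIe", [20, 20]))
# One assumed 120 W video card alongside a 20 W card.
print(fits_expansion_blade("PCIe", [120, 20]))
```

The wattages in the example calls are placeholders, not figures for any real card. Check the vendor’s spec sheet before you order anything.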
SearchServerVirtualization.com is now soliciting nominations for its Products of the Year awards. We invite you to nominate your favorite product or your company’s product by using the form at the entry page. Winning products will be featured in January 2008 on SearchServerVirtualization.com.
SearchServerVirtualization.com staff and other industry experts will judge the entries. Your product(s) qualify for submission if they have shipped (or have been significantly upgraded) between October 31, 2006 and November 1, 2007.
If you are submitting more than one product, you must fill out a separate form for each product.
Note: Products entered for Best of VMworld awards are eligible for entry. This is an entirely separate award.
The deadline for all submissions is November 2, 2007.
Products are limited to one category. They must fit into one of the following categories for consideration:
- Data protection (including backup, replication, HA and FT products)
- Systems management: monitoring and reporting
- Hardware for virtualization (including, but not limited to: servers, storage, I/O components and client devices)
- Virtualization platforms (e.g. VMware ESX, VI3, Microsoft Virtual Server, Virtuozzo, XenEnterprise, etc.)
We’ve identified the following criteria as being most important, and will judge accordingly:
* New or upgraded features and capabilities
* If the product is an upgrade, how the upgrade has affected sales and user adoption, and
* User reviews.
Bloggers, feel free to mention this in your own blog to spread the word!
The need to hire qualified staff to design, implement and manage virtualized environments is growing, and that means hiring managers are having to shift focus towards this disruptive technology and be ready with good interview questions for their prospective hires.
1. Do you have experience in (VMware/Xen/Virtual Iron/Virtuozzo) implementations?
This is the no-brainer question, and the lead-in to the others. If the prospective hire’s answer is no, stop right here, do not proceed past go, do not collect $200. Even a certified candidate may not have any experience, and an inexperienced candidate isn’t one you want for the job, since you probably have staff who would like to learn on the job or be trained, and who already have the internal process and procedure knowledge to edge out the competition from outside.
2. When implementing a virtualization environment, what do you consider the most important feature of the product to ensure overall success of the implementation?
This question is good for sorting out who sees the strategic value of virtualization and who is focussed on the technical aspects. A good answer will cover either failover functions or the ability to reduce costs, and relate how they will benefit the business in technical terms. Neither a techie nor a managerial answer is right or wrong, but rather will help you sort the crowd of interviewees into the categories you are looking for.
3. Your resume says that at WidgetMakers, Inc. you used VirtualBlahBlahBlah to aid your company in meeting the goal of DoingThisOrThat. Can you share with me what challenges you experienced and how you overcame them?
This is a typical interview question surrounding any product, and it needs to be asked for any product you are hiring somebody to work with.
4. How deep is your understanding of storage systems, and can you share an example of how you used this knowledge in a virtualized environment at WidgetMakers, Inc.?
5. How deep is your understanding of network switching, and can you tell me how you would use virtual switches in a broad virtualization implementation?
Cross-disciplinary skills are crucial for virtualization, particularly around storage. Many larger companies have storage administration teams, server administration teams and network administration teams. Being able to work with these groups doesn’t mean that the candidate can work with the technology, and it’s important that, if the position is technical, they can do both.
6. Tell me about how you would configure a virtual environment to best take advantage of its features in a backup and disaster recovery framework?
Understanding how to use DR-friendly features like VMware’s VMotion and backup-friendly features like snapshots can make all the difference in candidate selection. It’s important for a candidate to know how to keep the business running, even if they don’t know the business itself yet.
7. Tell me about VirtualizationProductFeature, and what you think makes it valuable or not valuable.
This gets into the technical understanding of the product, a crucial point in both technical and managerial interviews. If a technical candidate blows this one, they need to go home. If a managerial candidate doesn’t provide a business-oriented answer, they need to go home or consider a technical position.
8. BadThing happens. Tell me how you would troubleshoot the situation and get it resolved.
A typical technical question, and one that should always be asked of both technical and managerial candidates. Managerial candidates may get some leeway in technical minutiae, but absolutely must speak about their role as the manager and how they would deal with their technical staff to get the problem resolved. This is also a rinse-and-repeat question that should be asked a couple of times, using different BadThings.
There are also consultant-specific questions to ask, if that’s what you’re looking for. Things like:
1. How many VCPs do you have on your staff?
Until the other companies start with their own certs, the VCP is where the game is at.
2. How many virtualization projects has your company undertaken in the last year?
The default no-brainer.
3. Do you eat your own dog food? By that I mean does your company use the product internally as well as support it?
Also a no-brainer.
4. What was your company’s most spectacular failure?
Everyone is going to tell you about their company’s great success. Make them squirm a bit and tell you about how they failed, then ask:
5. What did you do to correct the situation?
This will tell you what kind of consulting firm you are dealing with. If they can be upfront with these two questions, if the failure wasn’t a show-stopper for your environment, and if they dealt with it right, they get high kudos.
Obviously there are many, many more questions to be asked of potential staff, managers, and consultants, so many that I’d like to encourage people to comment about questions you like to ask, would like to be asked, or think would be important – I’m looking forward to some audience participation!