The Virtualization Room

Sep 24 2008   8:28AM GMT

VMware defends its upcoming fault-tolerance feature

Bridget Botelho

During VMworld 2008 in Las Vegas last week, VMware Inc. announced its upcoming fault tolerance feature and gave a demonstration of it during one of the keynote sessions. It looked pretty good and simple to use, but Littleton, Mass.-based Marathon Technologies Corp., a company that specializes in fault tolerance software, had plenty to say otherwise.

In response to Marathon’s blog post dissin’ the upcoming feature, Mike DePetrillo, a principal systems engineer at Palo Alto, Calif.-based VMware, wrote a blog post defending VMware Fault Tolerance.

For starters, Marathon complained that VMware does not provide component-level fault tolerance. “The most common failures that result in unplanned downtime are component failures such as storage, NIC [network interface card] or controller failures. Yet VMware Fault Tolerance doesn’t do anything to protect against I/O, storage or network failures.”

DePetrillo noted that VMware already has features to protect against component failure. “If your NIC fails you’ve got NIC teaming built into the system. To set it up simply plug in both NICs to the server, go into the network panel and attach both of them to the same virtual switch. Done. Four clicks. Same thing for storage with the built-in SAN [storage area network] multipathing drivers,” DePetrillo wrote. “I absolutely agree with the author that component failures are the cause of most crashes and that’s why VMware added these features in 2002. VMware FT is not designed for component failure because there’s no sense in moving the VM to another host if you’ve simply lost a NIC uplink. NIC teaming will take care of that with ease and is a LOT cheaper than using CPU and memory resources on another host to overcome the failure.”

Marathon’s second beef: VMware’s fault tolerance is too complex. “In order to use VMware Fault Tolerance, you’ll first have to install both VMware HA [High Availability] and DRS [Distributed Resource Scheduler]. No small feat in and of themselves. Then, because VMware FT requires NIC teaming, you’ll also have to manually install paired NICs. Then you’ll need to manually set up dual storage controllers (with the software to manage them) because it requires multipathing. And to top it all off, you’re required to use an expensive, and often complicated, SAN.”

DePetrillo said the process requires checking off two boxes – HA and DRS. That’s it. “If that’s too hard then please comment and let me know how it could possibly be easier. Even my dog has figured out how to do this now. Granted, it’s a pretty smart dog.”

“As for setting up the dual NICs and dual HBAs [host bus adapters], well, yes, you have to actually plug the physical devices in. After you’ve done that the **built-in** NIC teaming and HBA drivers will take over and configure most everything for you. The NIC teaming does require four extra clicks. The HBA drivers actually figure out the failover paths, match them up, and set up the appropriate form of failover all auto-magically. They’ve been doing this since ESX 1.5 (6 years ago),” DePetrillo blogged.

“Lastly, yes, this requires shared storage. Pretty sure that most environments that want FT (no downtime what-so-ever because our business could lose millions) already have a SAN to take advantage of other things virtualization related such as DRS and VMotion,” he wrote.

Also, DePetrillo said, VMware FT does not actually require dual NICs or dual HBAs: “This is something you should have in every virtualization setup that’s running VMs you care anything about, but it’s not a requirement to get VMware FT [Fault Tolerance] running.”

The last point Marathon makes that’s worth spending any time on is that VMware offers only limited CPU fault tolerance. “With VMware FT, you’ll need to set up what VMware refers to as a ‘record/replay’ capability on both a primary and secondary server. If something happens to the primary server, the record is stored on the SAN and then restarted on the secondary server. … The whole thing depends on the quality of the SAN. Second, in the words of the VMware engineer who presented at VMworld, ‘this can take a couple of seconds.’ So what happens to your application state in those couple of seconds?”

DePetrillo’s defense is that “if you’re the type of company that requires absolutely no downtime for an app — if the app is just that critical — then I’m pretty sure you’re going to have a decent SAN. … If you’re having so many problems with your SAN that you don’t trust it for FT, then you have much bigger issues at hand that VMware or Marathon or any of the other virtualization related vendors aren’t going to help you with.”

You can read more of VMware’s comments on DePetrillo’s blog, which gets into some detail on how VMware Fault Tolerance will work, and Marathon’s blog lays out the rest of its side of the argument.

But I think it is obvious that Marathon is making VMware’s fault tolerance feature seem worse than it is, and VMware is making its new feature seem simpler than it is.

For the most part, this is a pissing contest between the incumbent fault-tolerance vendor and the “new guy,” but the fact of the matter is, if you use VMware virtualization, you can’t use Marathon Technologies because they don’t support VMware (obviously) and if you use Citrix Systems’ XenServer, you can’t use VMware Fault Tolerance, so these arguments are moot.

4 Comments on this Post

 
  • Bridget Botelho
    It's not VMware's comments, it's Mike's comments... that's something totally different imho. That's why the disclaimer is on the blog.
  • Bridget Botelho
    You’re right, this looks to me like a “pissing contest” between VMware and Marathon. To your point, one is Xen-based and won’t run VMware, and the other is VMware and doesn’t support Xen; sort of a silly argument. What I find interesting is the limitations of these software FT products. I did sit in on the VMware FT session that was held right after Tuesday morning’s keynote by Paul Maritz. The room was very large and it was practically full, so there certainly was some interest. And I thought the person who gave the presentation did a great job of not “marketing” this as the answer to every availability problem out there. The fact that there is no release date or pricing was a bit disconcerting; I have heard a few different dates, and those keep shifting (and not in a good way). One of the big issues with software FT is the inability to scale. This is true of the Xen-based products and it seems to also be true of VMware. Applications cannot span more than a single core. This is a “nut” that no one has been able to crack so far. Now, that may not be an issue as long as the application you want to run in FT mode doesn’t need more horsepower than a single core can provide. The problem is that it’s difficult to know this, and worse, we all know that peak times cause spikes, and with the overhead that VMware is stating (could be as high as 20% in the worst case), performance could become an issue very quickly. Another significant issue with software lockstep is latency. The “round-trip time” required for tasks like replication, heartbeat, and replay adds to the time required to process a transaction and cannot be solved by simply over-configuring a server with additional resources. These software FT solutions also require two (2) servers, each configured with a copy of the software plus duplicate copies of all the guest operating systems and applications that require protection.
    So ultimately, the need to “double up” on the amount of hardware and software the IT group needs to purchase and manage runs contrary to one of the central reasons virtual environments are deployed in the first place. Add to that the inherent requirements for redundancy, networks, and storage in the architecture of VMware FT, which may require modifying your configuration, although the application itself does not require any modification. This all begins to add up to complexity and cost that you uncover as you implement what was supposed to be a “cheap & simple” solution.
  • Bridget Botelho
    Hi, I have been one of the lucky people who got to try the VMware 4.0 beta. I have played with the VMware FT feature and it seems solid and a great addition to a great product, in my opinion. If you are still in doubt, check out the sneak preview & video found at http://www.virtualizationteam.com/virtualization-vmware/vmware-esx-40-ft-fault-tolerant-sneak-peek.html, which will get you nodding your head and ready for the release of VMware FT. Enjoy, Virtualization Master
  • Bridget Botelho
    Did you hear about VMware FT? I just read a bit about it on http://www.virtualizationteam.com/virtualization-vmware/vmware-esx-40-ft-fault-tolerant-sneak-peek.html. Do you think it would replace VMware HA? I even saw a video of FT at that link. Is it available yet?
