Posted by: Atrujillo
VMware ESX, VMware High Availability (VMware HA)
Sean Clark is a VMware Certified Professional (VCP) and a member of the Des Moines, Iowa-based Central Iowa Virtualization Users’ Group (CIVUG). The CIVUG emerged from the Central Iowa Linux Users’ Group as a way for virtualization users to learn more without being tied to one vendor. That is, the CIVUG discusses Hyper-V and Citrix Systems’ Xen in addition to VMware.
Clark has cut his teeth on VMware and other virtualization platforms as a solutions architect at Alliance Technologies, where he works with small and medium-sized businesses and some enterprises to develop strategies for implementing virtualization, be it storage, server or application virtualization. Clark posted notes on a presentation he gave at a recent meeting on using a Red Hat Network File System (NFS) configuration with VMotion, High Availability (HA) and Distributed Resource Scheduler (DRS).
NFS vs. Fibre Channel and iSCSI
The NFS configuration how-to Clark followed was posted by Mike Laverick on RTFM Education. Clark chose it for the demo not only because of its easy configuration, but also because Red Hat NFS is a good low-cost alternative to shelling out big cash for more expensive virtual machine (VM) storage devices. “Most people think that you need a Fibre Channel SAN [storage area network] or an iSCSI SAN; you’ll spend at least 20 or 30 grand on that device,” Clark says. “But all you really need is a reasonably new server with some pretty fast disks, and you can basically have a SAN for the cost of your hardware.”
Clark also says that despite what some benchmarking stats say, performance won’t suffer on the cheaper NFS. “If you do your homework, you’ll find that in many benchmarks when you compare the performance of NFS versus iSCSI, they’re almost nearly identical. You can spin your benchmarking tasks so that one will come out on top every time, but for the majority of uses out there, and especially for test and dev, NFS is just as good as iSCSI.”
The lines between storage systems blur when you scale out to enterprise-level requirements, because the needs go beyond the hardware and software backing your VM storage. Clark says that it’s really about having enough disks. “It doesn’t matter if you have NFS, Fibre Channel or iSCSI if you don’t have enough disks to meet the I/O demand that the number of virtual machines can put on a SAN. Most of the time, it becomes almost a religious decision: whatever you’re most comfortable with.”
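Clark's point about having enough disks comes down to simple arithmetic. The following is a rough Python sketch of that sizing check, not a figure from his presentation. The function name and every number in it (VM count, IOPS per VM, IOPS per disk) are hypothetical, and it ignores RAID write penalties and caching.

```python
import math

def disks_needed(vm_count, iops_per_vm, iops_per_disk):
    """Return the minimum number of disks whose combined IOPS covers the VM demand.

    All inputs are illustrative assumptions; real sizing should use measured
    workloads and account for RAID overhead and cache behavior.
    """
    total_demand = vm_count * iops_per_vm
    return math.ceil(total_demand / iops_per_disk)

# Hypothetical workload: 40 VMs at 50 IOPS each, on disks that deliver 150 IOPS.
print(disks_needed(40, 50, 150))  # 2000 / 150 = 13.33..., rounded up to 14
```

The same count applies whether those disks sit behind NFS, iSCSI or Fibre Channel, which is the reason the protocol choice matters less than the spindle count.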
HA snafu highlights proper network configuration
After walking through the NFS configuration, Clark went on to demo VMotion, HA and DRS, but ran into some problems with the HA portion. At his home in Pella, Iowa, Clark has a small test lab of two 1U servers running two ESX VMs, which he brought to Des Moines for his presentation.
With little time to reconfigure the servers (Clark was asked only a few days prior to the meeting to give the presentation), Clark decided against bringing his PC that he had used as his NFS server in favor of a Red Hat VM running on his laptop, which served as the basis for his presentation. Although both the ESX servers were able to see the Red Hat VM through a little Linksys switch, the demonstration came to a standstill during the HA portion.
The problem? Because the conference room where the VUG meets is on an island network, no default gateway address was available as there had been at Clark’s lab in Pella. HA requires an available, pingable gateway address. When HA is set up, the ESX servers exchange a heartbeat, which is how each one determines whether the others are still up. If a host’s heartbeat ceases, HA restarts the VMs that were running on that host on a different ESX server. Sometimes, however, only the network becomes unavailable while the other ESX server continues to run, so a missing heartbeat alone doesn’t prove that a host is down.
The isolation address was the pain point in Clark’s HA demo. By default, HA uses the default gateway as that address; the gateway that was available in Pella didn’t exist in the conference room, and HA failed. As Clark explains, “It’s just an additional IP address that an ESX server can ping to say, ‘OK, I lost my buddy, but I need to make sure that the network is still up.’ So as long as it can reach that other IP address, then it can assume that the other host is actually truly dead and that it needs to restart its failed VMs.”
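The decision Clark describes can be modeled in a few lines. This Python sketch is a simplified illustration of the reasoning only, not VMware's actual HA implementation; the `HostState` and `assess` names are made up for the example.

```python
from enum import Enum

class HostState(Enum):
    HEALTHY = "peer heartbeat present"
    PEER_FAILED = "peer presumed dead; restart its VMs here"
    ISOLATED = "this host is isolated; apply the isolation response"

def assess(peer_heartbeat_ok, isolation_address_reachable):
    """Classify a host's view of the cluster from two observations.

    Simplified model: a real HA agent also applies timeouts and retries.
    """
    if peer_heartbeat_ok:
        return HostState.HEALTHY
    if isolation_address_reachable:
        # The network is still up, so the silent peer is assumed to be down.
        return HostState.PEER_FAILED
    # No heartbeat and no isolation address: the problem is on this host's side.
    return HostState.ISOLATED

# Lab with a pingable gateway, peer goes quiet.
print(assess(False, True).name)   # PEER_FAILED
# Island network with no gateway to ping, peer goes quiet.
print(assess(False, False).name)  # ISOLATED
```

On an island network the second case is the only possible outcome of a lost heartbeat, which is why a reachable isolation address matters.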
Proper planning and redundancy are key to successful deployments
Clark says that he almost never has problems because he is careful to use VMware-supported hardware and plans out each deployment carefully. But not all IT departments are so careful. “I ran into a customer the other day that didn’t want to invest in redundant networking; they had a four-host cluster and all their networking was going through one physical switch. They had HA set up, and when that switch went down, every single VM that was on one of their ESX servers powered down.”
Clark says that this can happen as a result of not properly planning your configuration. “One of the configuration options for HA [concerns] what to do if an ESX host becomes isolated. Because it lost its network, there wasn’t a second physical switch for redundancy. Each host, even though it was still running, became isolated according to the HA configuration. It was probably a default setting at the time when they set up the HA cluster, and it powered down all the VMs.” Clark says that with proper preparation, the ESX hosts might not have appeared isolated to HA, and the organization might have been spared the headache of restarting all its VMs and troubleshooting what caused the power-down.
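The customer's outage can be illustrated with a small Python sketch that compares a single-switch layout with a redundant one. The host and switch names are hypothetical, and the model only checks whether each host still has an uplink to a surviving switch; it does not reproduce HA's own logic.

```python
def isolated_hosts(host_uplinks, failed_switches):
    """Return the hosts left with no uplink to a surviving switch."""
    failed = set(failed_switches)
    return sorted(
        host
        for host, switches in host_uplinks.items()
        if not (set(switches) - failed)
    )

# Four-host cluster, all networking through one physical switch.
single = {f"esx{i}": ["switch-a"] for i in range(1, 5)}
# Same cluster with a second physical switch for redundancy.
redundant = {f"esx{i}": ["switch-a", "switch-b"] for i in range(1, 5)}

print(isolated_hosts(single, ["switch-a"]))     # ['esx1', 'esx2', 'esx3', 'esx4']
print(isolated_hosts(redundant, ["switch-a"]))  # []
```

With one switch, every host appears isolated at once and the configured isolation response powers down its VMs; with two, the same failure leaves no host isolated.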