High Availability archives - Virtualization Pro

Sep 24 2009   9:34PM GMT

Master’s guide to VMware Fault Tolerance



Posted by: Eric Siebert
VMware, Fault Tolerance, High Availability

I’ve written about vSphere’s new Fault Tolerance (FT) feature several times and wanted to pull that information together in one post, along with some new material. The guide is a bit lengthy, so we’ve broken it into several sections; you can skim the witty titles and decide whether a section is for you or whether you’d rather keep on truckin’ to the next one. But first, if you’d like to check out my previous posts on FT, they are available here:

I. And VMware said, ‘Let there be Fault Tolerance’

Fault Tolerance was introduced in vSphere to provide something that was missing in VMware Infrastructure 3 (VI3): continuous availability for a virtual machine in case of a host failure. High Availability (HA), introduced in VI3, protects against host failures, but the VM goes down for a short period while it is restarted on another host. FT takes that to the next level and guarantees the VM stays operational during a host failure by keeping a secondary copy of it running on another host server. If a host fails, the secondary VM becomes the primary VM and a new secondary is created on another functional host.

Mar 24 2009   6:14PM GMT

VMware HA failure got you down?



Posted by: Edward L. Haletky
Edward L. Haletky, VMware HA, AAM, FT_HOSTS, VMap, VMware, High Availability

Consider yourself lucky if you’ve never gotten the VMware HA message “An error occurred during configuration of the HA Agent on the host.” But if you have, you may know that the ways to fix the error are extremely limited. Here is a method that worked for me.

Current methods

The current methods of troubleshooting this issue are to check that DNS is working properly, to confirm that the FT_HOSTS file in /etc/opt/vmware/aam is properly written for the hosts involved in your VMware Cluster, and to disable and re-enable VMware HA within the VMware Cluster.

New method

The new method assumes that the VMware HA configuration is somehow at fault. I began to think this was the case when I noticed that the /opt/vmware/aam/ha/VMap process was not terminating on a reset of VMware HA. This process, as seen from the output of ps ax issued from the service console command line interface, should not exist when VMware HA is disabled. However, in my configuration it did exist. I also noticed I had problems reestablishing VMware HA after a recent reboot of a server caused by a faulty UPS. DNS was working, FT_HOSTS looked correct, and disabling and re-enabling VMware HA did no good.
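The check for a lingering process can be sketched as a small shell function. This is only an illustration built on the ps ax output mentioned above; the function name ha_process_running is my own, and the default pattern is simply the VMap path from my hosts.

```shell
#!/bin/sh
# Sketch: report whether a process matching a pattern is still running.
# ha_process_running is a hypothetical helper name, not a VMware tool.
# The default pattern is the VMap path mentioned in the article.
ha_process_running() {
    pattern="${1:-/opt/vmware/aam/ha/VMap}"
    # grep -v grep drops the grep process itself from the listing.
    ps ax | grep "$pattern" | grep -v grep >/dev/null
}
```

With VMware HA disabled, something like `ha_process_running && echo "VMap is still running"` should print nothing; if it does print, the agent did not shut down cleanly.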

Here are the steps that I followed to fix it:

  1. Log in to the service console of your problem hosts and make sure the VMware HA agent is stopped using: service vmware-aam stop
  2. Ensure there are no VMware HA processes running by using: ps ax | grep aam | grep -v grep
  3. If processes exist, kill them using the Process ID returned by the previous command (first column) as the PID: kill -9 PID
  4. Issue the following command via the service console, including the parentheses: (cd /etc/opt/vmware/aam; mkdir .old; mv * .old; mv .[a-z]* .old)
  5. Using the Virtual Infrastructure Client click on the Host, then the Summary tab, and then Reconfigure for VMware HA.
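For anyone who would rather script steps 1 through 4, here is a rough sketch as shell functions. It is not exactly what I typed: the function names are made up for illustration, and the backup step uses a loop with a slightly broader glob (.[!.]* rather than .[a-z]*) so that it can skip the .old directory itself, which the one-liner in step 4 tries to move into itself and complains about. Treat it as a starting point under those assumptions, and try it against a scratch directory first.

```shell
#!/bin/sh
# Sketch of steps 1-4 as shell functions. Nothing runs until a function
# is called. All function names here are hypothetical.

AAM_DIR="/etc/opt/vmware/aam"   # configuration path from the article

# Step 1: stop the HA agent.
stop_ha_agent() {
    service vmware-aam stop
}

# Steps 2-3: force-kill any process whose command line still mentions "aam".
# Caution: this also matches a script whose own name contains "aam".
kill_aam_processes() {
    for pid in $(ps ax | grep aam | grep -v grep | awk '{print $1}'); do
        kill -9 "$pid"
    done
}

# Step 4: move the existing configuration into a .old subdirectory.
# Takes an optional directory argument so it can be tried on a copy.
backup_aam_config() {
    dir="${1:-$AAM_DIR}"
    backup="$dir/.old"
    mkdir -p "$backup" || return 1
    for entry in "$dir"/* "$dir"/.[!.]*; do
        # Skip unmatched glob patterns and the backup directory itself.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        [ "$entry" = "$backup" ] && continue
        mv "$entry" "$backup"/ || return 1
    done
}
```

On a problem host the sequence would be `stop_ha_agent; kill_aam_processes; backup_aam_config`, followed by step 5 in the client. Note that a second run overwrites same-named files already sitting in .old.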

Voilà, VMware HA restarts and works properly! This solution may be seen as overkill, as it forces VMware HA to recreate all of its configuration files. I might have been able to just remove the .vmware_fdport file and then Reconfigure for VMware HA, but I did not try that option. I bring this possibility up because that file is NOT present on my now-running VMware HA-enabled hosts.

Now I have what looks to be a foolproof way to get VMware HA to start back up and protect my investment.