This week’s VMTN Community Roundtable podcast was about Fault Tolerance (FT). Henry Robinson and Karen Ritter of VMware joined to provide information about the development and future of FT.
Here’s a summary of some interesting details from the podcast, but if you haven’t listened to it yet, I recommend that you check out the recording as it provides a lot of valuable technical information.
- VMware spent a lot of time working with Intel/AMD to refine their physical processors so VMware could implement its vLockstep technology, which replicates non-deterministic transactions between the processors by reproducing the CPU instructions on the other processor. All data is synchronized so there is no loss of data or transactions between the two systems. In the event of a hardware failure you may have an IP packet retransmitted, but there is no interruption in service or data loss.
- Think of the primary and secondary as two same-size gears with a chain between them so they always rotate at the same speed. If the secondary gear slows down due to a resource issue on its host, the primary gear will also slow down, and vice versa. If the secondary virtual machine (VM) slows down to the point that it is severely impacting the performance of the primary VM, then FT between the two will cease and a new secondary will be found on another host.
- Virtual symmetric multiprocessing (vSMP) support will come in a future release. Keeping a single CPU in lockstep between hosts is challenging enough, and more development is needed to keep multiple CPUs in lockstep between hosts.
- FT does not use a specific CPU feature but requires specific CPU families to function. vLockstep is more of a software solution that relies on some of the underlying functionality of the processors. The software layer records the CPU instructions at the VM level and relies on the processor to do so; the recording has to be very accurate in terms of timing, and VMware needed Intel and AMD to modify their processors to ensure complete accuracy. The SiteSurvey utility simply looks for certain CPU models and families, but not specific CPU features, to determine if a CPU is compatible with FT. In the future, VMware may update its CPU ID utility to also report if a CPU is FT capable.
- Currently there is a restriction that hosts must be running the same build of ESX/ESXi; this is a hard restriction and cannot be avoided. You can use FT between ESX and ESXi as long as they are the same build. Future releases may allow for hosts to have different builds.
- VMotion is supported on FT-enabled VMs, but you cannot VMotion both VMs at the same time. Storage VMotion is not supported on FT-enabled VMs. FT is compatible with Distributed Resource Scheduler (DRS) but will not automatically move the FT-enabled VMs between hosts to ensure reliability. This may change in a future release of FT.
- You can use FT on a vCenter Server running as a VM as long as it is running with a single vCPU.
- There is no limit to the number of FT-enabled hosts in a cluster, but you cannot have FT-enabled VMs span clusters. A future release may support FT-enabled VMs spanning clusters.
- There is an API for FT that provides the ability to script certain actions like disabling/enabling FT using PowerShell.
- The requirement for dedicated gigabit network interface cards (NICs) for FT Logging is not a hard requirement but is recommended. You could use a shared NIC for FT Logging for small or dev/test environments. The four FT-enabled VM limit is per host, not per cluster, and is not a hard limit, but is recommended for optimal performance.
- The current version of FT is designed to be used between hosts in the same data center, and is not designed to work over wide area network (WAN) links between data centers due to latency issues and failover complications between sites. Future versions may be engineered to allow for FT usage between external data centers.
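
The lockstep idea described in the first bullets can be illustrated with a toy record-and-replay model. This is only a sketch of the general technique, not VMware's vLockstep implementation: the `ToyVM` class, the `run_primary` and `replay_secondary` functions, and the random "events" are all hypothetical stand-ins. It shows that if every non-deterministic input consumed by the primary is logged and fed to the secondary in the same order, both end up in the same state.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class ToyVM:
    """Hypothetical toy 'VM' whose state depends on the inputs it receives."""
    state: int = 0

    def step(self, event: int) -> None:
        # The transition is deterministic for a given event value.
        self.state = (self.state * 31 + event) % 1_000_003


def run_primary(vm: ToyVM, steps: int, rng: random.Random) -> List[int]:
    """Run the primary and record every non-deterministic input it consumes."""
    log = []
    for _ in range(steps):
        event = rng.randrange(256)  # stands in for an interrupt, timer read, etc.
        log.append(event)
        vm.step(event)
    return log


def replay_secondary(vm: ToyVM, log: List[int]) -> None:
    """Feed the recorded inputs to the secondary in the same order."""
    for event in log:
        vm.step(event)


primary, secondary = ToyVM(), ToyVM()
log = run_primary(primary, steps=1000, rng=random.Random())
replay_secondary(secondary, log)
print(primary.state == secondary.state)
```

In this model the log plays the role of the FT Logging traffic between hosts; the real system also has to reproduce the exact timing of each event, which is where the processor support mentioned above comes in.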
VMware’s FT is first-generation technology and will get better as it matures over time. Future releases of FT may include enhancements such as relaxing the build level requirements, support for vSMP VMs, support for backing up an FT-enabled VM with VMware Consolidated Backup, and support for movement of FT-enabled VMs via DRS.
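
The current restrictions summarized above can be collected into a small pre-flight check. This is a minimal sketch under stated assumptions: the `Host` and `VM` records, the `check_ft_pair` function, and the build strings are hypothetical and do not correspond to any VMware API. The rules themselves (single vCPU, same build, same cluster, and a recommended four FT-enabled VMs per host) are taken from the podcast notes.

```python
from dataclasses import dataclass
from typing import List

MAX_FT_VMS_PER_HOST = 4  # recommended limit per host, not a hard limit


@dataclass
class Host:
    name: str
    build: str            # ESX/ESXi build identifier (hypothetical format)
    cluster: str
    ft_vm_count: int = 0  # FT-enabled VMs already running on this host


@dataclass
class VM:
    name: str
    vcpus: int


def check_ft_pair(vm: VM, primary: Host, secondary: Host) -> List[str]:
    """Return problems found; 'error' entries are hard restrictions,
    'warning' entries are recommendations only."""
    problems = []
    if vm.vcpus != 1:
        problems.append("error: FT currently supports single-vCPU VMs only")
    if primary.build != secondary.build:
        problems.append("error: hosts must run the same ESX/ESXi build")
    if primary.cluster != secondary.cluster:
        problems.append("error: FT-enabled VMs cannot span clusters")
    for host in (primary, secondary):
        if host.ft_vm_count >= MAX_FT_VMS_PER_HOST:
            problems.append(
                f"warning: {host.name} already runs {host.ft_vm_count} FT-enabled VMs"
            )
    return problems


# Example: a 2-vCPU VM on hosts with different builds, one of them already full.
issues = check_ft_pair(
    VM("vcenter", vcpus=2),
    Host("esx01", build="build-1", cluster="prod", ft_vm_count=4),
    Host("esx02", build="build-2", cluster="prod"),
)
for issue in issues:
    print(issue)
```

Separating errors from warnings mirrors the distinction the podcast drew between hard restrictions, such as the build match, and recommendations, such as the per-host VM count.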