Virtualization Pro: A SearchVMware.com blog:

VMware High Availability (VMware HA)

Jul 3 2008   4:01PM GMT

Virtualization virtual tradeshow offers VMware networking opportunities



Posted by: Hannah Drake
Virtualization, Desktop virtualization, VI3, VMotion, VMware ESX, VMware High Availability (VMware HA), VMware Desktop Infrastructure

Trade shows are great – if you have time to attend, have staff to cover while you’re away learning about a new technology, can avoid summons back to the office during the show, can find a show in your local area or can get budget approval to attend a show that requires flight or hotel reservations.

Enter the virtual trade show (VTS), an online conference conceived to mitigate the above challenges. Last week, sister sites SearchDataCenter.com and SearchServerVirtualization.com hosted an advanced enterprise virtualization VTS. I helped staff the networking lounge and editorial booth, where I had the opportunity to chat with VMware users about two of the virtualization provider’s newest tools, Site Recovery Manager (SRM) and Storage VMotion.

IM chatting with attendees
Conversations ranged from general IT talk (“Anyone use virtual desktops?”) to small talk (“What’s the weather like in Maine?”). Trying to be the friendly host, I said “Good morning” to the room. I immediately got the reply “Good evening” and was subsequently told this particular user was signed on from, literally, the other side of the world.

I ended up chiming in on another user’s question about whether anyone was familiar with VMware Site Recovery Manager (SRM). One respondent said that he was, and I asked him about his experience via private IM. SRM orchestrates your virtual machine disaster recovery (DR) plan in the event that your main data center goes down. It prioritizes which virtual machines (VMs) are brought up at the failover site based on available resources, syncs your VM configurations between the main site and the failover site, and allows for DR plan testing without having to take the system offline. It’s a relatively new addition to the VI3 lineup, having been on the market for four months (at the time of publication).

Our conversation turned to plug-ins, and he raved about Andrew Kutz’s Storage VMotion plug-in. The plug-in adds a user interface to the out-of-the-box product, which operates through a command line interface. The attendee explained that he’s primarily a “Windows guy,” so a graphical user interface makes using Storage VMotion much easier.

Kutz recently released an update to the Storage VMotion plug-in.

“The new release now ignores raw device mapping,” Kutz said. “Previously, if you had a raw device that pointed to a 300 Gig disk, the plug-in would look at it as an actual disk and screw up the disk size map.”

He also removed the majority of VMware’s internal code from the plug-in (excepting the code that loads the plug-in), replacing it with code based on the VI Toolkit for .NET.

Impressive user interface
The VTS emulates the look of a physical tradeshow floor, which makes navigation fairly friendly, though not as intuitive as I would have liked. You could either move around with the help of a clickable navigation bar, or point-and-click your way from the main entryway to the desired location, be it the conference hall, vendor hall, networking lounge or “library,” where you could download PDFs of presentations and various information from vendors, which then moved into your “suitcase,” displayed on your personal page.

VTSs are essentially fancy webcast packages displayed in unconventional ways. In this particular show, the topics were “Protecting your Virtual Environment: Backup and Storage,” “Virtual Infrastructure Automation and High Availability Best Practices” and “Virtual Infrastructure Tuning and Advanced Management.” The speaker was displayed on the left side of the screen presenting his slides via streaming video. The slides were displayed on the right hand side. Users could ask questions via a box at the bottom of the screen.

The VTS, if done correctly, has many more plusses than minuses. As long as there is a reliable Internet connection, there’s no need to leave the data center (if you don’t have a reliable connection in your data center, you might think about leaving for good). The content is almost exactly the same as at a physical trade show (that’s how they got the video of the speaker to begin with). And editorial staff can send IT pros direct links to helpful guides that they know of if an IT pro wants to know about, for example, virtual desktop drawbacks.

If any SearchVMware.com readers passed up the opportunity to “attend” a virtual trade show, I suggest you test it out next time a topic of interest comes around. It’s actually fun to use (think AOL in the 90’s minus the “you’ve got mail”) and offers great learning potential and networking opportunities.

An archived version of the advanced enterprise virtualization virtual trade show is available online, short registration required.

Jul 2 2008   3:27PM GMT

Automatic recovery for failed virtual machines



Posted by: Eric Siebert
Virtualization, VI3, VMware ESX, VMware High Availability (VMware HA)

A little known feature, called Virtual Machine Failure Monitoring (VMFM), was introduced in ESX 3.5. VMFM offers the ability to leverage HA to monitor VMs for operating system failures, such as blue screens, and have them automatically restarted. Previously, HA would only deal with ESX host failures by automatically restarting VMs on alternate ESX hosts in the event of a problem with the host server.

VMFM also extends HA to monitor VMs through a heartbeat sent every second when using VMware Tools. This new feature is disabled by default and is considered ‘experimental’ by VMware. This typically means it works but it is not officially supported for production use yet. In order for this feature to function properly you must first ensure the following conditions exist:

• ESX hosts are version 3.5
• VirtualCenter is version 2.5
• VMware Tools is installed on VMs and is the latest version
• You have a Cluster configured and HA enabled

To enable it follow the below steps:

• Edit the Settings for your Cluster
• Choose VMware HA and click the Advanced Options button
• Add the following Options and Values

das.vmFailoverEnabled – true (true or false)
das.FailureInterval – 30 (declare virtual machine failure if no heartbeat is received for the specified number of seconds)
das.minUptime – 120 (After a virtual machine has been powered on, its heartbeats are allowed to stabilize for the specified number of seconds. This time should include the guest operating system boot time)
das.maxFailures – 2 (Maximum number of failures and automated resets allowed for the time that das.maxFailureWindow specifies. If das.maxFailureWindow is -1 (no window), das.maxFailures represents the absolute number of failures after which automated response is stopped and further investigation is necessary)
das.maxFailureWindow – 86400 (Either -1 or a value in seconds. If das.maxFailures is set to a number, and that many automated resets have occurred within that specified failure window, automated restarts stop and further investigation is necessary)
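To see how these five values interact, here is a minimal Python sketch of the decision they describe. It is a toy model built only from the option descriptions above, not VMware's implementation; the class name `VmMonitor` and the method `should_reset` are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VmMonitor:
    """Toy model of the heartbeat rules described by the das.* options (hypothetical)."""
    failure_interval: int = 30         # das.FailureInterval, seconds of silence
    min_uptime: int = 120              # das.minUptime, seconds after power-on
    max_failures: int = 2              # das.maxFailures
    max_failure_window: int = 86400    # das.maxFailureWindow, -1 means no window
    resets: List[int] = field(default_factory=list)  # times of earlier automated resets

    def should_reset(self, now: int, powered_on_at: int, last_heartbeat_at: int) -> bool:
        # Heartbeats are allowed to stabilize for min_uptime seconds after power-on.
        if now - powered_on_at < self.min_uptime:
            return False
        # No failure is declared until the heartbeat has been silent long enough.
        if now - last_heartbeat_at < self.failure_interval:
            return False
        # Count earlier resets that still fall inside the failure window.
        if self.max_failure_window == -1:
            recent = len(self.resets)
        else:
            recent = sum(1 for t in self.resets if now - t < self.max_failure_window)
        if recent >= self.max_failures:
            return False  # automated response stops; someone has to investigate
        self.resets.append(now)
        return True


monitor = VmMonitor()
# Still booting: inside the 120-second stabilization period.
print(monitor.should_reset(now=60, powered_on_at=0, last_heartbeat_at=0))
# 30 seconds without a heartbeat after the stabilization period.
print(monitor.should_reset(now=200, powered_on_at=0, last_heartbeat_at=170))
```

With the values from the list, a third failure inside the same 24-hour window is left alone, which is the point at which the option descriptions say further investigation is necessary.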

I enabled this on a Cluster and tested it by simulating a blue screen on a VM running Windows 2003 Server, and it worked perfectly. After 30 seconds the loss of heartbeat was detected and the VM was automatically restarted. Currently there are no notification alerts that can be configured for when this occurs, and if you check the events for the VM you will see no evidence of it happening. The only mention of it that I found was in the hostd log on the ESX server ([2008-06-26 11:47:22.552 ‘ha-eventmgr’ 3076440992 info] Event 101 : VM1 on Esx1.xyz.com in ha-datacenter is reset). Hopefully this will change in a later version when the feature is no longer considered ‘experimental’. You can read more about this new feature in a white paper that VMware has provided on it.
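Until alerts exist, one workaround is to scan the hostd log yourself. The Python sketch below is a rough illustration: the pattern is inferred from the single sample line quoted above, so other builds may format the line differently, and the function name `find_resets` is made up. VM names containing spaces would not match as written.

```python
import re
from typing import Iterable, Iterator, Tuple

# Pattern inferred from one sample hostd line; treat it as an assumption.
# The quotes around ha-eventmgr may be straight or curly.
RESET_EVENT = re.compile(
    r"\[(?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"['\u2018\u2019]ha-eventmgr['\u2018\u2019] \d+ \w+\] "
    r"Event \d+ : (?P<vm>\S+) on (?P<host>\S+) in \S+ is reset"
)


def find_resets(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (timestamp, vm, host) for every line that looks like a VM reset event."""
    for line in lines:
        match = RESET_EVENT.search(line)
        if match:
            yield match.group("stamp"), match.group("vm"), match.group("host")


sample = ("[2008-06-26 11:47:22.552 'ha-eventmgr' 3076440992 info] "
          "Event 101 : VM1 on Esx1.xyz.com in ha-datacenter is reset")
for stamp, vm, host in find_resets([sample]):
    print(f"{stamp}: {vm} was reset on {host}")
```

In practice you would pass an open file handle for a copy of the hostd log instead of the one-line sample.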


Jun 30 2008   8:51PM GMT

Isolate ESX hosts in separate clusters for maintenance, upgrades



Posted by: Rick Vanover
Virtualization, Rick Vanover, VI3, VMware ESX, VMware High Availability (VMware HA)

Performing key maintenance on a VMware Infrastructure 3 (VI3) host can be an involved process, as can simply adding a host. Because of the length of time that a system may be offline, or because of what exactly is being performed, placing the hosts in an isolated cluster provides a safe environment for virtually any task. An isolated cluster is not subject to the DRS and high availability rules that apply to the live cluster. Further, you can reboot the host as needed without wondering whether there will be any effect on the live workload.

With the isolated cluster, the following tasks can be safely performed:

-Version upgrades (ESX 3.0x to 3.5 Update 1)
-Hardware maintenance
-Adding a new host to an existing cluster
-Testing network connectivity to various port groups
-Confirming VMware HA and DRS performance within an isolated set of rules
-Importing or configuring new storage types

A good practice is to keep the isolated cluster out of the way of the live workload and to name it accordingly within the VMware Infrastructure Client. A name like “TestingCluster” or “ZZUpgradeCluster” distinguishes the collection from the live workload.

The figure below shows a cluster with a name and position that contains one host in maintenance mode for any task better suited in an isolated environment:

VI3 Cluster

It is important to note that this cluster would still require licensing, just as it would in a live workload cluster. More information on creating a VI3 cluster can be found in the VI3 online library.


Jun 18 2008   4:22PM GMT

Storage and network planning takeaways from Iowa virtualization user group meeting



Posted by: Adam Trujillo
VMware High Availability (VMware HA), VMware ESX

Sean Clark is a VMware Certified Professional (VCP) and a member of the Des Moines, Iowa-based Central Iowa Virtualization Users’ Group (CIVUG). The CIVUG emerged from the Central Iowa Linux Users’ Group as a way for virtualization users to learn more without being tied to one vendor. That is, the CIVUG discusses Hyper-V and Citrix Systems’ Xen in addition to VMware.

Clark has cut his teeth on VMware and other virtualization platforms as a solutions architect at Alliance Technologies, where he works with small and medium-sized businesses and some enterprise businesses to develop strategies to best implement virtualization, be it storage, server or application virtualization. Clark posted notes on a presentation he gave at a recent meeting on a Red Hat Network File System (NFS) configuration used with VMotion, High Availability (HA) and Distributed Resource Scheduler (DRS).

NFS vs. Fibre Channel and iSCSI
The NFS configuration how-to Clark followed was posted by Mike Laverick on RTFM Education and was chosen for the demo not only because of its easy configuration, but also because implementing Red Hat NFS is a good low-cost alternative to shelling out big cash for more expensive virtual machine (VM) storage devices. As Clark put it, “Most people think that you need a Fibre Channel SAN [storage area network] or an iSCSI SAN; you’ll spend at least 20 or 30 grand on that device. But all you really need is a reasonably new server with some pretty fast discs, and you can basically have a SAN for the cost of your hardware.”

Clark also says that despite what some benchmarking stats say, performance won’t suffer on the cheaper NFS. “If you do your homework, you’ll find that in many benchmarks when you compare the performance of NFS versus iSCSI, they’re almost nearly identical. You can spin your benchmarking tasks so that one will come out on top every time, but for the majority of uses out there, and especially for test and dev, NFS is just as good as iSCSI.”

The storage system lines blur when you scale out to enterprise-level requirements because the needs go beyond the hardware and software backing your VM storage. Clark says that it’s really about having enough discs. “It doesn’t matter if you have NFS, Fibre Channel or iSCSI if you don’t have enough discs to meet the I/O demand that the number of virtual machines can put on a SAN. Most of the time, it becomes almost a religious decision — whatever you’re most comfortable with.”

HA snafu highlights proper network configuration
After walking through the NFS configuration, Clark went on to demo VMotion, HA and DRS, but ran into some problems with the HA portion. At his home in Pella, Iowa, Clark has a small test lab of two 1U servers running two ESX VMs, which he brought to Des Moines for his presentation.

With little time to reconfigure the servers (Clark was asked only a few days prior to the meeting to give the presentation), Clark decided against bringing his PC that he had used as his NFS server in favor of a Red Hat VM running on his laptop, which served as the basis for his presentation. Although both the ESX servers were able to see the Red Hat VM through a little Linksys switch, the demonstration came to a standstill during the HA portion.

The problem? Because the conference room that the VUG meets in is an island network, there was no default gateway address available as there was at Clark’s lab in Pella. An available, ping-able gateway address is required for HA to work. When HA is set up, the ESX servers exchange a heartbeat, and that is how each one determines whether the others are alive. If that heartbeat ceases, HA decides to restart the affected VMs on a different ESX server. Sometimes, however, only the network becomes unavailable while the other ESX server continues to run, so a host needs a way to tell the two cases apart.

The isolation address was the pain point in Clark’s HA demo. The default gateway that used to exist in Pella didn’t exist and HA failed. As Clark explains, “It’s just an additional IP address that an ESX server can gain to say, ‘OK, I lost my buddy, but I need to make sure that the network is still up.’ So as long as it can reach that other IP address, then it can assume that the other host is actually truly dead and that it needs to restart its failed VMs.”
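The reasoning Clark describes boils down to a two-question check, sketched below in Python. This is a deliberately simplified model of the explanation in this post, not VMware's actual algorithm; the names `HostState` and `assess` are hypothetical.

```python
from enum import Enum


class HostState(Enum):
    HEALTHY = "peer heartbeat received, nothing to do"
    PEER_FAILED = "peer is silent but the network is up: restart its VMs here"
    ISOLATED = "peer and isolation address both unreachable: this host is isolated"


def assess(peer_heartbeat_ok: bool, isolation_address_ok: bool) -> HostState:
    """Classify a host's view of the cluster from two reachability checks."""
    if peer_heartbeat_ok:
        return HostState.HEALTHY
    if isolation_address_ok:
        return HostState.PEER_FAILED
    return HostState.ISOLATED


# On an island network with no gateway, the isolation address never answers.
print(assess(peer_heartbeat_ok=False, isolation_address_ok=False).name)
```

The second check is what separates "my buddy died" from "my own network died," which is why HA needs a reachable isolation address in the first place.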

Proper planning and redundancy are key to successful deployments
Clark says that he almost never has problems because he is careful to use VMware-supported hardware and plans out each deployment carefully. But not all IT departments are so careful. “I ran into a customer the other day that didn’t want to invest in redundant networking; they had a four-host cluster and all their networking was going through one physical switch. They had HA set up, and when that switch went down, every single VM that was on one of their ESX servers powered down.”

Clark says that this can happen as a result of not properly planning your configuration. “One of the configuration options for HA [concerns] what to do if an ESX host becomes isolated. Because it lost its network, there wasn’t a second physical switch for redundancy. Each host, even though it was still running, became isolated according to the HA configuration. It was probably a default setting at the time when they set up the HA cluster, and it powered down all the VMs.” Clark says that with proper preparation, the ESX hosts may not have appeared isolated to HA and this organization may have been able to save the headache of restarting all its VMs and troubleshooting what caused the power-down.


May 26 2008   1:51AM GMT

Friends don’t let friends, like VMware, act like Google



Posted by: Schley Andrew Kutz
Andrew Kutz, Virtualization, VI3, VMware ESX, VMware High Availability (VMware HA)

I like VMware. I like Google. Heck, both of them keep me more than busy with development ideas. But I have a problem with them. Google started it with Gmail. Although it is hard to remember now, Gmail was in beta forever. Oh wait, it still is? Huh. I guess I just figured it *must* have hit production by now. Then there is Google News, Google Apps, Google Page Creator, Google everything else: all beta. I am honestly surprised search hits don’t come back with the “beta” tag next to them. I guess they thought ICQ was the cat’s meow, and that the whole beta thing had a nice ring to it.

Enter VMware, which is perilously close to becoming the next Google in terms of heavily pushing new features, but then labeling them as beta or experimental. Take for example Storage VMotion (SVMotion). VMware played up this new feature of VI 3.5 last fall at their North American VMworld conference, but when it was released there was no graphical user interface (GUI) option for it. How is that ready for prime-time? And then there is virtual machine (VM) high availability (HA), another marketed feature that is so experimental you have to edit an advanced setting (as a free-form string) just to enable the functionality.

I wouldn’t actually have a problem with VMware doing this if they didn’t market the heck out of these new “features.” Excuse me for being old fashioned, but it isn’t enterprise-ready if it is beta or labeled experimental. And VMware makes no bones about this; they plainly state that these features should not be used in production. On the other hand, they make a big show about the same set of features, whipping the crowd to a fever pitch of excitement. You can’t have it both ways, guys.

Take VMware Fusion 2 or VMware Server 2. These products are in beta stages right now and VMware is not making a big deal about them. Sure, they are out there for people to get, but VMware isn’t throwing them at customers, not the way they revolved last year’s North American and this year’s European VMworld conferences around features that were not even ready for production.

Then there is the other end of the spectrum as well. I recently discovered that VMware is strategically hiding a long sought feature of ESX in the bowels of its software development kit (SDK). Since version 2.5 of the SDK (VI 3.5), VMware has included the ability (although it does not yet appear to be working correctly) to create network address translation (NAT) and Dynamic Host Configuration Protocol (DHCP) devices directly on ESX servers for VMs to use. This is awesome! Prior to this, the only way to create NATd networks on an ESX host was to dual home a VM to a public and private port group, have it act as the NAT and DHCP server, and then attach other VMs to the same network as its private interface. This solution was cumbersome and did not work well when VMotioning VMs. If I were VMware, I would make a little bit more noise about the fact that they are working on this feature.

I want to reiterate that I like, if not love, VMware. I just hate getting jazzed about a new feature that they have thrown at me, only to find out that it is a curve ball. VMware needs to make sure that experimental features are announced with an asterisk next to their headline, while at the same time working a little harder to ensure that some other upcoming features get the love they deserve.


May 8 2008   8:41PM GMT

Why you should upgrade to VI3 Update 1



Posted by: Rick Vanover
Rick Vanover, VMware ESX, VMware High Availability (VMware HA), VI3, VMware Converter

VMware Infrastructure 3 Update 1, made available on April 10 2008, introduces some core updates to ESX Server 3.5, VirtualCenter 2.5, and the VMware Infrastructure Management Installer. The biggest reason to upgrade, however, is the inclusion of Storage VMotion.

Among the core additions in ESX 3.5 Update 1 are support for the Intel 82598 10 Gigabit Ethernet controller, support for Jumbo Frames and NetQueue, additional Microsoft Clustering Services (MSCS) support, additional backup product and management agent support, additional guest operating systems, and additional server models.

I’ve been working with ESX 3.5 Update 1 for a few weeks, and the installation and behavior are indistinguishable from both ESX 3.5’s base release and ESX 3.02, with the exception of context-sensitive tasks or options.

When I test upgrades, I make a point to test the upgrade in an environment with dissimilar ESX host server releases. For example, most of my hosts are ESX 3.02. When I upgrade the first one to ESX 3.5, I want to make sure that nothing goes wrong. I want to know that I’ll be able to sustain a mixed environment with all functionality. When I migrate running virtual machines through host-based VMotion to the ESX 3.5 host, and the reverse, I want to make sure to the best of my ability that nothing will fail. I also want to ensure that all of the VMware DRS and VMware High Availability rules are still enforceable with the mixed-host inventory.

Outlining a functionality matrix and the verification of the behavior is key to having no surprises during a live upgrade. Testing the update to VirtualCenter is a little more difficult but I am setting up a test environment soon to ensure that everything functions as expected in my environment. Overall, the fixes and new features make ESX 3.5 Update 1 an attractive upgrade for systems that are not there already.
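A functionality matrix does not need to be fancy. The Python sketch below generates one row per source host, destination host and check, using the three releases named in this post. The list of checks and the row layout are only a suggestion, not a prescribed format.

```python
from itertools import permutations
from typing import Dict, List

versions = ["ESX 3.02", "ESX 3.5", "ESX 3.5 Update 1"]  # releases named in this post
checks = ["VMotion", "DRS rules", "HA rules"]            # behaviors to verify (illustrative)


def build_matrix(versions: List[str], checks: List[str]) -> List[Dict[str, str]]:
    """One row per (source host, destination host, check), all marked untested."""
    return [
        {"source": src, "destination": dst, "check": check, "result": "untested"}
        for src, dst in permutations(versions, 2)
        for check in checks
    ]


matrix = build_matrix(versions, checks)
for row in matrix:
    print(f'{row["source"]:>17} -> {row["destination"]:<17} {row["check"]:<10} {row["result"]}')
```

Using ordered pairs rather than unordered ones matters here, because a migration to the newer host and the reverse are separate tests.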


Mar 4 2008   11:23PM GMT

Coming soon: Application High Availability (AHA)



Posted by: Schley Andrew Kutz
Virtualization, Andrew Kutz, VMware High Availability (VMware HA)

Scott Lowe recently blogged about things that he loves but are not yet available. VMware, it seems, is getting as bad as Apple in announcing products far ahead of their release dates. This generates great market buzz and sends stock prices soaring (or in the case of VMworld Europe, leaves them as low as they have been), but it does little to satiate the desires of systems administrators; it only serves to increase want and provides little respite. Well, I am about to send you into the Sahara with the promise of an oasis, only for you to find a sign-post that says “Coming Soon!” at the end of your journey.

Please join me in anticipation of VMware Application High Availability (AHA).

AHA has not been announced by VMware, but it is coming. How do I know this? It is the next logical step for their HA portfolio. ESX 3 brought us server HA. 3.5 introduced us to VM HA. And the recent announcement of VMsafe all but secures the eventual release of AHA.

What is AHA? Simply put, you will be able to right-click on a VM from the VI client and indicate that if Microsoft Exchange fails then the Exchange service should be restarted, or the VM should be failed over to another ESX server. Or perhaps you want to monitor the Apache web server — just check a box. How will VMware achieve this level of fine-grained control? Allow me to refer to the VMsafe product page:

Process execution
VMsafe provided in-guest, in-process APIs that enable complete monitoring and control of process execution.

This API will allow VMware to monitor and control processes within the guest OS. That, my friends, is how AHA will work.

AHA will allow VMware to take even more market share away from companies like Sun and Microsoft, both of which have their own clustering technologies. Why cluster at the OS-specific level when you can create clusters the same way no matter the OS or application underneath?

VMsafe is set to debut later this year, and I am quite certain that alongside VMsafe, or shortly thereafter, we will see VMware announcing and releasing its application level high availability software. I hope you aren’t too parched : )


Feb 27 2008   3:17PM GMT

Quickly adding VirtualCenter 2.5 network redundancy



Posted by: Rick Vanover
Rick Vanover, VMware ESX, VMware High Availability (VMware HA), VI3

When VirtualCenter (VC) 2.5 was released, I, like many others, started on a path to migrate to the new version for my VMware implementation. After the VC installation, my ESX hosts, which had only one network interface, displayed a message similar to the one shown below:

Lack of redundancy issue

Initially, I was somewhat irritated at this message. I had already planned out my connectivity for the ESX hosts based on the behavior of VC 2.0 and ESX 3.02. But after some thought I determined that management network redundancy is actually a good idea despite the slight hassle. Here is what I did to quickly and solidly get rid of this message and the corresponding yellow indicator on the cluster:

Get an additional IP address

An additional TCP/IP address is required to resolve the lack of redundancy for the role of VMware service console. In my environment, it worked best to have this additional IP on the same VLAN as the primary VMware service console address. Furthermore, I have one DNS entry for the ESX host, which I will leave pointing at the primary interface unless an issue requires me to migrate all service console traffic to the secondary interface. In that scenario, I would also change the DNS entry.

Stack the roles

I chose to “stack” the role of service console on top of an existing vSwitch that was, up to this point, configured to only provide virtual machine traffic. Here are the steps to do this:

  • Go to the ESX host in the VMware Infrastructure Client (VIC)
  • Select the Configuration tab
  • In the Networking section, select Add Networking
  • Add the Service Console role on top of the existing vSwitch as shown below:

Reconfiguration

This interface will not have significant traffic back to the VC server unless it is configured to be the primary interface. In my case, the DNS entry will still point to the primary interface on vmnic0. In this configuration, I am not taking precious bandwidth from the virtual machines (not shown) on the PROD-VLAN network across the two physical interfaces vmnic1 and vmnic2.

In general, I do not wish to stack roles on physical adapters. My initial design was based on having dedicated interfaces for virtual machines, VMotion, service console and a hardware management interface (Dell DRAC or HP iLO). In this situation, there is virtually no traffic on the new service console, and the benefit of having the degraded cluster condition cleared is worth relaxing my practice of not stacking network roles.

In fulfilling the redundancy requirement, a true issue that would cause the cluster to be in a degraded state with the yellow icon would not be masked. The other, more intuitive option is to add or allocate a physical interface with the specific role of the additional service console assigned. Most implementations, however, don’t have extra physical network interfaces available.
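If you keep even a simple inventory of which physical adapters back each host's service console ports, finding the hosts that will trigger the warning is a one-liner. The Python sketch below works on a hand-written dictionary; the host names are invented and nothing here calls a VMware API.

```python
from typing import Dict, List


def hosts_lacking_redundancy(service_consoles: Dict[str, List[str]]) -> List[str]:
    """Return hosts whose service console is backed by fewer than two distinct adapters."""
    return sorted(
        host for host, nics in service_consoles.items() if len(set(nics)) < 2
    )


# Hypothetical inventory: host name -> adapters carrying a service console port.
inventory = {
    "esx-a.example.com": ["vmnic0"],
    "esx-b.example.com": ["vmnic0", "vmnic1"],
}
print(hosts_lacking_redundancy(inventory))
```

A host that gets the stacked service console described above would move from the first shape to the second.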

Feel free to share your own strategies in addressing this annoying growing pain of VC 2.5 by commenting.


Jan 16 2008   8:42PM GMT

ESX 3.5: Configuring for HA errors



Posted by: Rick Vanover
Virtualization, Rick Vanover, VMware ESX, VMware High Availability (VMware HA), SQL, VI3

I recently upgraded to ESX 3.5 on a test system and had an issue that was really stressing me out. Each time I performed the “Reconfigure for HA” task, errors caused the task to fail, and the new ESX host sat there with a red triangle like a car broken down on the side of the highway. The log message shown in the Virtual Infrastructure Client was largely useless, so I jumped into the VirtualCenter database. There, I looked into the VPX_EVENT table and saw the following event:

Event_Type
vim.event.HostShortNameToIpFailedEvent

Host_Name
ESX-35-DEV-001.AMCS.TLD

This message gave me a starting point, and I found that in my service console network configuration the DNS suffix order was not the same for all hosts. Specifically, I had forgotten one DNS suffix in the order, which made sense of the event name:

DNS Configuration

Therefore, the takeaway is this: when building ESX servers, ensure that all hosts in a cluster (or datacenter) within VirtualCenter have a correct and consistent configuration for DNS, subnet mask, interface naming, and storage.
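A consistency check like this is easy to script once the settings are collected in one place. The Python sketch below compares each host's DNS suffix search order against the most common one in the cluster. The host names and suffix lists are invented for illustration, and gathering the real values from each host is left out.

```python
from collections import Counter
from typing import Dict, List


def suffix_mismatches(suffixes_by_host: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return hosts whose DNS suffix search order differs from the most common one."""
    if not suffixes_by_host:
        return {}
    # Compare case-insensitively, but keep the order: order is what broke HA here.
    normalized = {
        host: tuple(s.lower() for s in suffixes)
        for host, suffixes in suffixes_by_host.items()
    }
    reference, _count = Counter(normalized.values()).most_common(1)[0]
    return {host: list(sfx) for host, sfx in normalized.items() if sfx != reference}


# Hypothetical cluster: the third host is missing one suffix.
cluster = {
    "esx-01": ["amcs.tld", "dev.amcs.tld"],
    "esx-02": ["AMCS.TLD", "dev.amcs.tld"],
    "esx-35-dev-001": ["amcs.tld"],
}
print(suffix_mismatches(cluster))
```

The same pattern extends to subnet masks or interface names: collect the value per host, pick the majority, and list the outliers.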


Dec 11 2007   9:51PM GMT

Can VMware HA give 100% uptime for a database?



Posted by: Jan Stafford
VMware High Availability (VMware HA), Virtualization

Systems admin Michael Gildersleeve is evaluating VMware for High Availability (VMware HA), but he’s not sure if that product is going to work well with his legacy software. He’s not sure, either, if HA is as mature and robust as other products on the market.

I’m answering his call for more information. I hope that you will, too, either by commenting on this post or emailing me at jstafford@techtarget.com.

Gildersleeve works for a company that has a Progress database running on a UNIX server. Hundreds of Windows clients and Web applications are attached to that database and server through Progress Brokers via service file ports.

“I need to provide 365 by 24 by 7 up time,” Gildersleeve said. “With our new web business, East and West coast facilities, and vendors managing our stock and replenishment, we need to be available all of the time.”

He wants to run his database across at least two servers, in a setup like an Oracle Real Application Cluster.

“This would allow me to upgrade the OS (operating system), reboot a server or take a server down for maintenance without affecting the database or the users. So far I have only found solutions that will give me a two-to-five minute downtime between switching from one server to another.”

Gildersleeve has looked a little at server virtualization. He’s evaluating server virtualization options and VMware HA to see if he can cut the downtime to nil. It seems to him, however, that virtualization options only cover one server at a time. He wants 100% uptime across several servers used for database activities.

“What if I need to do an OS update or patch, or what if some critical hardware fails? What I have seen so far is that if I upgrade my Progress app to v10 (Progress OpenEdge), and then move to two Integrity servers running (VMware) High Availability; if one server fails or if we need to do maintenance on a server, we can manually switch to the second server. But the problem with this is that my users will feel the switch because I will need to bring one server down. They will need to log out and in again to the app, or whatever needs to be done to bring the ready server into production mode.”

Gildersleeve is willing to evaluate Sun Microsystems options, if they are truly viable for running Progress. Microsoft operating systems are out of the question, however.

In his evaluations, Gildersleeve has come up with a lot of questions, and he’s looking for advice from HA experts. Could you provide some advice and share your experiences by commenting on this post or emailing me at jstafford@techtarget.com?