In the aftermath of the infamous bug in the latest release of VMware ESX, VMware CEO Paul Maritz has released a letter that apologizes for the incident, explains what went wrong and describes how the company is committed to ensuring it never happens again.
For customers who were affected by the widespread problem with ESX 3.5 Update 2, released several weeks ago, is VMware’s apology and promise to improve its processes enough? Or will it leave lingering doubt in the minds of some customers and inspire them to look at other virtualization products?
The letter provided an explanation of what happened:
The issue was caused by a piece of code that was mistakenly left enabled for the final release of Update 2. This piece of code was left over from the pre-release versions of Update 2 and was designed to ensure that customers are running on the supported generally available version of Update 2.
And why it happened:
I am sure you’re wondering how this could happen. We failed in two areas:
- Not disabling the code in the final release of Update 2; and
- Not catching it in our quality assurance process.
And finally what they will do to ensure it never happens again:
We are doing everything in our power to make sure this doesn’t happen again. VMware prides itself on the quality and reliability of our products, and this incident has prompted a thorough self-examination of how we create and deliver products to our customers. We have kicked off a comprehensive, in-depth review of our QA and release processes, and will quickly make the needed changes.
Despite it all, VMware still has a great enterprise product that is robust and mature, and it is still the virtualization software of choice for most Fortune 500 companies. Even so, this incident could easily have been prevented by following established processes when promoting a beta build to a final build. In addition, VMware’s QA processes, which are designed to ensure a quality product, failed to detect that the time bomb code was still present and active.
Will VMware learn from this incident? Absolutely. Sometimes it takes a big event like this to inspire changes and improvements in a company that may have been set in its ways and wasn’t paying attention to details.
One area that many users criticized was VMware’s communication on the matter. The company was initially slow to issue public communications and to contact customers proactively about the issue. The thread in the VMware Technology Network (VMTN) forums that was started on this issue became the rallying point for many of the users who were experiencing problems as a result of the bug. VMware employees did post some updates to the thread, which let users know the company was aware of the bug, but they did not provide much other information until much later in the day. Another breakdown was that VMware’s knowledgebase, which had information on the bug and is often the first place users go when experiencing a problem, became so overwhelmed by the number of requests that it was unavailable for over 6 hours.
VMware delivered the fix fairly quickly; it was available roughly 24 hours after the problem was first reported. Many users were hoping to get it quicker than that, but VMware needed time to package and test the fix before releasing it. VMware also provided good communication later in the day, with detailed updates and emails sent to customers.
So is VMware’s apology enough? In my mind it is. Yes, it was an unfortunate incident that caused many customers a good deal of grief but the end result is that VMware responded quickly and effectively and this incident will serve as a lesson that they won’t soon forget and will help make their products and processes stronger going forward.
Hewlett-Packard (HP) has expanded to incorporate a new business unit that will focus on desktop virtualization. Client-side virtualization has been a tough sell, primarily because the ROI comes from areas that companies aren’t used to valuing, such as security, technical support and power savings over regular desktops or notebooks. “Generally, these are encompassed in the facilities cost of the buildings,” said Roberto Moctezuma, VP and GM of the new desktop solutions global business unit. “When we present the ROI case to customers they find it very compelling, but it’s not as granular as in the server virtualization world.”
Although no major product announcements will be made for another two weeks, HP shows promise in the client-side virtualization space due to its Remote Graphics Software (RGS), a technology that seeks to address one of the obstacles of virtual desktop adoption to date: the lack of graphics capabilities. RGS enables remote users to access high-end graphic software such as CAD.
Mixed virtual environments
HP hopes to position itself as a one-stop shopping vendor for all types of virtualization and data center purchases. “We think we have a unique portfolio in terms of breadth from the client-side and access devices, to notebooks to hardware for the data center coupled with differentiated software and manageability offerings.”
Partnerships with all three major virtualization vendors (VMware, Microsoft and Citrix) have allowed HP to preload its thin client devices with any of the above platforms, which enables a plug-and-play-like experience for the end user.
“When you look at the market a couple of years out, people will be using all types of virtualization platforms and technologies. HP’s strategy is to be the best option that brings them all together,” he said.
For VMware users, HP entering the thin-client virtualization space could mean fewer headaches about hardware and software compatibility in the future. “HP is working very closely with VMware to make the client virtualization paradigm simple to use and deploy. We want to provide a leading experience for our customers, both from an end user perspective and an IT perspective,” Moctezuma said. HP can now provide the thin-client, the virtualization software via its partnerships with virtualization software companies, and the blade servers to run the system.
Security still a major concern
There is a need for businesses to have the ability to keep data secure in an increasingly mobile work force, and the ability to add end-users quickly when, for example, opening up a call center in a new country, Moctezuma said.
Security is a primary concern in health care and financial industries. “You get high-CPU consumption users in the financial trading industry that are running four 24-inch monitors off of one thin-client device, accessing a single blade workstation back at the data center,” he said. Moctezuma also said that other high-end users get better hardware utilization because of worldwide remote employees: now, a blade server can be accessed around the clock from different parts of the world via employees’ thin client devices. HP calls this the “follow the sun” model.
Santa Clara, California-based Sun Microsystems, Inc. announced a handful of multi-year original equipment manufacturer (OEM) agreements with Avanquest Software, Q-layer and Zenith InfoTech Ltd. to allow them to deliver Sun’s xVM VirtualBox virtualization platform.
Sun xVM VirtualBox software is a component of Sun’s broader xVM virtualization and management software portfolio, which includes Sun xVM Ops Center, Sun xVM Server and the Sun Virtual Desktop Infrastructure (VDI) Software. The xVM VirtualBox software is the free, entry-level offering into the Sun xVM platform.
Sun xVM VirtualBox supports whichever operating system and application stack a user chooses, and has a small enough footprint to be an embedded component in OEM equipment.
Since its release in January 2007, Sun xVM VirtualBox has surpassed 5 million downloads, and is the first free hypervisor to support all major host operating systems, including Mac OS X, Linux, Windows, Solaris and OpenSolaris.
The 20 megabyte download installs in less than 5 minutes and has received positive third-party reviews and awards. It is also being used by the Texas Advanced Computing Center (TACC) on part of its 4,000-node supercomputer.
La Garenne-Colombes, France-based software publisher Avanquest Software will produce and publish Sun xVM VirtualBox bundled with OpenSolaris and sell it in retail outlets in the UK, Germany, Italy, Spain and France. Beginning this fall, Avanquest will provide Mac users with a solution to run the Windows operating system through Sun xVM VirtualBox.
Mountain View, Calif.-based Q-Layer, a provider of cloud computing through Virtual Private Data Centers (VPDC), is using Sun xVM VirtualBox to deliver virtualization capabilities for its customers.
Bombay, India-based Zenith InfoTech Ltd., a managed services and business continuity software provider, has built its network attached storage appliance for small and medium-sized businesses using Sun xVM VirtualBox.
Sun xVM VirtualBox is available free of charge under a Personal Use License. OEMs have two options for licensing xVM VirtualBox: open source edition under GPLv2 or under a commercial license.
A bug in the latest versions of both VMware ESX and ESXi (3.5 Update 2) has affected many of VMware’s customers, and VMware is asking its users to wait 36 hours for a patch.
As the date changed to August 12, 2008, customers found that they could no longer start virtual machines on their ESX hosts or vMotion them to other hosts.
A post was made to the VMware Technology Network (VMTN) community about this bug, and many customers responded that they were experiencing the same problem and had spent hours trying to figure out what was wrong. The problem was not immediately obvious to most because the error displayed was that a general system error had occurred; the actual error, which could be found by going through the virtual machine log files, was that the product had expired. Many users contacted support, who eventually figured out they had a major issue on their hands.
Currently, the only workaround for this is to set the host clock back and restart virtual machines; however, this workaround is not acceptable for many customers who rely on accurate time for their systems and applications as well as to satisfy compliance regulations. Virtual machines that are already running are not affected by this bug unless they are rebooted or powered off and back on.
The bug appears to be code left over from the beta version of ESX that makes the product stop working on a specific date after the beta has ended. This is commonly done by software vendors and is known as “time bombing”: software stops working past a certain date and users are forced to use the latest gold version instead of continuing to use the beta version.
VMware has published a knowledgebase article on this issue and promises to release a fix within 36 hours. For most customers this is not enough; having to wait 36 hours is much too long for a problem of this magnitude. They are looking for an immediate fix that they can apply to their affected hosts. Additionally, there is concern about how the fix will be delivered. Presumably it will be released as a new build of ESX, which will require ESX hosts to be offline while it is installed and they are rebooted.
Many customers posting to the VMTN thread have expressed anger and frustration at VMware for this. To make matters worse and further frustrate users, VMware’s knowledgebase went offline shortly after the document was published, presumably because it could not handle the extraordinary number of requests.
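To make the mechanism concrete, here is a minimal Python sketch of how a beta time bomb might work. This is not VMware's code: the function name, the `time_bomb_enabled` flag and the way the check is wired in are all assumptions for illustration, and the expiry date is simply the date on which customers reported the failures.

```python
from datetime import date

# Hypothetical values for illustration only; not taken from VMware's code.
BETA_EXPIRY = date(2008, 8, 12)  # date on which the reported failures began


def can_power_on(today: date, time_bomb_enabled: bool) -> bool:
    """Return True if a VM may be started on the given date."""
    if time_bomb_enabled and today >= BETA_EXPIRY:
        # A beta build refuses to run once its trial period has ended.
        return False
    return True


# A beta build keeps the check enabled and stops working on the expiry date.
print(can_power_on(date(2008, 8, 11), time_bomb_enabled=True))
print(can_power_on(date(2008, 8, 12), time_bomb_enabled=True))
# A final build is supposed to ship with the check disabled.
print(can_power_on(date(2008, 8, 12), time_bomb_enabled=False))
```

In a sketch like this, a gold release that ships with the flag still enabled behaves exactly like an expired beta, which matches the symptoms customers described.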
It is hard to believe a company the size of VMware could allow this to happen. This is not something beta testing could have caught, and it is arguably not a bug so much as negligence on VMware’s part in not removing or disabling this code before it was released as the gold version. Most software companies have strict processes for developing, testing and performing quality assurance before releasing a new build. How something like this could happen is anyone’s guess right now, but it appears that either the processes do not exist or they were simply not followed.
In the meantime, customers continue to wait for VMware to release a fix for this. Because of the severity and the effect on so many customers there will most likely be some type of fallout at VMware over this. Something needs to be done for VMware to assure customers that they are taking this very seriously and are committed to doing everything possible to ensure that this never happens again. With Hyper-V now a viable alternative, VMware can’t afford major mistakes like this.
I sat down with Oracle Corp.’s VP of Linux engineering, Wim Coekaerts, during the LinuxWorld/Next Generation Data Center conference in San Francisco today to find out about their new virtualization product, Oracle VM Templates, announced August 6.
Oracle VM Templates are basically a time-saving approach to deploying a fully configured Oracle software stack, since each template provides pre-installed and pre-configured images of Oracle software.
Oracle VM Templates can be downloaded for free to install Oracle products onto servers where Oracle VM resides. Following the download, the servers will have a fully installed and configured software environment (based on Oracle’s Linux product) to play with, Coekaerts explained.
“The VM Template includes all of the patches and anything else that we would want our customers to have, and they don’t have to do any of the install work,” Coekaerts said. “The template is good for test and dev, because it allows people to compare us to other products. They download it and can play around with it, and if they are happy with it, they then pay the normal Oracle license and support fees for whichever product they are using the template for.”
The VM templates are not replacing the traditional method of installing Oracle software but are simply an alternative, he said.
The first set of Oracle VM Templates is now available for Oracle Database 11g, Oracle Enterprise Manager, Oracle’s Siebel CRM 8, and Oracle Enterprise Linux.
Coekaerts said Oracle’s goal is to come out with a VM Template for existing Oracle products about once a month, and there will be a library of templates users can use to deploy Oracle software more quickly than a traditional installation allows. Oracle partners will also use the templates to help users deploy their software, he said.
The templates are essentially a way for Oracle to promote its Xen-based hypervisor, Oracle VM, which supports both Oracle and non-Oracle applications. Oracle VM is also available for free download, and users pay for support.
Eric Siebert’s recent post on optimizing the host environment addresses a very important concern, one that is frequently pushed aside in the interest of reducing implementation time for virtual environments. In this blog, I would like to chime in with a few of my own tips related to the host environment. These strategies are applicable to many virtualization platforms and will carry across products as virtualization advances.
DNS configuration for the hosts
Having a correct DNS environment is important for all systems, not just virtual environments. Pay particular attention to the suffix search order, as the first result for queries should be consistent and timely across hosts. Also, consider host entries for fixed systems, with an entry for the host itself, all other hosts, the management system and any other relevant systems with which the host would need to communicate. One specific example is VMware’s DRS functionality, which can have problems with incorrect DNS configurations.
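As a rough illustration, the host entries could be generated from a single list so that every host gets an identical set. The sketch below assumes that "host entries" means lines in a hosts file in the usual "IP, fully qualified name, short name" layout; the system names, addresses and domain are hypothetical.

```python
def hosts_entries(systems: dict[str, str], domain: str) -> list[str]:
    """Build hosts-file lines of the form 'IP<TAB>FQDN<TAB>shortname'."""
    lines = []
    for name, ip in sorted(systems.items()):
        lines.append(f"{ip}\t{name}.{domain}\t{name}")
    return lines


# Hypothetical names and addresses, for illustration only.
systems = {
    "esx01": "192.0.2.11",
    "esx02": "192.0.2.12",
    "vcenter": "192.0.2.20",
}
for line in hosts_entries(systems, "example.com"):
    print(line)
```

Generating the entries from one source keeps the hosts consistent with each other, which is the point of the exercise.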
Time configuration for the hosts
For platforms that are Windows based and members of an Active Directory domain, this concern is somewhat eased. But for Linux systems, you want to have an automated mechanism in place to manage accurate time across hosts. For ESX and VirtualCenter, Eric again has covered this well over on SearchVMware.com with a tip.
Also decide whether you want guest virtual machines to sync time with the host via the driver software (VMware Tools, Guest Additions, etc.). This will relieve issues that go with multiple time zone support as well as separate issues in time synchronization.
Get environment agent notifications right
For virtualization hosts on the server level, all hardware failure notifications should be configured to the fullest extent possible. This can be device alerts (Dell DRAC/HP iLO), SNMP alerts, agent configurations or even blade server management software. Given the scope of the virtual environment, consider using multiple notification mechanisms.
Single hypervisor per platform
This is more relevant on desktop environments, but it goes without saying that you should not install two hypervisor products on a single system. Even though it may be tempting to have the functionality of multiple platforms, it may complicate the host environment. Take VMware Server and Sun xVM VirtualBox as an example: they theoretically could coexist on the same system because the VMware Bridge protocol binding and the VirtualBox explicit host adapters can each have their own configuration. This is one of those just-because-you-can-does-not-mean-you-should scenarios.
Host configuration is an area ripe for configuration procedures and policy enforcement to ensure consistent behavior among host systems. The procedural investment can usually help present the virtualization solution with more credibility as well.
Sun has released VirtualBox 1.6.4, and the upgrade process requires some forward planning. Version 1.6.4 is a collection of fixes to the previous release that mostly revolve around shared folders and the VRDP (VirtualBox Remote Desktop Protocol) implementation. Here is what you need to know if you are upgrading:
During the upgrade installation, you are presented with the familiar message about installing a device that has not passed Windows logo testing. These messages are common across virtualization platforms, as these drivers and devices enable the hypervisor to present the virtual machines.
After these messages are accepted, the installation will continue and allow you to access your existing VMs from the previous version that you may have.
The one unfortunate point of the upgrade process is that any host interfaces created on an existing installation of 1.6.2 or earlier will be removed by the upgrade process. Overall, I think VirtualBox’s networking implementation falls a little short of the VMware bridge protocol used by both VMware Workstation and VMware Server. Before you embark on the upgrade, I recommend you enumerate any host interfaces that you have created. Then, make a quick script in the following fashion that will recreate them with the same names you already have:
VBoxManage createhostif "VM-Bridge1"
VBoxManage createhostif "VM-Bridge2"
VBoxManage createhostif "VM-Bridge3"
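If you have more than a handful of interfaces, the same commands can be generated from a saved list of names. The following is only a sketch in Python; the helper name is my own, and it prints the commands for review rather than running them.

```python
import shlex


def createhostif_commands(names):
    """Return one 'VBoxManage createhostif' argument list per interface name."""
    return [["VBoxManage", "createhostif", name] for name in names]


# Names recorded before the upgrade (the same example names as above).
saved_names = ["VM-Bridge1", "VM-Bridge2", "VM-Bridge3"]
for cmd in createhostif_commands(saved_names):
    # Print for review; subprocess.run(cmd, check=True) would execute it.
    print(shlex.join(cmd))
```

Keeping the names in one list also gives you a record of what existed before the upgrade.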
Any VMs with a bridged interface will be configured to an invalid network interface after the upgrade to 1.6.4. I have an earlier blog posting about bridged networking on VirtualBox, and the commands and planning points are unchanged from 1.6.2 to 1.6.4.
The VMs will not need to be upgraded directly, but it would not hurt to get the 1.6.4 version of Guest Additions installed to take advantage of the functionality corrected between these two versions. Once the new version is installed, the systray icon and the VBoxControl getversion command will show the 1.6.4 release.
Version 1.6.4 is still lean at only 23 MB. It remains a ready-to-go virtualization platform and is still freely available from the Sun website.
By eliminating wasteful resource use on your host servers, you can make more resources available for additional virtual machines.
Most operating systems today have been developed to run on physical servers in non-virtual environments. Because all the virtual machines are competing for the same resources on the host server, you want to limit the guest operating system so it only consumes the resources that it needs to perform its designated function.
Microsoft Windows is notorious for wasting server resources in its typical default configuration. Many unnecessary services are loaded that most servers do not need: for example, when’s the last time you needed the Windows Audio, Print Spooler and Wireless Configuration services on your SQL Server? Windows also constantly reads and writes to disk for things like swap and log files, and Windows networking tends to be very chatty, often generating excessive network traffic.
All of these additional services generate excessive and often unnecessary network, CPU, memory and disk resource usage. It may not be all that much on any one individual server, but add that up across 12 virtual machines on a host and it makes a difference.
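To put rough numbers on that, here is a small worked example in Python. The per-guest figures are hypothetical and only there to show how the waste scales with the number of virtual machines.

```python
def wasted_total(per_vm_mb, per_vm_cpu_pct, vm_count):
    """Scale per-VM overhead up to the whole host."""
    return per_vm_mb * vm_count, per_vm_cpu_pct * vm_count


# Hypothetical figures: 150 MB of RAM and 2% of a CPU core wasted per guest.
ram_mb, cpu_pct = wasted_total(150, 2, 12)
print(f"{ram_mb} MB RAM and {cpu_pct}% of a core across 12 VMs")
```

With these assumed figures, 12 guests waste 1,800 MB of RAM and about a quarter of a CPU core, which is enough for another small virtual machine.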
Windows Server 2008 takes a step in the right direction with its Server Core installation which strips out many of the unneeded components including the GUI. Many Linux distributions are already optimized to perform specific functions as well. Additionally, there are many virtual appliances available that have very small footprints and make for good alternatives to full-blown operating systems.
Here are some tips for reducing the amount of resources that your servers consume:
- Keep event and audit logging to a minimum
- Disable unnecessary Windows services
- Disable unneeded network protocols
- Disable screen savers and visual effects
- Remove any unneeded applications
- Remove all unneeded hardware from the virtual machine configuration
- If the server was a physical-to-virtual (P2V) conversion, delete any non-present hardware
- Optimize anti-virus configurations to exclude specific directories or disable real time scanning
- Disable NTFS last accessed time stamp
- For Linux systems, disable unneeded daemons, services and background tasks and do not run X-Windows if possible.
In the future, operating systems will evolve to become specifically optimized to run on virtual servers. Until then you should take steps to ensure that your guest servers are optimized to run on virtual hosts.
An event like a complete data center power failure is something you never want to experience. Having recently gone through one I thought I would share some lessons learned from it.
This particular data center had a full UPS (uninterruptible power supply) system and a backup diesel generator, but routine battery maintenance performed on the UPS shorted some circuits and caused power loss to the entire data center. This event made me realize that a little preparedness can go a long way in getting servers and virtual machines (VMs) back online after a power failure.
First and foremost, the DNS (Domain Name System) is probably the most important service in your data center. Most servers and workstations use DNS names instead of IP addresses to communicate with each other. Without DNS, servers can’t get to anything by hostname and will effectively be isolated from each other. Most administrators are used to using DNS names, so when DNS is not available they usually do not know the IP addresses of the servers and subsequently can’t connect to them. So it is a good idea to have a hard copy of all your servers and their IP addresses somewhere in your data center for you to reference when DNS is not available.
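One way to keep that hard copy current is to generate it while DNS is still healthy. The sketch below is only an illustration in Python: the function names and the host list are my own, and it relies on the standard library's `socket.gethostbyname` for lookups.

```python
import socket


def build_inventory(hostnames, resolve=socket.gethostbyname):
    """Map each hostname to its IP address, or to 'UNRESOLVED' on failure."""
    inventory = {}
    for name in hostnames:
        try:
            inventory[name] = resolve(name)
        except OSError:
            # socket.gaierror is a subclass of OSError.
            inventory[name] = "UNRESOLVED"
    return inventory


def format_inventory(inventory):
    """Render the inventory as aligned text lines suitable for printing."""
    width = max((len(name) for name in inventory), default=0)
    return [f"{name.ljust(width)}  {ip}" for name, ip in sorted(inventory.items())]


if __name__ == "__main__":
    # Hypothetical host list; replace it with your own servers.
    for line in format_inventory(build_inventory(["localhost"])):
        print(line)
```

Run something like this on a schedule, print the result and keep the paper copy in the data center.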
Virtual servers can be even more problematic. If all your DNS servers are virtualized and cannot be started because of network or shared storage issues, you can run into problems starting other servers and services that rely on DNS. Consider having at least one physical DNS server or having one or two DNS servers running on local storage instead of shared storage.
Another helpful insight: Make sure you know command line procedures for administration on your host servers. You may not be able to connect to your host via a graphical user interface (GUI) until certain systems are up, so the command line may be your only way to check the host server health and perform VM operations. Again, it helps to have paper documentation of the host command line utilities and their syntax.
Finally you want to make sure you start your servers back up in the proper order due to dependencies that certain servers and applications have. Obviously, with the network unavailable, not much is going to function properly. The storage-area network (SAN) is also critical for your host servers that utilize shared storage for VMs. Windows servers also take a very long time to boot if a DNS server and domain controller are not available when they are starting.
Below is a general order for restarting your servers and applications.
- DNS servers
- DHCP servers
- Database servers
- Application/Web servers
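The order above can also be written down as explicit dependencies so that a script can work out the sequence for you. This is a minimal Python sketch using the standard library's `graphlib` module (Python 3.9 or later); the assumption that each tier depends only on the one before it is mine and simply mirrors the list.

```python
from graphlib import TopologicalSorter

# Each tier maps to the tiers that must already be running before it starts.
# These dependencies are an assumption that simply encodes the list above.
dependencies = {
    "DNS servers": set(),
    "DHCP servers": {"DNS servers"},
    "Database servers": {"DHCP servers"},
    "Application/Web servers": {"Database servers"},
}


def startup_order(deps):
    """Return the tiers in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


for step, tier in enumerate(startup_order(dependencies), start=1):
    print(f"{step}. {tier}")
```

The benefit of recording dependencies rather than a fixed list is that adding a new tier only requires stating what it depends on.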
The Boy Scout motto ‘be prepared’ holds true. A little preparation and planning can go a long way toward ensuring a smoother recovery.
When selecting a virtualization product for a workstation, it helps to have one that provides a smooth transition between host and virtual machine. Sun xVM VirtualBox offers a seamless mode that can make the transition between your guest and host quite transparent. The seamless window functionality is available for Windows and some Linux guest VMs in VirtualBox. The Guest Additions package is required on both platforms to use the seamless window feature.
To enable the seamless window, press the host key (which is the right CTRL key by default) and the letter ‘L’ together. VirtualBox will present an information message before engaging the seamless window functionality.
For Windows host systems, the VM will still reside as a separate window in the taskbar. When you select the VM in the taskbar, the active items are overlaid onto the host. For Windows systems, the guest VM desktop is not shown unless the show desktop command is sent. In the example below, a Windows Server 2008 guest VM is running in a seamless window on the Windows XP host:
This seamless functionality makes it feel less like a VM, and all keyboard and mouse operations are entirely smooth. To exit the seamless window feature, press the host key and the ‘L’ key together again to switch the VM back to a contained window. VirtualBox also handles mixed video modes in the seamless window quite well. For example, if the guest VM is running at a 16-bit color depth and the host is running at a higher depth, the lower-depth VM components are blended into the host display well.
The seamless window feature has been available in VirtualBox since version 1.5. More information on VirtualBox can be found online at the Sun website.