Virtualization Pro

Mar 13 2009   8:02PM GMT

Fixing VMware “no swap file” VM power-on failures

Eric Siebert

Recently I experienced a VMware HA event in my environment which caused the VMs on the affected hosts to be restarted on other servers. While most of the VMs started OK, a few did not. When I tried to start them manually, I received the error “Failed to power on VM – No swap file” and the VM would fail to start. What had happened is that several VMs were in a zombie-like state because they had not been shut down gracefully. Even though their status was displayed as shut down in the VMware Infrastructure Client (VI Client), each still had a process running on an ESX host that prevented it from being started.

In effect, while the VM’s OS was not running, the VM was still in a running state on an ESX host and had a .vswp file already in place that could not be deleted. As a result, when another host tried to start the VM, the .vswp file could not be created because the original host still held a lock on it.

To resolve this situation I had to find the host that still had a running process for the VM and forcibly terminate that process. To do so I had to log in to the service console of each host and run the following command: ps auxfww | grep <VM name>. This command returns a list of running processes that contain the name of the VM.
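If you have more than a couple of hosts, logging in to each one by hand gets tedious. Here is a rough sketch of how the check could be looped over several hosts. It assumes SSH access to each service console is enabled for an account that can see the process list, and that the .vmx file is named after the VM. The host names, the VM name and the check_hosts function are all placeholders of my own, not anything VMware supplies.

```shell
# Hypothetical host list and VM name; replace with your own.
HOSTS="esx1 esx2 esx3"
VM_NAME="win2003-1"
# Remote-shell command; overridable so the loop can be dry-run.
REMOTE_SH="${REMOTE_SH:-ssh}"

check_hosts() {
    for host in $HOSTS; do
        echo "== $host"
        # The [/] bracket keeps grep from matching its own command line,
        # so any line printed here should be a real VM process.
        $REMOTE_SH "$host" "ps auxfww | grep '[/]$VM_NAME.vmx'" \
            || echo "   (no match or connection failed)"
    done
}
```

Running check_hosts prints a heading per host followed by any matching process line, so the host that still owns the VM should stand out.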

When you run the ps command with the VM name, you will always get one result, whether or not the VM is actually running on the host. This is because the grep command itself contains the VM name and so shows up in the result list. However, if the VM is actually running on the host, you will receive two results instead of one. The second result is much longer, wrapping across several lines of text, and contains the path to the VM's .vmx file. It also contains the process ID (pid) of the VM, which can be used to forcibly terminate it. The pid is in the second column of the results, right after the username (typically root). As you can see in the example below, the first result, with a pid of 25914, is the command itself, and the second result, with a pid of 23896, is the running VM.

[root@esx1 root]# ps auxfww | grep win2003-1
root     25914  0.0  0.2  3688  676 pts/0    S    13:17   0:00          \_ grep win2003-1
root     23896  0.0  0.2  2008  864 ?        S<   Feb13   4:12 /usr/lib/vmware/bin/vmkload_app /usr/lib/vmware/bin/vmware-vmx -# name=VMware ESX Server;version=3.5.0;licensename=VMware ESX Server;licenseversion=2.0 build-123630; -@ pipe=/tmp/vmhsdaemon-0/vmxd0af4bb011822fc5; /vmfs/volumes/442d541b-cb5a815d-6083-0017a4a9c074/win2003-1/win2003-1.vmx
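Rather than picking the pid out by eye, you could let awk pull the second column from the line that contains the .vmx path. This is only a sketch: vm_pids is a helper name I made up, and it assumes the .vmx file carries the same name as the VM, as it does in the example above.

```shell
# Print the pid of every process whose command line contains "/<vm>.vmx".
# Reads ps auxfww-style output on stdin so it can be tried on saved output.
vm_pids() {
    # The search string is assembled inside awk, so awk's own command
    # line never contains "/<vm>.vmx" and cannot match itself.
    awk -v vm="$1" 'index($0, "/" vm ".vmx") { print $2 }'
}
```

On a host you would run it as ps auxfww | vm_pids win2003-1. No output means no process for that VM was found on that host.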

Now that we know the pid of the VM (23896), to forcibly terminate it we type kill -9 23896. You can verify that the VM process has been terminated by running the ps command again. Only one result should be returned. Now that the VM has been stopped you can power it on using the VI Client and you should have no problems this time.
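The kill-and-verify step can also be wrapped in a small function. This is a minimal sketch, not a VMware tool: force_kill is a name I chose, and it confirms the process is gone by probing the pid with kill -0 instead of re-running ps. The five-second limit is arbitrary.

```shell
# Send SIGKILL to a pid, then wait briefly until the pid no longer exists.
force_kill() {
    pid="$1"
    kill -9 "$pid" 2>/dev/null || { echo "no such process: $pid" >&2; return 1; }
    tries=0
    # kill -0 sends no signal; it only reports whether the pid still exists.
    while kill -0 "$pid" 2>/dev/null; do
        tries=$((tries + 1))
        if [ "$tries" -ge 5 ]; then
            echo "pid $pid is still present" >&2
            return 1
        fi
        sleep 1
    done
    echo "pid $pid terminated"
}
```

With the pid from the example you would call force_kill 23896, and only then try the power-on from the VI Client.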

1  Comment on this Post

  • Texmansru47
    There are also several other ways, since the ability to use 'ps auxfww' may eventually go away. One is to use something like the following, which will actually suspend the VM instead of killing it, which may be better than a straight kill:

        VMID=`/usr/bin/vm-support -x | grep VMName | awk '{print $1}' | awk -F= '{print $2}'`
        vm-support -Z $VMID

    Or, if you want to kill the VM without generating files:

        VMID=`/usr/bin/vm-support -x | grep VMName | awk '{print $1}' | awk -F= '{print $2}'`
        KVMID=`cat /proc/vmware/vm/$VMID/cpu/status | awk '{print $1}' | grep -v group | awk -F. '{print $2}'`
        /usr/lib/vmware/bin/vmkload_app -k 9 $KVMID

    I am sure there are also some other ways, perhaps using the RCLI?