Virtualization Pro

Jun 23 2009   3:34PM GMT

Killing a frozen VM on a vSphere ESX host

Eric Siebert

Occasionally virtual machines (VMs) get stuck in a zombie state and will not respond to a power-off command using the traditional vSphere client power controls. Rebooting a host will fix this condition — but rebooting is usually not an option. Fortunately, there are a few methods for forcing the VM to shut down without rebooting the host.

I previously documented these methods with VMware Infrastructure 3 (VI3) and wanted to make sure they all worked with vSphere. The methods below are listed in order of usage preference starting with using normal VM commands and ending with a brute force method.

Method 1 – Using the vmware-cmd service console command (the command-line interface equivalent of using the vSphere Client)

  1. Log in to the ESX service console.
  2. The vmware-cmd command uses the configuration file name (.vmx) of the VM to specify the VM to perform an operation on. You can type vmware-cmd -l to get a list of all VMs on the host and the path and name of their configuration file. The path uses the Universally Unique Identifier (UUID) or long name of the data store; alternatively, you can use the friendly name instead. If you do not want to type the path when using the vmware-cmd command you can change to the VM’s directory and run the command without the path.
  3. You can optionally check the power state of the VM by typing vmware-cmd <VM config file path & name> getstate.
  4. To forcibly shut down a VM, type vmware-cmd <VM config file path & name> stop hard.
  5. You can check the state again to see if it worked; if it did the state should now be off.
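The steps above can be sketched as a small shell function for the service console. This is only an illustration: hard_stop_vm is a hypothetical helper name, and the VMWARE_CMD variable is an assumption added here so the command can be swapped out. The only calls it makes are the getstate and stop hard operations described in the steps.

```shell
#!/bin/sh
# Sketch of Method 1: hard power-off through vmware-cmd.
# VMWARE_CMD is an assumption of this sketch; it defaults to the real command.
VMWARE_CMD="${VMWARE_CMD:-vmware-cmd}"

# hard_stop_vm <VM config file path & name>   (hypothetical helper name)
hard_stop_vm() {
    vmx="${1:-}"
    if [ -z "$vmx" ]; then
        echo "usage: hard_stop_vm <VM config file path & name>" >&2
        return 2
    fi
    "$VMWARE_CMD" "$vmx" getstate     # optional: show the state before
    "$VMWARE_CMD" "$vmx" stop hard    # forcibly shut down the VM
    "$VMWARE_CMD" "$vmx" getstate     # should now report off if it worked
}
```

Run vmware-cmd -l first to find the configuration file path, then pass that path to the function. If the final state is still on, move on to the next method.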

Method 2 – Using the vm-support command to shut down the VM by first finding the virtual machine ID and then forcibly terminating it

This method does a lot more than shut down the VM: it also produces debugging information that you can use to troubleshoot an unresponsive VM.

  1. Log in to the ESX Service Console.
  2. The vm-support command is a multi-purpose command that is mainly used to troubleshoot host and VM problems. You can use the -X parameter to forcibly shut down a VM and also produce a file with debugging information. The command creates a .tgz file in the directory that you run it from, and it cannot be run from a VMFS volume directory (running it in the /tmp directory is recommended). First type vm-support -x to get a list of the virtual machine IDs (VMID) of your running VMs.
  3. To forcibly shut down the VM and generate core dumps and log files, type vm-support -X <VMID>. You will receive prompts asking if you want to take a screenshot of the VM. A screenshot can be useful to see if there are any error messages. You will also be prompted to see if you wish to send an NMI and an ABORT to the VM, which can aid in debugging. You must say yes to the ABORT prompt for the VM to be forcibly stopped. Once the process completes, which can take 10-15 minutes, a .tgz file will be created in the directory in which you ran the command that you can also use for troubleshooting purposes. To avoid filling up your file system when the file is created, switch to the /tmp directory when you run the command.
  4. You can check the state of the VM again either by using the vmware-cmd command or by typing vm-support -x; the VMID for that VM should no longer be listed. Be sure to delete the .tgz file when you are done with it to avoid filling up your host disk.
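As a rough sketch, the steps above could be wrapped as follows so that vm-support always runs from /tmp. The helper names list_vmids and abort_vm are hypothetical, and the VM_SUPPORT variable is an assumption added for this sketch. The prompts (screenshot, NMI, ABORT) are still answered by hand.

```shell
#!/bin/sh
# Sketch of Method 2: vm-support -x to list VMIDs, vm-support -X to abort one.
# VM_SUPPORT is an assumption of this sketch; it defaults to the real command.
VM_SUPPORT="${VM_SUPPORT:-vm-support}"

# list_vmids   (hypothetical helper name): show the VMIDs of running VMs.
list_vmids() {
    "$VM_SUPPORT" -x
}

# abort_vm <VMID>   (hypothetical helper name)
abort_vm() {
    vmid="${1:-}"
    if [ -z "$vmid" ]; then
        echo "usage: abort_vm <VMID>" >&2
        return 2
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    # The .tgz file ends up in /tmp; answer yes to the ABORT prompt.
    ( cd /tmp && "$VM_SUPPORT" -X "$vmid" )
}
```

After abort_vm finishes, run list_vmids again to confirm the VMID is gone, and delete the .tgz file from /tmp once you no longer need it.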

Method 3 – Using the kill command after first finding the process identifier (PID) of the VM

  1. Log in to the ESX service console.
  2. The process status (ps) command in Linux shows the currently running processes on a server, and the grep command finds the specified text in the output of the ps command. Type ps auxfww | grep <virtualmachinename> to get the process ID (PID) of the VM. Two entries will be returned. The shorter entry is the grep command itself, which matches its own search text. The longer entry is the running VM process; it ends in the config file name of the VM and is the one you want to use. The number in the second column of that entry is the PID of the VM.
  3. The kill command in Linux sends a signal to a process identified by its ID number. The '-9' parameter forces the process to quit immediately and cannot be ignored, unlike the more graceful '-15' parameter, which a process can sometimes ignore. Type kill -9 <PID> to forcibly terminate the process for the specified VM.
  4. You can check the state using the vmware-cmd command to see if it worked; if it did, the state should now be off.
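The PID lookup above can be sketched as a filter over the ps output. The helper names vm_pid_from_ps and kill_vm are hypothetical. The filter relies only on what the steps describe: the VM process entry contains the VM name and ends in the .vmx config file name, and the PID is in the second column. Treat it as a starting point rather than a tested tool.

```shell
#!/bin/sh
# Sketch of Method 3: find the VM's PID in ps output, then kill -9 it.

# vm_pid_from_ps <virtualmachinename>   (hypothetical helper name)
# Reads `ps auxfww` output on stdin and prints the PID (second column) of
# entries that contain the VM name and end in .vmx, skipping the grep entry.
vm_pid_from_ps() {
    awk -v name="${1:-}" '
        name != "" && index($0, name) > 0 && $0 ~ /\.vmx$/ && $0 !~ / grep / {
            print $2
        }
    '
}

# kill_vm <virtualmachinename>   (hypothetical helper name)
kill_vm() {
    pid=$(ps auxfww | vm_pid_from_ps "${1:-}")
    case "$pid" in
        ''|*[!0-9]*)
            # Empty, or more than one match: refuse to guess.
            echo "expected exactly one PID for '${1:-}', got: '$pid'" >&2
            return 1
            ;;
    esac
    kill -9 "$pid"    # forcibly terminate the VM process
}
```

kill_vm refuses to act unless exactly one PID matches, which guards against a VM name that is a substring of another VM's name.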

All three of these methods work identically on ESX hosts in both VI3 and vSphere. These methods also work for ESXi, but their execution is a bit different. In a future blog post we will cover how to use these methods with ESXi.

3  Comments on this Post

 
  • rreynol
    Eric, although kill -9 is an option, VMware support has advised, and I have experienced, that you could possibly still leave remnants of a VM running if you do that. Here is a cleaner way to kill the VM:
    1. If the vmware-cmd does not help, next up is to kill the master user world id.
    2. cat /proc/vmware/vm/*/names |grep vmname, where vmname is the VM that is hung, and find the value for vmid.
    3. less /proc/vmware/vm/vmid value/cpu/status, where vmid value is the number from the step above.
    4. Scroll over to the right until you find the group field that shows vm.####, where the #### numbers after vm. will be the master user world id.
    5. /usr/lib/vmware/bin/vmkload_app -k 9 ####, where #### is the master user world id.
    If successful you will get a WARNING message that a signal 9 is being sent, and this will cleanly kill all processes associated with the VM.
    -Robert Reynolds
  • Michigun
    Rreynol, when I try your way, I can't find /proc/vmware/vm/vmid value/cpu/:
    [root@esx41 ~]# ls /proc/vmware/vm/10804/
    alloc names
  • vanzylw
    Hi Robert, I have that procedure documented also, but it seems to work only in VI3 and not in vSphere, as there is no /cpu/status directory in the /proc/vmware/vm/ directory. I have it documented a bit differently, as shown below. If you know how to make it work in vSphere, please let us know. Thanks.
    o Log in to the service console.
    o Get the vmid of the VM you want to kill by typing “vm-support -x” or “cat /proc/vmware/vm/*/names” (e.g., 2533).
    o Get the world ID of the VM by typing “less -S /proc/vmware/vm//cpu/status” (substitute with the # from the step above). Use the right arrow to scroll to the right and see the Group field value (e.g., vm.2532; this is the WID of the VM), then press Q to exit. A world is the software entity created in the VMkernel that runs the virtual machine. To put it another way, every Virtual Machine Monitor (VMM) has a unique world ID assigned for the duration it is powered on. It is similar to a process ID in any other operating system.
    o To kill the VM, type “/usr/lib/vmware/bin/vmkload_app -k 9 ” (substitute with the # from the step above).
    o You will see a message “Sending signal '9' to world 2532”. If the command fails you will see a message “Failed to forward signal 9 to cartel 2532”; otherwise you will not see a response.