Like many administrators, I have been quite happy with the performance of my ESX environment. However, a recent observation helped us avoid a potentially disastrous issue.
In my environment, I am using an IBM SAN Volume Controller (SVC) for storage with 4 Gb/s host bus adapters (HBAs). Connectivity relies on the driver that ships with ESX. I am currently running ESX 3.0.2, and I recently came across some unexpected behavior.
ESX was only using one of the HBAs for the SAN storage. We wanted to determine what would happen if that path were lost. Would ESX continue operating as expected? So we performed the following tests and came up with these results:
-Dropped connectivity on first HBA / active path rolled to next HBA, port status ‘dead’
-Restored connectivity on first HBA / port went to ‘on’, active path remained on second HBA
-Dropped connectivity on second HBA / lost all connectivity
Yes, all connectivity was lost in the third step. It was better to find and correct this now than to learn about it the hard way. My expectation was that the host would use both HBAs at all times and fail over as needed. Luckily, this is easily corrected.
There are two ways to address this issue. One is to run a command to instruct ESX to use both HBAs, and the other is to apply an update. The first option would use the following command:
esxcfg-mpath --policy=rr --lun=vmhba1:0:1
This command must be run per LUN on each ESX host, but it is a quick and easy way to address the issue immediately, and it can be done outside of maintenance mode. The behavior is described in VMware KB article 1003270. The second option is to install a critical-class patch on the ESX system, which addresses this as well as a few other issues. ESX 3.5 corrects the issue natively, with no updates or commands required.
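Because the command has to be repeated for every LUN, a small loop can save some typing. The following is only a sketch: the function name set_rr_policy, the DRY_RUN switch, and the LUN names passed at the bottom are my own placeholders, and it defaults to printing the commands rather than running them so you can review them first.

```shell
#!/bin/sh
# Sketch: print (or run) the round-robin policy command for each LUN.
# set_rr_policy and DRY_RUN are illustrative names, not part of ESX.
# DRY_RUN defaults to 1, so nothing changes unless you set DRY_RUN=0.

DRY_RUN="${DRY_RUN:-1}"

set_rr_policy() {
    # Each argument is a LUN in the same form used in the command above.
    for lun in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            # Dry run: show the command that would be executed.
            echo "esxcfg-mpath --policy=rr --lun=$lun"
        else
            esxcfg-mpath --policy=rr --lun="$lun"
        fi
    done
}

# Placeholder LUN names; substitute the LUNs presented to your host.
set_rr_policy vmhba1:0:1 vmhba1:0:2
```

Run as shown, it prints one esxcfg-mpath line per LUN. Setting DRY_RUN=0 on the service console would execute them instead, and the script would still need to be run on each ESX host.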
One simple way to see if your ESX host is using the different HBAs is to look at the LUN properties. From the VMware Infrastructure Client, select an ESX host, select the Configuration tab, select Storage, right-click on a LUN, select Properties, click the Manage Paths button, and look at the path listed as Active. If it is always on the first path and that ESX host has a virtual machine running on that LUN, the host may not be using both paths. Below is a figure showing a LUN that is using the second HBA:
You can also run the following command to see which path is active at that moment:
The far-right column has the role of active and preferred assigned to a path within a LUN. If the active designation never leaves one path where there is an active virtual machine, you may be at risk of the behavior we observed initially. This makes a case for ESX host patching, as well as ensuring that all redundant components function as expected during installation.
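If you note the active path for each LUN at a few different times, the "never leaves one path" check can be automated. This is a rough sketch under my own assumptions: the three-column log format (sample label, LUN, active path) is something you would record yourself, not the output of any ESX command, and stuck_luns and the lunA/lunB names are hypothetical.

```shell
#!/bin/sh
# Sketch: flag LUNs whose active path never changed across recorded samples.
# Assumed input (a log you keep yourself), one sample per line:
#   <sample-label> <lun> <active-path>

stuck_luns() {
    # Count distinct active paths per LUN; print LUNs that only ever used one.
    awk '
        NF >= 3 {
            key = $2 SUBSEP $3
            if (!(key in seen)) { seen[key] = 1; distinct[$2]++ }
        }
        END { for (lun in distinct) if (distinct[lun] == 1) print lun }
    ' | sort
}

# Hypothetical samples: lunA alternates between HBAs, lunB never moves.
printf '%s\n' \
    "t1 lunA vmhba1:0:1" \
    "t2 lunA vmhba2:0:1" \
    "t1 lunB vmhba1:0:2" \
    "t2 lunB vmhba1:0:2" | stuck_luns
```

With the sample data above, only lunB is reported, since its active path is the same in every sample. A LUN that shows up here is worth a closer look in the Manage Paths dialog.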