VMware v4 failure: ESX hosts disconnecting from vCenter and EMC SAN, plus SCSI reservation issues (October 2010)

Tags:
HBA
SAN
SAN disconnect
SCSI Reservation
vCenter 4.0
vCenter disconnect
VMware ESX
VMware ESXI 4.0 Performance
We have a multi-cluster VMware environment consisting of HP BL680s, DL580s and DL585s that has been very stable for over a year, and we have made no recent changes to storage, switches, hosts or HBAs.

Recently we experienced two incidents. In the first, ESX hosts on different clusters disconnected from vCenter and from their shared EMC Clariion arrays (three separate frames). EMC said the Cisco fiber switch saw them as disconnected at the ESX host port. The disconnects did not happen all at once: one host dropped first, followed by other hosts over a period of two to three hours. In some cases both HBA paths lost connection to the SAN, and in others only one HBA did. Rebooting an ESX host re-established its connection to vCenter and to the SAN, but in some cases specific LUNs were still not accessible. VMware support found SCSI reservations on multiple hosts, and those hosts were all unable to see the same three LUNs. They had us trespass these LUNs, after which the hosts could access their data.

In the second incident, two days later, one ESX host (not involved in the previous incident) disconnected from vCenter but did not lose its connection to the SAN. Within an hour, two other hosts from the same cluster also disconnected from vCenter but not from the SAN. Two of the three hosts were rebooted and reconnected to vCenter; the third restored itself without a reboot. Again a specific LUN was inaccessible and did not appear to the host. The vmkernel logs on the affected hosts showed SCSI reservations. The LUN was trespassed, after which we could browse it but the VMs would not start. The LUN was then trespassed back, and the VMs were able to start and access their data.

VMware has recommended a firmware upgrade for our HP and Emulex HBAs, which they say resolved a similar issue in an environment similar to ours. However, they do not know what condition is causing the problem.

We are looking to hear from anyone with a similar experience who might have a handle on the root cause of this issue.

Software/Hardware used:
VMware ESX 4.0, HP HBA, Emulex HBA, HP BL680, DL580, DL585, EMC Clariion, Cisco SAN switch
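
For anyone comparing vmkernel logs across hosts while chasing the same symptom, here is a minimal sketch of one way to tally reservation-related lines per host. It is not a VMware tool: the search pattern, the function names and the host labels are all assumptions made for illustration, and the pattern should be changed to match the wording your own logs actually use.

```python
import re
from collections import Counter

# Assumption: this pattern is illustrative only. Adjust it to whatever
# wording your vmkernel logs actually use for reservation problems.
DEFAULT_PATTERN = re.compile(r"reservation", re.IGNORECASE)


def count_matches_per_host(logs_by_host, pattern=DEFAULT_PATTERN):
    """Return a Counter of matching line counts keyed by host label.

    logs_by_host maps a host label (hypothetical, chosen by you) to an
    iterable of log lines already read from that host's vmkernel log.
    """
    counts = Counter()
    for host, lines in logs_by_host.items():
        counts[host] = sum(1 for line in lines if pattern.search(line))
    return counts


def hosts_with_matches(logs_by_host, pattern=DEFAULT_PATTERN):
    """List hosts with at least one matching line, busiest first."""
    counts = count_matches_per_host(logs_by_host, pattern)
    return [host for host, n in counts.most_common() if n > 0]
```

To use it, read each host's log into a list of lines yourself and pass a dict such as `{"host-a": lines_a, "host-b": lines_b}`. The result only shows which hosts logged matching lines and how many, which may help line up the order in which hosts were affected; it says nothing about root cause.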


Forgot Password

No problem! Submit your e-mail address below. We'll send you an e-mail containing your password.

Your password has been sent to:

To follow this tag...

There was an error processing your information. Please try again later.

REGISTER or login:

Forgot Password?
By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy

Thanks! We'll email you when relevant content is added and updated.

Following