You may frequently hear the term SCSI reservations when dealing with VMware servers that use shared storage. SCSI reservations ensure exclusive access to disk-based resources when multiple hosts access the same shared storage. In addition to VMware hosts, Microsoft Cluster Server also uses SCSI reservations.
SCSI reservations are used only for specific operations that change metadata. They prevent multiple hosts from writing to the metadata concurrently, which could corrupt data. Once the operation completes, the reservation is released and other operations can continue. Because the reservation is an exclusive lock, it is important to minimize the number of reservations made concurrently. When too many reservations are made at once, you may receive I/O failures: a host cannot make the reservation it needs to complete an operation because another host has locked the logical unit number (LUN). A host that cannot make a reservation because of a conflict with another host retries at random intervals until it succeeds; however, if too many attempts are made, the operation fails.
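The reserve, retry, and release cycle described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: `FakeLun`, `run_with_reservation`, `ReservationConflict`, the attempt limit, and the delay bound are all hypothetical names and values chosen for illustration, and nothing here reflects how ESX implements reservations internally.

```python
import random
import time


class ReservationConflict(Exception):
    """Raised when the LUN stays reserved by another host on every attempt."""


class FakeLun:
    """Hypothetical in-memory LUN that tracks a single reservation holder."""

    def __init__(self):
        self.holder = None

    def try_reserve(self, host):
        # Succeeds only if no host currently holds the reservation.
        if self.holder is None:
            self.holder = host
            return True
        return False

    def release(self, host):
        # Only the holder can release its own reservation.
        if self.holder == host:
            self.holder = None


def run_with_reservation(try_reserve, release, operation,
                         max_attempts=5, max_delay_s=0.05,
                         sleep=time.sleep, rng=random.random):
    """Reserve, run the metadata operation, release; retry on conflict."""
    for attempt in range(1, max_attempts + 1):
        if try_reserve():
            try:
                return operation()
            finally:
                release()  # always free the LUN for other hosts
        if attempt < max_attempts:
            # Wait a random interval before the next attempt (assumed bound).
            sleep(rng() * max_delay_s)
    raise ReservationConflict(f"gave up after {max_attempts} attempts")


if __name__ == "__main__":
    lun = FakeLun()
    result = run_with_reservation(
        lambda: lun.try_reserve("esx01"),
        lambda: lun.release("esx01"),
        lambda: "metadata updated",
    )
    print(result)
```

The point of the sketch is the shape of the loop: the reservation is held only for the duration of the metadata operation, and a host that keeps losing the race eventually reports a failure instead of waiting forever.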
Some examples of operations that require metadata updates include:
- Creating or deleting a VMFS datastore
- Expanding a VMFS datastore onto additional extents
- Powering a VM on or off
- Acquiring or releasing a lock on a file
- Creating or deleting a file
- Creating a template
- Deploying a VM from a template
- Creating a new VM
- Migrating a VM with vMotion
- Growing a file (e.g., a snapshot file or a thin-provisioned virtual disk)
A small number of reservation conflicts is generally unavoidable and will not have a big impact on your hosts and VMs. To avoid too many conflicts, limit the number of operations that cause reservations and stagger them so that too many do not happen simultaneously. All reservation errors are logged to the /var/log/vmkernel log file on each ESX host. To reduce the number of conflicts:
- Limit the number of snapshots you have running. Snapshots grow in 16 MB increments, and every time they grow they cause a SCSI reservation.
- Migrate only a single VM per LUN with vMotion at any one time.
- Cold migrate only a single VM per LUN at any one time.
- Do not power too many VMs on or off simultaneously.
- Limit VM/template creations and deployments to a single VM per LUN at any one time.
- Consider using smaller LUN sizes (under 600 GB), and do not use extents to extend a VMFS volume.
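The "one operation per LUN at any one time" guidance in the list above can be sketched as a simple planning step that groups pending operations into waves. This is a minimal illustration, not a VMware tool: `stagger_by_lun`, the operation names, and the LUN labels are hypothetical, and the code only plans an order without calling any VMware API.

```python
from collections import defaultdict, deque


def stagger_by_lun(operations):
    """Split (name, lun) pairs into waves with at most one operation per LUN."""
    # Queue the operations per LUN, preserving the order they were requested in.
    queues = defaultdict(deque)
    for name, lun in operations:
        queues[lun].append(name)

    waves = []
    while any(queues.values()):
        # Take the next pending operation from every LUN that still has work.
        wave = [(q.popleft(), lun) for lun, q in queues.items() if q]
        waves.append(wave)
    return waves


if __name__ == "__main__":
    pending = [
        ("deploy vm1", "lun1"),
        ("deploy vm2", "lun1"),
        ("vmotion vm3", "lun2"),
    ]
    for number, wave in enumerate(stagger_by_lun(pending), start=1):
        print(number, wave)
```

Each wave would be run to completion before the next one starts, so operations on different LUNs can still proceed in parallel while operations that target the same LUN are serialized.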