The scheduler is the VMkernel component that maps requests from the virtual CPUs assigned to virtual machines onto the physical CPUs of the host server. Whenever a virtual machine (VM) uses a virtual CPU, the VMkernel has to find a free physical CPU (or core) for it to run on. On a typical host, virtual CPUs outnumber physical CPUs, so the VMs all compete for the limited number of physical CPUs the host has. The scheduler's job is to find CPU time for every VM that requests it, and to do so in a balanced way so that no single VM's performance suffers. This is not always an easy task, especially when VMs are assigned multiple virtual CPUs (virtual symmetric multiprocessing, or vSMP), which further complicates scheduling.
To put scheduling in simple terms, think of the scheduler as an air traffic controller that has to handle many requests for incoming planes (VMs) to land on a limited number of available runways (CPUs). It's a delicate balancing act to make sure that all the planes land and that none circles too long waiting for a runway. To further complicate matters, larger planes (vSMP VMs) need special runways to land on, which makes the controller's job more difficult. If you've ever played the iPhone game Flight Control, you know how hard this is.
Scheduling CPU time for single-CPU VMs is much easier, as the scheduler only has to find one available physical CPU for the VM to use. As mentioned, multiple-CPU VMs are harder to schedule because the scheduler must find several physical CPUs that are available simultaneously. This is called co-scheduling, a technique for running related processes on different processors concurrently. If a VM is assigned multiple processors, the VMkernel needs to fool the guest operating system into thinking it has multiple processors, and co-scheduling is critical for that illusion to hold.
Different versions of ESX have implemented co-scheduling in different ways. ESX 2.x used a strict co-scheduler: a VM with two vCPUs had to have two physical CPUs available simultaneously to receive CPU time; if two physical CPUs were not available, the VM had to wait until the scheduler found two free.
Beginning with ESX 3.x, a relaxed co-scheduler was implemented: only vCPUs whose execution had fallen behind (become skewed) had to be co-scheduled, while the rest could be scheduled independently. This makes scheduling easier and improves overall processor utilization. With vSphere, VMware further refined the relaxed co-scheduling algorithm so the scheduler has more choices when placing vCPUs, which further improves utilization and performance.
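The difference between the two approaches can be sketched as a toy model. This is purely illustrative and not the VMkernel's actual algorithm; the function names and the skew threshold are assumptions made for the example.

```python
# Toy contrast of strict vs. relaxed co-scheduling.
# Illustrative sketch only -- not VMware's actual scheduler logic.

def strict_coschedule(free_pcpus, vm_vcpus):
    """A strict co-scheduler runs the VM only if ALL of its vCPUs
    can be placed on free physical CPUs at the same instant."""
    if len(free_pcpus) >= vm_vcpus:
        return free_pcpus[:vm_vcpus]   # all vCPUs scheduled together
    return []                          # VM waits; no CPU time this cycle

def relaxed_coschedule(free_pcpus, vcpu_skew, max_skew=3):
    """A relaxed co-scheduler only insists on co-starting vCPUs whose
    progress has fallen too far behind (skewed); the rest may run
    independently as pCPUs free up.  vcpu_skew maps vCPU id -> lag."""
    lagging = [v for v, skew in vcpu_skew.items() if skew >= max_skew]
    if len(free_pcpus) < len(lagging):
        return {}                      # cannot co-start the lagging set
    placement = dict(zip(lagging, free_pcpus))
    # Remaining vCPUs take any leftover pCPUs opportunistically.
    leftover = free_pcpus[len(lagging):]
    others = [v for v in vcpu_skew if v not in lagging]
    placement.update(zip(others, leftover))
    return placement
```

With only one free physical CPU, the strict scheduler gives a 2-vCPU VM nothing, while the relaxed scheduler can still run one of its vCPUs as long as neither has skewed too far, which is the utilization gain the text describes.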
Because scheduling is so important to VM performance, you should avoid using CPU affinity, which constrains the scheduler and makes it more difficult to schedule CPU time for VMs. CPU affinity can be configured on individual VMs to force them to run only on specific physical CPUs of the host, and it should not be used unless you have a specific need for it. Additionally, setting CPU shares on VMs causes the scheduler to give higher priority to VMs with higher share values and lower priority to those with lower share values.
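Under contention, shares translate into a proportional slice of CPU time: each VM's entitlement is its share value divided by the total shares of all competing VMs. A minimal sketch of that arithmetic, with the caveat that the real scheduler also factors in reservations, limits, and actual demand:

```python
# Toy proportional-share calculation.  Under contention, a VM's slice
# of CPU time is roughly its shares divided by the total shares.
# The real entitlement computation also considers reservations,
# limits, and demand; this sketch ignores those.

def cpu_entitlement(shares):
    """shares: dict of VM name -> share value.
    Returns each VM's fraction of contended CPU time."""
    total = sum(shares.values())
    return {vm: s / total for vm, s in shares.items()}

# The default share presets are Low = 500, Normal = 1000, and
# High = 2000 per vCPU, so a High VM gets twice a Normal VM's slice.
fractions = cpu_entitlement({"web": 2000, "db": 1000, "test": 500})
```

Here the "web" VM is entitled to 2000/3500 of the contended CPU, twice the "db" VM's 1000/3500, matching the higher-shares-higher-priority behavior described above.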
While the scheduler does its best to schedule CPU time evenly, it can fall behind on very busy systems, which degrades VM performance. The amount of time a VM spends waiting for a physical CPU to become available is tracked in a statistic called ready time. It can be viewed in the command-line utility esxtop as a percentage of the sampling interval (%RDY) or in vCenter Server as a time value.
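The two representations are interchangeable with simple arithmetic. The sketch below assumes esxtop's default refresh interval of 5 seconds and vCenter's real-time chart sampling period of 20 seconds; adjust the parameters if your intervals differ.

```python
# Converting between the two ready-time representations:
# esxtop reports a percentage (%RDY) of its sampling interval,
# while vCenter reports a summation in milliseconds over its
# sampling period.  Assumes esxtop's default 5 s refresh and
# vCenter's 20 s real-time sample period.

def rdy_percent(ready_ms, interval_ms=5000):
    """Milliseconds of ready time in one esxtop interval -> %RDY."""
    return 100.0 * ready_ms / interval_ms

def ready_ms_from_vcenter(rdy_pct, sample_period_s=20):
    """A given ready percentage over a vCenter real-time sample
    corresponds to this many milliseconds of waiting."""
    return rdy_pct / 100.0 * sample_period_s * 1000
```

For example, a VM that waits 500 ms during a 5-second esxtop interval shows 10% %RDY, and that same 10% over a 20-second vCenter sample corresponds to 2000 ms of ready time.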
The enforcement of these rings is done by the processor (CPU), which uses different operating modes that restrict the operations the currently running process can perform. Ring 0 has the highest privilege level and is where the operating system kernel normally runs. Code executing in Ring 0 is said to run in kernel mode, also known as privileged or supervisor mode. All other code, such as applications running on the operating system, operates in less privileged rings, typically Ring 3. On non-virtualized systems, the operating system runs in privileged mode in Ring 0 and owns the server hardware, while applications run in Ring 3 with fewer privileges, as depicted below.
On virtualized systems, the hypervisor, or Virtual Machine Monitor (VMM), runs in privileged mode in Ring 0, and the VM's guest operating system must instead operate in Ring 1, as depicted below.
This can cause problems, however, because most guest operating systems are designed to run in Ring 0. To overcome this, the VMM fools the VM's guest operating system into thinking it is running in Ring 0 by trapping privileged instructions and emulating them in the VMM. This emulation adds a small amount of overhead and is the reason VM performance can typically achieve only up to about 98% of native performance compared to physical servers. To address this, newer CPUs with virtualization extensions such as AMD-V and Intel VT were specifically designed for virtualization and provide a new privilege level, called Ring -1 (minus one), for the VMM to reside in, as depicted below.
This allows for better performance, as the VMM no longer needs to fool the guest operating system into thinking it is running in Ring 0: the guest can actually run there without conflicting with the VMM, which has moved down to Ring -1.
The bottom line: when looking for new hardware for your virtual hosts, be sure to choose servers whose CPUs include these virtualization extensions.
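On a Linux system you can verify these extensions from the CPU flags: Intel VT-x advertises the "vmx" flag and AMD-V the "svm" flag in /proc/cpuinfo. A small sketch of that check (note that a CPU can report the flag while the feature is still disabled in the BIOS/UEFI, so also confirm the firmware setting):

```python
# Detect hardware virtualization support from Linux CPU flags:
# Intel VT-x exposes "vmx", AMD-V exposes "svm" in /proc/cpuinfo.
# The flag alone does not guarantee the feature is enabled -- it
# can still be turned off in the BIOS/UEFI.

def hw_virt_support(cpuinfo_text):
    """Return 'Intel VT-x', 'AMD-V', or None based on cpuinfo flags."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = line.split(":", 1)[1].split()
            if "vmx" in flags:
                return "Intel VT-x"
            if "svm" in flags:
                return "AMD-V"
    return None

# Example usage on a live Linux host:
# with open("/proc/cpuinfo") as f:
#     print(hw_virt_support(f.read()))
```

This gives a quick pass/fail answer when evaluating whether an existing server is a suitable virtualization host.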