For implementation reasons, a virtual machine tracks CPU and memory resources slightly differently. CPU resources, including NUMA, indicate virtualization overhead and are shown with the vm. prefix. Memory resources are broken out by guest memory, shown with the guest. prefix, and by overhead memory, shown with the ovhd. prefix. Future implementations may add additional metrics.
The following list shows various CPU and memory statistics:

vm.cpu.contention.cpu – (cumulative) CPU time the virtual machine could have run, but did not run due to CPU contention. This metric includes time losses due to hypervisor factors, such as overcommit. Specific sources of contention vary widely from release to release. See Comparison to esxtop for details about calculating CPU contention.
vm.cpu.contention.mem – (cumulative) CPU time the virtual machine could have run, but did not run due to memory contention. This metric includes losses due to swapping. Equivalent to esxtop %SWPWT.

vm.numa.local – (instantaneous) KB of memory currently local, summed across the VM’s NUMA nodes.

vm.numa.remote – (instantaneous) KB of memory currently remote, summed across the VM’s NUMA nodes.

guest.mem.reserved – (static) KB of memory reserved for the guest OS. This indicates memory that will never be ballooned or swapped. The default is 0.

guest.mem.limit – (static) KB of memory the guest must operate within. The default of –1 means unlimited.

guest.mem.mapped – (instantaneous) KB of memory currently mapped into the guest; that is, memory the guest can access with zero read latency. This metric represents memory use from a guest perspective.
guest.mem.consumed – (instantaneous) KB of memory used to provide the currently mapped memory. This might be lower than mapped due to ballooning, memory sharing, or future optimizations. This metric represents memory use from a host perspective. The difference between guest.mem.mapped and guest.mem.consumed is additional memory made available by hypervisor optimizations (see the sketch after this list).

guest.mem.swapped – (instantaneous) KB of memory swapped to disk. A fully reserved virtual machine should never see memory swapped out in steady-state usage. Transient conditions, such as resuming from a memory-included snapshot, might show some swap usage.
guest.mem.ballooned – (instantaneous) KB of memory deliberately copied on write (COWed) to zero in the guest OS, to reduce memory usage.

guest.mem.swapIn – (cumulative) KB of memory swapped in for the current session.

guest.mem.swapOut – (cumulative) KB of memory swapped out for the current session.

ovhd.mem.swapped – (instantaneous) KB of overhead memory currently swapped.

ovhd.mem.swapIn – (cumulative) KB of overhead memory swapped in for the current session.

ovhd.mem.swapOut – (cumulative) KB of overhead memory swapped out for the current session.
Expected values for some of the statistics (a sanity-check sketch follows this list):

vm.cpu.contention.mem – usually < 1%; anything greater indicates memory overcommit.

vm.cpu.contention.cpu – < 5% of incremental time during undercommit, < 50% of incremental time at normal levels of overcommit (vSphere is tuned to perform best when somewhat overcommitted). When contention is < 5%, performance will be deterministic but the host is not fully used. When contention is between 5% and 50%, the host is becoming fully used (maximum CPU throughput) but individual virtual machines might see less deterministic performance.
vm.numa.local – Expected to match guest.mem.mapped. Transient conditions such as a NUMA rebalance can cause this to temporarily decrease, then return to normal as memory is migrated.

vm.numa.remote – Expected to be approximately zero in non-overcommitted scenarios.

guest.mem.mapped – Expected to equal configured guest memory; might be smaller if the virtual machine has yet to access all of its memory.

guest.mem.consumed – Expected to be approximately equal to configured guest memory; will be smaller if host memory is overcommitted.

guest.mem.swapped – Expected to be zero. Non-zero indicates non-graceful memory overcommit.

guest.mem.ballooned – Expected to be zero. Non-zero indicates graceful memory overcommit.

ovhd.mem.swapped – Expected to be zero. Non-zero indicates memory overcommit.
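As an illustration only, the following Python sketch applies these expectations to a hypothetical sample; the thresholds simply restate the list above, and the input numbers are invented.

# Invented sample of the memory-related statistics, in KB.
sample = {
    "guest.mem.swapped":   0,
    "guest.mem.ballooned": 131072,
    "ovhd.mem.swapped":    0,
    "vm.numa.remote":      0,
}

warnings = []
if sample["guest.mem.swapped"] > 0:
    warnings.append("guest memory swapped: non-graceful memory overcommit")
if sample["guest.mem.ballooned"] > 0:
    warnings.append("guest memory ballooned: graceful memory overcommit")
if sample["ovhd.mem.swapped"] > 0:
    warnings.append("overhead memory swapped: memory overcommit")
if sample["vm.numa.remote"] > 0:
    warnings.append("remote NUMA memory present: check for memory overcommit")

for w in warnings:
    print("WARNING:", w)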
Equations for CPU and memory metrics (a small consistency check in Python follows):

session uptime = vm.cpu.used + vm.cpu.contention.cpu + vm.cpu.contention.mem + CPU idle time

configured memory size = guest.mem.mapped + guest.mem.swapped + (memory not yet touched)

configured memory size = vm.numa.local + vm.numa.remote (another formula for arriving at the same value as above)

guest.mem.mapped = guest.mem.consumed + guest.mem.ballooned + (other copy-on-write sources)
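The memory identities above can be checked directly against a sampled set of statistics. The sketch below does this for a hypothetical virtual machine configured with 4 GB of memory; the sample values are invented for the example.

configured_kb = 4096 * 1024  # configured memory size: 4 GB in KB

# Invented sample, in KB.
s = {
    "guest.mem.mapped":    3800000,
    "guest.mem.swapped":         0,
    "guest.mem.consumed":  3700000,
    "guest.mem.ballooned":  100000,
    "vm.numa.local":       4194304,
    "vm.numa.remote":            0,
}

# configured memory size = guest.mem.mapped + guest.mem.swapped + (not yet touched)
untouched_kb = configured_kb - s["guest.mem.mapped"] - s["guest.mem.swapped"]
print("memory not yet touched:", untouched_kb, "KB")

# configured memory size = vm.numa.local + vm.numa.remote
assert s["vm.numa.local"] + s["vm.numa.remote"] == configured_kb

# guest.mem.mapped = guest.mem.consumed + guest.mem.ballooned + (other COW sources)
other_cow_kb = s["guest.mem.mapped"] - s["guest.mem.consumed"] - s["guest.mem.ballooned"]
print("other copy-on-write sources:", other_cow_kb, "KB")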
Comparison to esxtop

Individual reasons for lack of vCPU progress are available to vSphere administrators (using either esxtop or the vSphere API) but are hidden from the guest OS to preserve isolation between the virtual machine and the configuration of the infrastructure it runs upon. The guest OS sees only an aggregate metric.
vm.cpu.used is equivalent to the esxtop statistic %USED for a virtual machine.

vm.cpu.contention.cpu is equivalent to (%RDY – %MLMTD) + %MLMTD + %CSTP + %WAIT + (%RUN – %USED), where:

(%RDY – %MLMTD) represents time the guest OS could not run due to host CPU overutilization. Note that %RDY includes %MLMTD, which is why %MLMTD is subtracted here and then added back as its own term.

%MLMTD represents time the guest OS did not run due to administrator-configured resource limits. ESXi 6.0 and earlier did not add %MLMTD to this computation, but this is fixed in ESXi 6.5.

%CSTP represents time the guest OS could not run due to uneven vCPU progress.

%WAIT represents time the guest OS could not run due to hypervisor overheads.

(%RUN – %USED) corrects for any frequency scaling of the host CPU.

vm.cpu.contention.mem is equivalent to %SWPWT.
See https://communities.vmware.com/docs/DOC-9279 for details about esxtop.
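As a small illustration of the formula above, the following Python sketch combines a set of invented esxtop percentages for one virtual machine into the quantity that corresponds to vm.cpu.contention.cpu.

# Invented esxtop counters for one VM, in percent, summed across vCPUs
# the way esxtop reports them.
rdy, mlmtd, cstp, wait, run, used = 12.0, 2.0, 1.5, 0.5, 180.0, 175.0

# (%RDY - %MLMTD) + %MLMTD + %CSTP + %WAIT + (%RUN - %USED)
contention_equiv = (rdy - mlmtd) + mlmtd + cstp + wait + (run - used)
print(f"esxtop equivalent of vm.cpu.contention.cpu: {contention_equiv:.1f}%")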
Note on nominal CPU speed and CPU metrics

The host.cpu.processorMHz metric (in the host section) reports a nominal speed, and the virtual machine CPU metrics are normalized to the processorMHz metric. Actual processor speed might be higher or lower depending on host power management. A virtual machine can see vm.cpu.used exceed wall clock time due to Turbo Boost, or can see vm.cpu.used lag wall clock time due to power saving modes used in conjunction with idle guests. Actual processor speed is not available to the guest OS, but is expected to be close to the nominal clock speed when the guest OS is active. See http://www.vmware.com/files/pdf/techpaper/hpm-perf-vsphere55.pdf for more information about vSphere host power management.

Normalizing CPU metrics to nominal CPU speed allows the guest OS to avoid dependence on host power management settings.
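The effect of this normalization can be illustrated with a small arithmetic sketch. The numbers are invented, and interpreting the vm.cpu.used delta of a single fully busy vCPU as an implied average speed is just one way to read the normalized metric.

processor_mhz = 2000        # host.cpu.processorMHz (nominal speed)
wall_clock_s = 60.0         # elapsed wall-clock time for the interval
used_delta_s = 66.0         # delta of vm.cpu.used over the interval, one busy vCPU

# Because vm.cpu.used is normalized to the nominal speed, a fully busy vCPU on a
# turbo-boosted core can accumulate more than one second of "used" per second.
ratio = used_delta_s / wall_clock_s
print(f"implied average speed: {ratio * processor_mhz:.0f} MHz ({ratio:.2f}x nominal)")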
Note on vm.cpu.contention.cpu

Using the Extended Guest Statistics discussed in this section, you can obtain a contention ratio by comparing contention time to actual time for a particular time interval. As contention time is reported as a sum across vCPUs, and wall time is reported for the entire virtual machine, the wall time must be scaled up by the number of vCPUs to normalize contention to a 0-100% range.
This contention ratio is similar to the “stolen time” reported inside some guest operating systems (see Table 2), except “stolen time” excludes time the virtual machine did not run due to configured resource limits. Comparing this value to esxtop requires denormalizing the contention ratio, because esxtop reports a sum of percentages across vCPUs. So:
((%RDY – %MLMTD) + %MLMTD + %CSTP + %WAIT + (%RUN – %USED)) ~= Contention% * vCPUs
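For example, the following Python sketch computes the contention ratio from invented cumulative samples taken over an interval and then denormalizes it for a rough comparison with the esxtop sum above.

n_vcpus = 4
interval_s = 60.0            # wall-clock length of the sampling interval

# Delta of the cumulative vm.cpu.contention.cpu statistic over the interval,
# in seconds, summed across vCPUs (value invented for the example).
contention_delta_s = 24.0

# Normalize to a 0-100% range: scale wall time by the number of vCPUs.
contention_pct = 100.0 * contention_delta_s / (interval_s * n_vcpus)
print(f"Contention%: {contention_pct:.1f}%")

# esxtop reports a sum of percentages across vCPUs, so denormalize to compare.
esxtop_equiv_pct = contention_pct * n_vcpus
print(f"approximate esxtop sum: {esxtop_equiv_pct:.1f}%")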
Due to sample aliasing, where in-guest time samples and esxtop time samples do not occur simultaneously, instantaneous esxtop values will not match instantaneous in-guest statistics. Longer time samples, or averaging values collected over time, will produce more comparable results.
A contention value of < 5% is normal
“undercommit” operating behavior, representing minor hypervisor overheads. A
contention value > 50% is “excess overcommit” and indicates CPU resource
starvation – the workload would benefit from additional CPUs or migrating
virtual machines to different hosts. A contention value between 5% and 50% is
“normal overcommit” and is more complicated. The goal of this metric is to
allow direct measurement of the performance improvement that can be obtained by
adding CPU resources.
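A small helper like the following sketch (the thresholds simply restate the 5% and 50% boundaries above) can map a measured contention ratio onto these operating regions.

def contention_region(contention_pct: float) -> str:
    """Classify a contention ratio (0-100%) into the regions described above."""
    if contention_pct < 5.0:
        return "undercommit: deterministic performance, host not fully used"
    if contention_pct <= 50.0:
        return "normal overcommit: maximum throughput, less deterministic per-VM performance"
    return "excess overcommit: CPU starvation; add CPUs or migrate virtual machines"

print(contention_region(10.0))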
The following figure illustrates these concepts.

CPU use across virtual machines
VMware best practices describe the available CPU capacity of an ESXi host as equal to the number of cores (not including hyperthreads). A 16-core host with 2.0 GHz processors has 16 cores * 2000 MHz/core = 32000 MHz of available compute capacity. When actual usage is below that calculated capacity, the hypervisor is considered “undercommitted” – the hypervisor scales linearly with the load applied, and some capacity goes unused.
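The capacity arithmetic is simple enough to write down directly; the sketch below uses the host parameters from the example above, plus an invented aggregate usage figure, to show which side of that capacity line the host is on.

cores = 16                 # physical cores, not counting hyperthreads
core_mhz = 2000            # nominal per-core speed in MHz

capacity_mhz = cores * core_mhz       # 16 * 2000 = 32000 MHz of compute capacity
aggregate_usage_mhz = 24000           # invented measurement summed across all VMs

print(f"capacity: {capacity_mhz} MHz, usage: {aggregate_usage_mhz} MHz")
if aggregate_usage_mhz < capacity_mhz:
    print("undercommitted: usage is below core capacity")
else:
    print("overcommitted: hyperthreads are absorbing the extra load")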
As actual usage exceeds available compute
capacity, the hypervisor begins utilizing hyperthreads for running virtual
machines to keep performance degradation graceful. Maximum aggregate
utilization occurs during this “normal overcommit” (between 5% and 50%
contention) where each virtual machine sees somewhat degraded performance but
overall system throughput still increases. In this “normal overcommit” region,
adding load still improves overall efficiency, though at a declining rate.
Eventually, all hyperthreads are fully used. Efficiency peaks and starts to
degrade; this “excess overcommit” (>50% contention) indicates the workload
would be more efficient if spread across more hosts for better throughput.
One specific scenario deserves special mention:
the “monster VM” that attempts to give a single VM all available compute
capacity. A VM configured to match the number of host cores (not including
hyperthreads) will peak at the capacity of those cores (with < 5%
contention) but at a performance about 20% lower than an equivalent physical
machine utilizing all cores and hyperthreads. A VM configured to match the
number of host threads (2x host cores) will peak at a performance level more
analogous to a physical machine, but will show about 40% contention (the upper
end of “normal overcommit”) running half the cores on hyperthreads. This
contention metric indicates the load would run better on a larger host with
additional cores, so it is technically “overcommitted” even though performance
is better than a hypervisor running at full commit. This behavior is expected
when attempting to run maximally sized virtual machines.