vSphere Availability Design for the Management Domain

The vSphere HA configuration protects the management component virtual machines that are critical to the operation of your VMware Cloud Foundation environment. When you design the configuration, account for the varying and sometimes significant CPU or memory reservations of the management virtual machines and for the requirements of vSAN.
You configure several vSphere HA features to provide high availability for the management components of the SDDC.
vSphere HA Features Configured for the SDDC

| vSphere HA Feature | Description |
|---|---|
| Host failure response | vSphere HA can respond to individual host failures by restarting virtual machines on other hosts within the cluster. |
| Response for host isolation | If a host becomes isolated, vSphere HA can detect the isolation and power off or shut down the virtual machines on that host, then restart them on available hosts. |
| Admission control policy | Configure how the cluster determines available resources. In a smaller vSphere HA cluster, a larger proportion of the cluster resources is reserved to accommodate ESXi host failures according to the selected admission control policy. |
| VM and Application Monitoring | If a virtual machine failure occurs, the VM and Application Monitoring service restarts that virtual machine. The service uses VMware Tools to evaluate whether a virtual machine in the cluster is running. |

Admission Control Policies in vSphere HA

| Policy Name | Description |
|---|---|
| Host failures the cluster tolerates | vSphere HA ensures that a specified number of ESXi hosts can fail and sufficient resources remain in the cluster to fail over all the virtual machines from those ESXi hosts. |
| Percentage of cluster resources reserved | vSphere HA reserves a specified percentage of aggregated CPU and memory resources for failover. |
| Specify Failover Hosts | If an ESXi host fails, vSphere HA attempts to restart its virtual machines on any of the specified failover ESXi hosts. If a restart is not possible, for example, because the failover ESXi hosts have insufficient resources or have failed as well, vSphere HA attempts to restart the virtual machines on other ESXi hosts in the cluster. |

Design Decisions on vSphere Availability for the Default Management Cluster

| Decision ID | Design Decision | Design Justification | Design Implication |
|---|---|---|---|
| VCF-MGMT-VCS-CLS-005 | Use vSphere HA to protect all virtual machines against failures. | vSphere HA supports a robust level of protection for both ESXi host and virtual machine availability. | You must provide sufficient resources on the remaining hosts so that virtual machines can be restarted on those hosts in the event of a host outage. |
| VCF-MGMT-VCS-CLS-006 | Set host isolation response to Power Off and restart VM in vSphere HA. | vSAN requires that the host isolation response be set to Power Off and to restart virtual machines on available ESXi hosts. | If a false positive event occurs, that is, an ESXi host is incorrectly declared isolated, the virtual machines on that host are powered off. |
| VCF-MGMT-VCS-CLS-007 | Set the advanced cluster setting das.usedefaultisolationaddress to false. | Ensures that vSphere HA uses the manual isolation addresses instead of the default management network gateway address. | You must manually configure this advanced parameter when you deploy the management cluster in a single availability zone. |

Design Decisions on the Admission Control Policy for the Default Cluster in a Management Domain with a Single Availability Zone

| Decision ID | Design Decision | Design Justification | Design Implication |
|---|---|---|---|
| VCF-MGMT-VCS-CLS-008 | Configure admission control for 1 ESXi host failure and percentage-based failover capacity. | Using the percentage-based reservation works well in situations where virtual machines have varying and sometimes significant CPU or memory reservations. vSphere automatically calculates the reserved percentage according to the number of ESXi host failures to tolerate and the number of ESXi hosts in the cluster. | In a cluster of 4 ESXi hosts, the resources of only 3 ESXi hosts are available for use. |
| VCF-MGMT-VCS-CLS-009 | Set the isolation address for the cluster to the gateway IP address for the vSAN network. | Allows vSphere HA to validate complete network isolation if a connection failure occurs on an ESXi host. | You must manually configure the isolation address. |

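
The percentage that vSphere calculates for a given number of host failures to tolerate can be approximated with simple arithmetic. The following Python sketch assumes that all ESXi hosts in the cluster are equally sized. The function names are illustrative and are not part of any VMware API, and the calculation that vSphere performs might differ when hosts have different capacities.

```python
def failover_reservation_percent(total_hosts: int, host_failures_to_tolerate: int = 1) -> float:
    """Approximate share of cluster CPU and memory reserved for failover.

    Assumption: every host contributes the same capacity, so reserving
    the resources of N hosts means reserving N / total of the cluster.
    """
    if total_hosts < 2:
        raise ValueError("a vSphere HA cluster needs at least two hosts")
    if not 0 < host_failures_to_tolerate < total_hosts:
        raise ValueError("failures to tolerate must be between 1 and total_hosts - 1")
    return 100.0 * host_failures_to_tolerate / total_hosts


def usable_host_equivalents(total_hosts: int, host_failures_to_tolerate: int = 1) -> int:
    """Number of hosts' worth of resources left for running workloads."""
    return total_hosts - host_failures_to_tolerate


if __name__ == "__main__":
    # Illustrative cluster sizes; the 4-host case matches decision VCF-MGMT-VCS-CLS-008.
    for hosts in (4, 6, 8):
        percent = failover_reservation_percent(hosts)
        usable = usable_host_equivalents(hosts)
        print(f"{hosts} hosts: reserve {percent:.1f}%, {usable} hosts usable")
```

For a cluster of 4 hosts that tolerates 1 host failure, the sketch reserves 25% and leaves the resources of 3 hosts for use, which matches the design implication stated for VCF-MGMT-VCS-CLS-008.
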
Design Decisions on the Admission Control Policy for the Default Management Cluster for Multiple Availability Zones

| Decision ID | Design Decision | Design Justification | Design Implication |
|---|---|---|---|
| VCF-MGMT-VCS-CLS-010 | Increase the admission control percentage to reserve the resources of half of the ESXi hosts in the cluster. | Allocating only half of a stretched cluster ensures that all VMs have enough resources if an availability zone outage occurs. | In a cluster of 8 ESXi hosts, the resources of only 4 ESXi hosts are available for use. If you add more ESXi hosts to the default management cluster, add them in pairs, one per availability zone. |
| VCF-MGMT-VCS-CLS-011 | Set an additional isolation address to the vSAN network gateway in the second availability zone. | Allows vSphere HA to validate complete network isolation if a connection failure occurs on an ESXi host or between availability zones. | None. |

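
As an illustration of how the stretched-cluster decisions fit together, the following Python sketch collects the settings for a cluster with an equal number of hosts in each availability zone. It is a planning aid under stated assumptions, not a VMware API: the function name and the returned dictionary layout are hypothetical, the gateway addresses are placeholders from documentation address ranges, and the use of das.isolationaddress0 and das.isolationaddress1 for the two vSAN gateways is an assumption, because the design decisions do not name the options that hold the isolation addresses.

```python
import ipaddress


def stretched_cluster_ha_options(hosts_az1: int, hosts_az2: int,
                                 vsan_gateway_az1: str, vsan_gateway_az2: str) -> dict:
    """Collect vSphere HA settings for a cluster stretched across two zones."""
    if hosts_az1 < 1 or hosts_az1 != hosts_az2:
        # Hosts are added in pairs, one per availability zone.
        raise ValueError("each availability zone must have the same, non-zero host count")
    for gateway in (vsan_gateway_az1, vsan_gateway_az2):
        ipaddress.ip_address(gateway)  # raises ValueError on a malformed address

    total_hosts = hosts_az1 + hosts_az2
    return {
        # Half of the cluster is reserved so that one zone can absorb the other.
        "reserved_percent": 50,
        "usable_host_equivalents": total_hosts // 2,
        "advanced_options": {
            "das.usedefaultisolationaddress": "false",
            # Assumed option names for the manual isolation addresses.
            "das.isolationaddress0": vsan_gateway_az1,
            "das.isolationaddress1": vsan_gateway_az2,
        },
    }


if __name__ == "__main__":
    # Placeholder gateway addresses (documentation ranges), not real values.
    settings = stretched_cluster_ha_options(4, 4, "192.0.2.1", "198.51.100.1")
    for name, value in settings["advanced_options"].items():
        print(f"{name} = {value}")
    print("usable hosts:", settings["usable_host_equivalents"])
```

With 4 hosts per zone, the sketch reports the resources of 4 hosts as usable out of 8, in line with the implication of VCF-MGMT-VCS-CLS-010.
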
Design Decisions on the VM and Application Monitoring Service for the Management Domain

| Decision ID | Design Decision | Design Justification | Design Implication |
|---|---|---|---|
| VCF-MGMT-VCS-CLS-012 | Enable VM Monitoring for each cluster. | VM Monitoring provides in-guest protection for most VM workloads. The application or service running on the virtual machine must be capable of restarting successfully after a reboot. Otherwise, restarting the virtual machine is not sufficient. | None. |
| VCF-MGMT-VCS-CLS-013 | Set the advanced cluster setting das.iostatsinterval to 0 to deactivate monitoring the storage and network I/O activities of the management appliances. | Enables triggering a restart of a management appliance when an OS failure occurs and heartbeats are not received from VMware Tools, instead of waiting additionally for the I/O check to complete. | If you want to specifically enable I/O monitoring, then configure the das.iostatsinterval advanced setting. |
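
The effect of setting das.iostatsinterval to 0 can be shown with a simplified model of the restart decision. The following Python sketch is not how vSphere HA is implemented. The class, the function, and the 30-second failure interval are hypothetical, and the 120-second I/O interval is used only as an example of a non-zero value.

```python
from dataclasses import dataclass


@dataclass
class VmObservation:
    """What the monitoring service has seen for one virtual machine."""
    seconds_since_last_heartbeat: float  # VMware Tools heartbeat age
    seconds_since_last_io: float         # storage or network I/O age


def should_restart_vm(obs: VmObservation,
                      failure_interval_s: float,
                      io_stats_interval_s: float) -> bool:
    """Simplified restart decision for VM Monitoring."""
    heartbeats_lost = obs.seconds_since_last_heartbeat > failure_interval_s
    if not heartbeats_lost:
        return False
    if io_stats_interval_s == 0:
        # I/O check deactivated: missing heartbeats alone trigger the restart.
        return True
    # I/O check active: restart only if the VM also showed no recent I/O.
    return obs.seconds_since_last_io > io_stats_interval_s


if __name__ == "__main__":
    # A guest whose OS has hung but whose last I/O was 10 seconds ago.
    hung_guest = VmObservation(seconds_since_last_heartbeat=45, seconds_since_last_io=10)
    print("I/O check off:", should_restart_vm(hung_guest, 30, 0))
    print("I/O check on: ", should_restart_vm(hung_guest, 30, 120))
```

In this model, the same hung guest is restarted immediately when the I/O check is deactivated, while a non-zero interval delays the restart until the observed I/O activity has also stopped for that long.
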