Continuous Availability Considerations
Continuous Availability (CA) separates the VMware Aria Operations cluster into two fault domains and protects the analytics cluster against the loss of a fault domain.
Cluster Management
Clusters consist of a primary node, a
primary replica node, a witness node, and data nodes.
Activating Continuous Availability within VMware Aria Operations is not a disaster recovery solution. When you activate Continuous Availability, information is stored twice, on two different analytics nodes within the cluster, with the copies stretched across the two fault domains. Because of this duplication, continuous availability requires doubling the system's compute and capacity requirements.
If either the primary node or the primary replica node is permanently lost, you must replace the lost node. The replacement becomes the new primary replica node. If you need the new primary replica node to act as the primary node, take the current primary node offline and wait until the primary replica node is promoted to primary. Then bring the former primary node back online. It becomes the new primary replica node.
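The role swap described above can be sketched as simple bookkeeping. This is a minimal illustration only: the `Node` class, the `take_offline` and `bring_online` helpers, and the node names are hypothetical, and the sketch does not reflect how VMware Aria Operations performs promotion internally.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str          # "primary" or "replica" (hypothetical labels)
    online: bool = True


def take_offline(cluster, name):
    """Mark a node offline; if it was the primary, promote the online replica."""
    node = cluster[name]
    node.online = False
    if node.role == "primary":
        for other in cluster.values():
            if other.online and other.role == "replica":
                other.role = "primary"   # the replica is promoted
                node.role = "replica"    # the former primary returns as replica
                break


def bring_online(cluster, name):
    """Mark a node online again; its role is whatever it was left with."""
    cluster[name].online = True


def swap_roles(cluster):
    """Take the current primary offline, let the replica be promoted, then return."""
    current_primary = next(n.name for n in cluster.values() if n.role == "primary")
    take_offline(cluster, current_primary)
    bring_online(cluster, current_primary)
    return {n.name: n.role for n in cluster.values()}


cluster = {
    "node-a": Node("node-a", "primary"),
    "node-b": Node("node-b", "replica"),
}
roles = swap_roles(cluster)
print(roles)
```

In this model the two nodes simply exchange roles. In a real cluster, wait for the promotion to complete before bringing the former primary node back online.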
Fault Domains
Fault domains consist of analytics nodes, separated into two zones.
A fault domain consists of one or more analytics nodes grouped according to their physical location in the data center. When configured, two fault domains allow VMware Aria Operations to tolerate failures of an entire physical location and failures from resources dedicated to a single fault domain.
Witness Node
The witness node is a member of the cluster but is not one of the analytics nodes.
To activate CA within VMware Aria Operations, deploy the witness node in the cluster. The witness node does not collect or store data. It serves as a tiebreaker when a decision must be made regarding the availability of VMware Aria Operations after the network connection between the two fault domains is lost.
Analytics Nodes
Analytics nodes consist of a primary
node, primary replica node, and data nodes.
When you activate continuous availability, you protect VMware Aria Operations from data loss if an entire fault domain is lost. If node pairs are lost across fault domains, there may be permanent data loss.
Deploy analytics nodes, within each fault domain, to separate hosts to reduce the chance of data loss if a host fails. You can use DRS anti-affinity rules to ensure that the VMware Aria Operations nodes remain on separate hosts.
Collector Group
In VMware Aria Operations, you can create a collector group. A collector group is a collection of nodes (Cloud Proxy or analytics nodes). You can assign adapters to a collector group, rather than assigning an adapter to a single node. A collector group must contain nodes of the same type. You cannot mix Cloud Proxy and analytics nodes in a collector group. When activating continuous availability, you can create collector groups to collect data from adapters within each fault domain.
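The same-type rule for collector groups can be expressed as a small pre-check. This is a sketch under stated assumptions: the `CollectorNode` type, the `node_type` labels, and the node names are hypothetical and are not part of any VMware Aria Operations API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectorNode:
    name: str
    node_type: str     # hypothetical labels: "cloud_proxy" or "analytics"
    fault_domain: str  # hypothetical labels: "FD1" or "FD2"


def validate_collector_group(nodes):
    """Return a list of problems; an empty list means the group is acceptable."""
    if not nodes:
        return ["group is empty"]
    problems = []
    node_types = {n.node_type for n in nodes}
    if len(node_types) > 1:
        problems.append("mixed node types: " + ", ".join(sorted(node_types)))
    return problems


# A group made only of Cloud Proxy nodes passes the check.
proxies_fd1 = [
    CollectorNode("cp-1", "cloud_proxy", "FD1"),
    CollectorNode("cp-2", "cloud_proxy", "FD1"),
]
# A group that mixes Cloud Proxy and analytics nodes is rejected.
mixed = [
    CollectorNode("cp-3", "cloud_proxy", "FD2"),
    CollectorNode("data-1", "analytics", "FD2"),
]

print(validate_collector_group(proxies_fd1))
print(validate_collector_group(mixed))
```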
Collector groups do not have any correlation with fault domains. The function of a collector group is to collect data and provide it to the analytics nodes. VMware Aria Operations then decides how to store the data. If the node running the adapter collection fails, the adapter is automatically moved to another node in the collector group.
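The failover behavior described above can be modeled as follows. The source states only that the adapter moves to another node in the group. Choosing the least-loaded surviving node is an assumption made for this illustration, and the `fail_over` function and all adapter and node names are hypothetical.

```python
def fail_over(assignments, group_nodes, failed_node):
    """Return a new adapter -> node mapping after one node in the group fails.

    Assumption for illustration: each orphaned adapter goes to the surviving
    node that currently runs the fewest adapters (ties broken by name).
    """
    survivors = [n for n in group_nodes if n != failed_node]
    if not survivors:
        raise RuntimeError("no remaining node in the collector group")

    # Count the adapters already running on each surviving node.
    load = {n: 0 for n in survivors}
    for node in assignments.values():
        if node in load:
            load[node] += 1

    new_assignments = dict(assignments)
    for adapter, node in sorted(assignments.items()):
        if node == failed_node:
            target = min(survivors, key=lambda n: (load[n], n))
            new_assignments[adapter] = target
            load[target] += 1
    return new_assignments


group = ["cp-1", "cp-2", "cp-3"]
before = {
    "vcenter-adapter": "cp-1",
    "nsx-adapter": "cp-1",
    "vsan-adapter": "cp-2",
}
after = fail_over(before, group, failed_node="cp-1")
print(after)
```

Because the adapter is assigned to the group rather than to a single node, no manual reassignment is needed in this model when one node fails.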
In principle, you can install collectors anywhere, provided the networking requirements are met. From a failover perspective, however, do not place all the collectors within a single fault domain. If all the collectors are in a single fault domain, VMware Aria Operations stops receiving data if a network outage affects that fault domain.
Assign all normal adapters to collector groups, not to individual nodes. Hybrid adapters require two-way communication between the adapter and the monitored endpoint.
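The recommendation to avoid placing every collector in one fault domain can be checked against an inventory. This is a minimal sketch: the inventory mapping, the fault domain labels `FD1` and `FD2`, and both helper functions are hypothetical and do not query a real deployment.

```python
def collectors_per_fault_domain(collectors):
    """Count collectors per fault domain from a name -> fault domain mapping."""
    counts = {}
    for fault_domain in collectors.values():
        counts[fault_domain] = counts.get(fault_domain, 0) + 1
    return counts


def domains_without_collectors(collectors, fault_domains=("FD1", "FD2")):
    """List the fault domains that host no collector at all."""
    counts = collectors_per_fault_domain(collectors)
    return [fd for fd in fault_domains if counts.get(fd, 0) == 0]


# All collectors sit in FD1, so an outage of FD1 would stop data collection.
risky = {"cp-1": "FD1", "cp-2": "FD1"}
# Collectors are spread across both fault domains.
spread = {"cp-1": "FD1", "cp-2": "FD2"}

print(domains_without_collectors(risky))
print(domains_without_collectors(spread))
```

A non-empty result means every collector depends on the remaining fault domain.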
For more information about adapters, see Adapter and Management Packs Considerations.