Recovering a Cluster Node

Restoring an Automation Orchestrator node can cause issues with the Kubernetes service. To recover a problematic node in your Automation Orchestrator cluster, you must locate the node, remove it from the cluster, and then add it to the cluster again.
  1. Identify the primary node of your Automation Orchestrator cluster.
    1. Log in to the Automation Orchestrator Appliance command line of one of your nodes over SSH as root.
    2. Find the node with the primary role by running the kubectl -n prelude exec postgres-0 command.
      kubectl -n prelude exec postgres-0 -- chpst -u postgres repmgr cluster show --terse --compact
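      The primary node is the row whose Role column reads primary. Assuming the role is printed as the literal string primary, you can also filter the output directly, for example:
      kubectl -n prelude exec postgres-0 -- chpst -u postgres repmgr cluster show --terse --compact | grep primary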
    3. Retrieve the name of the pod in which the primary node is located.
      In most cases, the primary node appears in the output as postgres-0.postgres.prelude.svc.cluster.local, which corresponds to the postgres-0 pod.
    4. Find the FQDN address of the primary node by running the kubectl -n prelude get pods command.
      kubectl -n prelude get pods -o wide
    5. In the output, locate the database pod with the name you retrieved and note the FQDN of the node it runs on, as in the example below.
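      For example, assuming the pod name from the previous step is postgres-0, the following narrows the listing to that pod; the node FQDN appears in the NODE column of the wide output:
      kubectl -n prelude get pods -o wide | grep postgres-0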
  2. Locate the problematic node by running the kubectl -n prelude get node command.
    The problematic node has a NotReady status.
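    The output resembles the following sketch, shown here with hypothetical node names; your FQDNs, roles, ages, and versions will differ:
    NAME                         STATUS     ROLES    AGE   VERSION
    orchestrator-1.example.com   Ready      master   20d   v1.26.5
    orchestrator-2.example.com   NotReady   master   20d   v1.26.5
    Note the FQDN in the NAME column of the NotReady node; you need it when you remove the node in step 4.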
  3. Log in to the Automation Orchestrator Appliance command line of the primary node over SSH as root.
  4. Remove the problematic node from the cluster by running the vracli cluster remove <NODE-FQDN> command.
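    For example, assuming the problematic node has the hypothetical FQDN orchestrator-2.example.com noted earlier:
    vracli cluster remove orchestrator-2.example.com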
  5. Log in to the Automation Orchestrator Appliance command line of the problematic node over SSH as root.
  6. Add the node to the cluster again by running the vracli cluster join <MASTER-DB-NODE-FQDN> command, where <MASTER-DB-NODE-FQDN> is the FQDN of the primary node you identified in step 1.
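    For example, assuming the primary node you identified in step 1 has the hypothetical FQDN orchestrator-1.example.com:
    vracli cluster join orchestrator-1.example.com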