Recovering a Cluster Node
Restoring an Automation Orchestrator node can cause issues with the Kubernetes service. To recover a problematic node in your Automation Orchestrator cluster, you must locate the node, remove it from the cluster, and then add it to the cluster again.

- Identify the primary node of your Automation Orchestrator cluster.
  - Log in to the Automation Orchestrator Appliance command line of one of your nodes over SSH as root.
  - Find the node with the primary role by running the kubectl -n prelude exec postgres-0 command (an illustrative output is shown after this procedure):
    kubectl -n prelude exec postgres-0 -- chpst -u postgres repmgr cluster show --terse --compact
  - Retrieve the name of the pod in which the primary node is located. In most cases, the name of the pod is postgres-0.postgres.prelude.svc.cluster.local.
- Find the FQDN address of the primary node by running the kubectl -n prelude get pods command (see the sample output after this procedure):
    kubectl -n prelude get pods -o wide
  Find the database pod with the name you retrieved and get the FQDN address for the corresponding node.
- Locate the problematic node by running the kubectl -n prelude get node command. The problematic node has a NotReady status, as in the sample listing after this procedure.
- Log in to the Automation Orchestrator Appliance command line of the primary node over SSH as root.
- Remove the problematic node from the cluster by running the vracli cluster remove <NODE-FQDN> command.
- Log in to the Automation Orchestrator Appliance command line of the problematic node over SSH as root.
- Add the node to the cluster again by running the vracli cluster join <MASTER-DB-NODE-FQDN> command. A hypothetical remove and join sequence is shown after this procedure.
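
The examples that follow are illustrative only; pod names, host FQDNs, and exact column layouts vary by environment and product version.

The repmgr command from the first step prints one row per database node, and the Role column identifies the primary. Assuming a three-node cluster with the default pod naming, the output looks roughly like this (columns abbreviated; exact fields depend on the repmgr version):

    ID | Name                                          | Role    | Status    | Upstream
    ---+-----------------------------------------------+---------+-----------+-----------------------------------------------
    1  | postgres-0.postgres.prelude.svc.cluster.local | primary | * running |
    2  | postgres-1.postgres.prelude.svc.cluster.local | standby |   running | postgres-0.postgres.prelude.svc.cluster.local
    3  | postgres-2.postgres.prelude.svc.cluster.local | standby |   running | postgres-0.postgres.prelude.svc.cluster.local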
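
The -o wide listing from the next step adds a NODE column that maps each database pod to the appliance hosting it. The host names below are placeholders, and unrelated pods and trailing columns are omitted:

    NAME         READY   STATUS    RESTARTS   AGE   IP            NODE
    postgres-0   1/1     Running   0          12d   10.244.0.21   orchestrator-1.example.com
    postgres-1   1/1     Running   0          12d   10.244.1.18   orchestrator-2.example.com
    postgres-2   1/1     Running   0          12d   10.244.2.16   orchestrator-3.example.com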
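
In the node listing, the problematic appliance is the one reported as NotReady. Again, the host names are hypothetical and the remaining columns are omitted here:

    NAME                         STATUS
    orchestrator-1.example.com   Ready
    orchestrator-2.example.com   Ready
    orchestrator-3.example.com   NotReady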
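
Finally, a hypothetical remove and join sequence, assuming orchestrator-3.example.com is the failed node and orchestrator-1.example.com hosts the primary database (substitute your own FQDNs):

    # On the primary node: remove the failed node from the cluster
    vracli cluster remove orchestrator-3.example.com

    # On the failed node: rejoin the cluster through the node that runs the primary database
    vracli cluster join orchestrator-1.example.com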