Recovering a Cluster Node

Restoring an Automation Orchestrator node can cause issues with the Kubernetes service. To recover a problematic node in your Automation Orchestrator cluster, you must locate the node, remove it from the cluster, and then add it to the cluster again.
  1. Identify the primary node of your Automation Orchestrator cluster.
    1. Log in to the Automation Orchestrator Appliance command line of one of your nodes over SSH as root.
    2. Find the node with the primary role by running the kubectl -n prelude exec postgres-0 command.
      kubectl -n prelude exec postgres-0 -- chpst -u postgres repmgr cluster show --terse --compact
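      The primary node is the row whose Role column reads primary. Assuming the role is printed as the literal string primary, you can also filter the output directly, for example:
      kubectl -n prelude exec postgres-0 -- chpst -u postgres repmgr cluster show --terse --compact | grep primary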
    3. Retrieve the name of the pod in which the primary node is located.
      In most cases, the primary node appears in the output as postgres-0.postgres.prelude.svc.cluster.local, which corresponds to the postgres-0 pod.
    4. Find the FQDN address of the primary node by running the kubectl -n prelude get pods command.
      kubectl -n prelude get pods -o wide
    5. In the output, locate the database pod with the name you retrieved and note the FQDN of the node it runs on, as in the example below.
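      For example, assuming the pod name from the previous step is postgres-0, the following narrows the listing to that pod; the node FQDN appears in the NODE column of the wide output:
      kubectl -n prelude get pods -o wide | grep postgres-0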
  2. Locate the problematic node by running the kubectl -n prelude get node command.
    The problematic node has a NotReady status.
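    The output resembles the following sketch, shown here with hypothetical node names; your FQDNs, roles, ages, and versions will differ:
    NAME                         STATUS     ROLES    AGE   VERSION
    orchestrator-1.example.com   Ready      master   20d   v1.26.5
    orchestrator-2.example.com   NotReady   master   20d   v1.26.5
    Note the FQDN in the NAME column of the NotReady node; you need it when you remove the node in step 4.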
  3. Log in to the Automation Orchestrator Appliance command line of the primary node over SSH as root.
  4. Remove the problematic node from the cluster by running the vracli cluster remove <NODE-FQDN> command.
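    For example, assuming the problematic node has the hypothetical FQDN orchestrator-2.example.com noted earlier:
    vracli cluster remove orchestrator-2.example.com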
  5. Log in to the Automation Orchestrator Appliance command line of the problematic node over SSH as root.
  6. Add the node to the cluster again by running the vracli cluster join <MASTER-DB-NODE-FQDN> command, where <MASTER-DB-NODE-FQDN> is the FQDN of the primary node you identified in step 1.
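    For example, assuming the primary node you identified in step 1 has the hypothetical FQDN orchestrator-1.example.com:
    vracli cluster join orchestrator-1.example.com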