Learn how to check the state of Operations Manager after a power failure in an on-premises vSphere installation.
If you have a procedure at your company for handling power failure scenarios and want to add steps for checking the state of Operations Manager, use this procedure as a template.
Automatic recovery process
When power returns after a failure, vSphere and Operations Manager automatically do the following to recover your environment:
- vSphere High Availability (HA) recovers VMs.
- BOSH ensures that the processes on those VMs are healthy, with the exception of the Operations Manager VM and the BOSH VM itself. Operations Manager uses BOSH to deploy and manage its VMs. For more information, see the BOSH documentation.
- The Diego runtime of VMware Tanzu Platform for Cloud Foundry (Tanzu Platform for Cloud Foundry) recovers apps that were running on the VMs. For more information, see Diego Components and Architecture.
Scenarios that require manual intervention
The following scenario might require manual intervention when you recover your environment after a power failure:
- If Tanzu Platform for Cloud Foundry is configured to use a MySQL cluster instead of a single node, the cluster does not recover automatically.
The procedure in this topic includes more detail about addressing this scenario.
Checklist
Use the checklist in this section to ensure that Operations Manager is in a good state after a power failure.
This checklist assumes that your Operations Manager on vSphere installation is configured for vSphere HA and that you have the BOSH Resurrector activated.
Phase | Component | Action |
---|---|---|
1 | vSphere | Ensure vSphere is running |
2 | Operations Manager | Ensure Operations Manager is running |
3 | BOSH Director | Ensure BOSH Director is running |
4 | BOSH Director | Ensure BOSH Resurrector finished recovering |
5 | Tanzu Platform for Cloud Foundry | Ensure VMs for Tanzu Platform for Cloud Foundry are running. This might include manually recovering the MySQL cluster. |
6 | Tanzu Platform for Cloud Foundry | Ensure apps hosted on Tanzu Platform for Cloud Foundry are running |
7 | Operations Manager Healthwatch | Check the Healthwatch Dashboard |
Phase 1: Ensure vSphere is running
Ensure that vSphere is running and has fully recovered from the power failure. Check your internal vSphere monitoring dashboard.
Phase 2: Ensure Operations Manager is running
To ensure that Operations Manager is running, do the following:
- Open vCenter and go to the resource pool that hosts your Operations Manager deployment.
- Select Related Objects > Virtual Machines.
- Locate the VM with the name `OpsMan-VERSION`, for example `OpsMan-2.6`.
- Review the State and Status columns for the Operations Manager VM. If Operations Manager is running, the columns show Powered On and Normal. If the columns do not show Powered On and Normal, restart the VM.
Phase 3: Ensure BOSH Director is running
To ensure that BOSH Director is running, do the following:
- In a browser, go to the Operations Manager UI and select the BOSH Director for vSphere tile.
  If you do not know the URL of the Operations Manager VM, you can use the IP address that you obtain from vCenter.
- Select Status.
- In the BOSH Director row, locate and record the CID. The CID is the cloud ID and corresponds to the VM name in vSphere.
- Go to the vCenter resource pool or cluster that hosts your Operations Manager deployment.
- Select Related Objects > Virtual Machines.
- Locate the VM with the name that corresponds to the CID value that you recorded.
- Review the State and Status columns for the VM. If the State column does not show Powered On, restart the VM.
- If the State column shows that the VM is Powered On but the Status column does not show Normal, try the following:
  - SSH into the BOSH Director VM using the instructions in SSH into the BOSH Director VM.
  - Run the following command to see that all processes are running:
    `monit summary`
  - If the `uaa` process is not running, run the following command:
    `monit restart uaa`
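The `monit summary` check in the preceding steps can be scripted. The following is a minimal sketch, assuming the usual `monit summary` layout in which each process appears on a line of the form `Process 'NAME' STATUS`. The function name `list_not_running` is hypothetical and is not part of monit or BOSH.

```shell
# Hypothetical helper: reads `monit summary` output on stdin and prints the
# name of every process whose status is not exactly "running".
# Assumes lines of the form:  Process 'uaa'   running
list_not_running() {
  awk '
    $1 == "Process" {
      name = $2
      gsub(/\047/, "", name)   # strip the quote characters around the name
      status = ""
      # Rejoin the remaining fields, because a status can contain spaces.
      for (i = 3; i <= NF; i++) status = status (i > 3 ? " " : "") $i
      if (status != "running") print name
    }
  '
}

# Usage on the BOSH Director VM:
#   monit summary | list_not_running
```

If the function prints nothing, every process that monit reports is in the running state. Any name that it prints is a candidate for `monit restart`.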
Phase 4: Ensure BOSH Resurrector finished recovering
If activated, the BOSH Resurrector recreates any VMs that are still in a problematic state after vSphere HA recovers them.
To ensure BOSH Resurrector finished recovering, do the following:
- Log in to the Operations Manager VM with SSH using the instructions in Log in to the Operations Manager VM with SSH.
- Authenticate with the BOSH Director VM using the instructions in Authenticate with the BOSH Director VM.
- Run the following command to see if there is any currently running or queued Resurrector activity:
  `bosh tasks --all -d ''`
  Review the task descriptions for `scan` and `fix`. If no tasks are running, the BOSH Director has probably finished recovering. Run `bosh tasks --recent --all -d ''` to view finished tasks.
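If you prefer to wait for the Resurrector rather than rerun the command by hand, a small polling loop can help. The following is a rough sketch, assuming that Resurrector tasks carry a description containing the text `scan and fix`. The function names and the 30-second interval are hypothetical choices, not part of the BOSH CLI.

```shell
# Hypothetical helper: succeeds (exit 0) if the `bosh tasks` output on stdin
# lists at least one task whose description contains "scan and fix".
resurrector_active() {
  grep -q 'scan and fix'
}

# Hypothetical polling loop: waits until no scan and fix task is running or
# queued. The 30-second interval is an arbitrary choice.
wait_for_resurrector() {
  while bosh tasks --all -d '' | resurrector_active; do
    echo "Resurrector activity still in progress; checking again in 30s..."
    sleep 30
  done
  echo "No running or queued scan and fix tasks."
}

# Usage, after authenticating with the BOSH Director:
#   wait_for_resurrector
```

An empty result only shows that no task is currently running or queued, so it is still worth reviewing the finished tasks before you move on to the next phase.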
Phase 5: Ensure the VMs for Tanzu Platform for Cloud Foundry are running
You can also apply the steps in this section to any Operations Manager services. To further ensure the health of Operations Manager services, use the Operations Manager Healthwatch dashboard and the documentation for each service.
To ensure that the VMs for Tanzu Platform for Cloud Foundry are running, do the following:
- Run the following command to confirm that VMs are running:
  `bosh vms`
  BOSH lists VMs by deployment. The deployment with the `cf-` prefix is the Tanzu Platform for Cloud Foundry deployment.
- If the `mysql` VM is not running, the likely cause is that MySQL is deployed as a cluster and not a single node. Clusters require manual intervention after an outage. For instructions to confirm and recover the cluster, see Manually Recover MySQL (Clusters Only).
- If any other VMs are not running, run the following command, where `DEPLOYMENT` is the name of the deployment that contains the VMs:
  `bosh cck -d DEPLOYMENT`
  This command scans for problems and provides options for recovering VMs. For more information, see IaaS Reconciliation in the BOSH documentation.
- If you cannot get all VMs running, contact Broadcom Support for assistance. Provide the following information:
  - A statement that you have started this checklist to recover from a power failure on vSphere.
  - A list of failing VMs.
  - Your Operations Manager version.
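To build the list of failing VMs, you can filter the `bosh vms` output instead of reading it by eye. The following is a minimal sketch, assuming that the BOSH CLI prints tab-separated rows when its output is piped, with the instance name in the first column and the process state in the second. The function name `list_unhealthy_vms` is hypothetical.

```shell
# Hypothetical helper: reads `bosh vms` output on stdin and prints the
# instance name of every VM whose process state is not "running".
# Assumes tab-separated rows: instance in column 1, process state in column 2.
list_unhealthy_vms() {
  awk -F '\t' '
    NF >= 2 {
      state = $2
      gsub(/^[ ]+|[ ]+$/, "", state)   # trim any padding around the state
      if (state != "running") print $1
    }
  '
}

# Usage, with DEPLOYMENT as a placeholder for the cf- deployment name:
#   bosh -d DEPLOYMENT vms | list_unhealthy_vms
```

If the column layout differs in your CLI version, adjust the column numbers before relying on the result.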
Manually recover MySQL (clusters only)
To manually recover MySQL, do the following:
- In a browser, go to the Operations Manager UI and select the VMware Tanzu Platform for Cloud Foundry tile.
- Select the Resource Config pane.
- Review the INSTANCES column of the MySQL Server job. If the number of instances is greater than `1`, manually recover MySQL by following Recovering From MySQL Cluster Downtime.
Phase 6: Ensure apps hosted on Tanzu Platform for Cloud Foundry are running
To ensure apps hosted on Tanzu Platform for Cloud Foundry are running, do the following:
- Check the status of an app that you run on Operations Manager. Run any health checks that the app has or visit the URL of the app to verify that it is working.
- Push an app to Operations Manager.
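Visiting the URL of an app can also be done from a terminal. The following is a rough sketch that uses `curl` to check whether an app URL answers with a success or redirect status. The function name `app_responds`, the 10-second timeout, and the example URL are hypothetical, and a real health check for your app might need a specific path or response body.

```shell
# Hypothetical helper: succeeds if an HTTP GET of the URL in $1 returns a
# 2xx or 3xx status code within 10 seconds.
app_responds() {
  status_code="$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$1")" || return 1
  case "$status_code" in
    2??|3??) return 0 ;;
    *)       return 1 ;;
  esac
}

# Usage, with a placeholder URL:
#   app_responds "https://my-app.apps.example.com" && echo "app is responding"
```

This only confirms that the app answers HTTP requests. It does not replace any health checks that the app itself provides.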
Phase 7: Check the Healthwatch dashboard
You can use Operations Manager Healthwatch to further assess the state of Operations Manager. For more information, see Using Operations Manager Healthwatch.