You can trigger a failover of apps from the leader to the follower in a Tanzu for MySQL installation.
You might want to trigger a failover in the following scenarios:
- You want to take the leader VM down to do planned maintenance.
- The performance of the leader VM degrades.
- The leader VM fails unexpectedly.
- The AZ where the leader VM is located goes offline unexpectedly.
You can use the following metrics to determine if you need to trigger a failover:
- /p.mysql/available: This metric monitors whether the MySQL server is currently available. For more information, see Server availability.
- /p.mysql/follower/seconds_behind_master: This metric monitors how far behind the follower is in applying writes from the leader. For more information, see Leader-Follower metrics.
- /p.mysql/follower/seconds_since_leader_heartbeat: This metric monitors the number of seconds that elapse between the leader heartbeat and the replication of the heartbeat in the follower. For more information, see Leader-Follower metrics.
For information about errands used to trigger failover, see configure-leader-follower, make-leader, and make-read-only.
To trigger a failover:
Retrieve information
To retrieve the information necessary for stopping the leader and promoting the follower:
- Log in to your deployment by running:
cf login -a API-URL
Where API-URL is the API endpoint of your deployment. When prompted, enter your credentials.
- Target the org and space where the leader-follower service instance is located by running:
cf target -o DESTINATION-ORG -s DESTINATION-SPACE
- Find and record the GUID of the service instance. If you don't know the name of the service instance, first list the service instances in the space by running cf services. Then run:
cf service SERVICE-INSTANCE-NAME --guid
Where SERVICE-INSTANCE-NAME is the name of the leader-follower service instance.
For example:
$ cf service my-lf-instance --guid
82ddc607-710a-404e-b1b8-a7e3ea7ec063
- SSH into the Tanzu Operations Manager VM. Follow the procedures in Gather credential and IP Address information and SSH into Tanzu Operations Manager.
- From the Tanzu Operations Manager VM, log in to your BOSH Director with the BOSH CLI. For more information about logging in with the BOSH CLI, see Log in to the BOSH Director.
- Use the BOSH CLI to run the inspect errand by running:
bosh -d service-instance_GUID run-errand inspect
Where GUID is the GUID of the leader-follower service instance you recorded.
For example:
$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
  run-errand inspect
- See the output about the leader-follower MySQL VMs and identify the instance marked Role: leader.
For example output:
Instance   mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0
Exit Code  0
Stdout     2018/04/03 18:08:46 Started executing command: inspect
           2018/04/03 18:08:46 IP Address: 10.0.8.11
           Role: leader
           Read Only: false
           Replication Configured: false
           Replication Mode: async
           Has Data: true
           GTID Executed: 82ddc607-710a-404e-b1b8-a7e3ea7ec063:1-18
           2018/04/03 18:08:46 Successfully executed command: inspect
Stderr     -

Instance   mysql/37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8
Exit Code  0
Stdout     2018/04/03 18:08:46 Started executing command: inspect
           2018/04/03 18:08:46 IP Address: 10.0.8.10
           Role: follower
           Read Only: true
           Replication Configured: true
           Replication Mode: async
           Has Data: true
           GTID Executed: 82ddc607-710a-404e-b1b8-a7e3ea7ec063:1-18
           2018/04/03 18:08:46 Successfully executed command: inspect
- Record the index of the instance marked Role: leader. In this example output, the index of the leader VM is ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0.
- Record the index of the other instance, which is the follower VM. In this example output, the index of the follower VM is 37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8.
- If you still have access to the AZ where the leader VM is located, find out if the leader VM is in the AZ you want to take offline by running:
bosh -d service-instance_GUID instances
For example:
$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
  instances
Deployment 'service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063'

Instance                                    Process State  AZ             IPs
mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0  failing        us-central1-f  10.0.8.11
mysql/37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8  running        us-central1-a  10.0.8.10

2 instances
The leader VM might not display its status as failing if you are performing planned maintenance.
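The lookups in the steps above can be scripted. The following is a minimal sketch with three hypothetical helpers: `deployment_name`, `index_for_role`, and `az_for_index`. They only parse text and never call cf or bosh themselves. They assume two things about the output. First, the inspect output follows the layout in the example, where each instance block starts with an `Instance mysql/INDEX` line and has a `Role: ...` line. Second, in the `bosh instances` table the AZ is the second-to-last column, followed by a single IP.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the "Retrieve information" steps.
# They parse text only; they do not run cf or bosh.

# Build the BOSH deployment name (service-instance_GUID) from a GUID.
deployment_name() {
  printf 'service-instance_%s\n' "$1"
}

# Read `bosh run-errand inspect` output on stdin and print the index of each
# instance whose "Role:" value equals $1 (leader or follower).
# Assumes each instance block starts with a line "Instance mysql/INDEX".
index_for_role() {
  awk -v role="$1" '
    $1 == "Instance" { split($2, parts, "/"); current = parts[2] }
    { for (i = 1; i < NF; i++) if ($i == "Role:" && $(i + 1) == role) print current }
  '
}

# Read `bosh instances` table output on stdin and print the AZ of mysql/$1.
# Assumes the AZ is the second-to-last column and each row lists one IP,
# so multi-word process states such as "unresponsive agent" do not shift it.
az_for_index() {
  awk -v inst="mysql/$1" '$1 == inst { print $(NF - 1) }'
}
```

For example, you might run `DEPLOYMENT=$(deployment_name "$(cf service my-lf-instance --guid)")` and then `bosh -d "$DEPLOYMENT" run-errand inspect | index_for_role leader`. You could then pipe `bosh -d "$DEPLOYMENT" instances` into `az_for_index` with the leader index. Always compare the result with the raw output before acting on it.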
Promote the Follower
To stop the leader VM and promote the follower VM to the new leader:
- Stop any data from being written to the leader VM by setting it to read-only:
bosh -d service-instance_GUID \
  run-errand make-read-only \
  --instance=mysql/INDEX
Where:
GUID: This is the GUID of the leader-follower service instance retrieved above.
INDEX: This is the index of the leader VM retrieved above.
For example:
$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
  run-errand make-read-only \
  --instance=mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0
- If you still have access to the AZ where the leader VM is located, stop the leader VM:
bosh -d service-instance_GUID stop mysql/INDEX
Use the index of the leader VM retrieved above.
For example:
$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
  stop mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0
- Set the follower VM as writable by running:
bosh -d service-instance_GUID run-errand make-leader --instance=mysql/INDEX
Use the index of the follower VM retrieved above.
For example:
$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
  run-errand make-leader \
  --instance=mysql/37e4b6bc-2ed6-4bd2-84d1-e59a91f5e7f8
- If the run-errand make-leader command returns an error, re-run it until the follower VM has finished applying all transactions replicated from the leader.
At this point, a single instance is working, but leader-follower replication has not yet been restored.
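The promotion steps above can be chained in one function that follows the documented order and re-runs make-leader while it keeps failing. The following is a minimal sketch. The function name `promote_follower` and the environment variables `STOP_LEADER`, `MAX_ATTEMPTS`, and `RETRY_DELAY` are hypothetical and are not part of Tanzu for MySQL. The sketch passes the BOSH CLI's `-n` (non-interactive) flag so that commands that ask for confirmation do not wait for input. It stops at the first failing step, so a failed make-read-only never leads to promoting the follower.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the promotion steps in order and retry make-leader.
# Illustrative knobs (not part of Tanzu for MySQL):
#   STOP_LEADER  - set to "false" to skip `bosh stop` when the leader's AZ is unreachable
#   MAX_ATTEMPTS - how many times to try make-leader (default 10)
#   RETRY_DELAY  - seconds to wait between make-leader attempts (default 30)
promote_follower() {
  local deployment="$1" leader="$2" follower="$3"
  local max_attempts="${MAX_ATTEMPTS:-10}" delay="${RETRY_DELAY:-30}"
  local attempt=1

  # 1. Stop writes to the leader before promoting anything.
  bosh -n -d "$deployment" run-errand make-read-only \
    --instance="mysql/$leader" || return 1

  # 2. Stop the leader VM unless told its AZ cannot be reached.
  if [ "${STOP_LEADER:-true}" != "false" ]; then
    bosh -n -d "$deployment" stop "mysql/$leader" || return 1
  fi

  # 3. Make the follower writable, re-running the errand while it fails.
  until bosh -n -d "$deployment" run-errand make-leader \
      --instance="mysql/$follower"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "make-leader still failing after $attempt attempts" >&2
      return 1
    fi
    echo "make-leader attempt $attempt failed; retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}
```

A bounded retry count lets the function report failure instead of looping forever. Check the errand output yourself before deciding whether to keep retrying.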
- To fail your app over to a single instance instead of restoring leader-follower, skip to Unbind and rebind the app.
- If you are triggering a failover because the AZ of the leader VM went offline, you can fail your app over to a single instance by following the procedure in Unbind and rebind the app.
- To restore leader-follower, you must regain access to the AZ where your leader VM is located. Then follow the procedures in Clean up former Leader VM (Optional) and Configure the new Follower.
Clean up former Leader VM (Optional)
If you triggered the failover because the leader VM was failing, clean up the former leader VM:
- Deactivate resurrection by running:
bosh update-resurrection off
- Retrieve the CID of the failing former leader VM by running:
bosh -d service-instance_GUID instances \
  --details \
  --failing \
  --column="VM CID" \
  --json
For example:
$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 instances \
  --details \
  --failing \
  --column="VM CID" \
  --json
- Retrieve the disk CID of the failing former leader VM by running:
bosh -d service-instance_GUID instances \
  --details \
  --failing \
  --column="Disk CIDs" \
  --json
For example:
$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 instances \
  --details \
  --failing \
  --column="Disk CIDs" \
  --json
- Delete the failing former leader VM by running:
bosh -d service-instance_GUID delete-vm VM-CID
Where:
GUID: This is the GUID of the leader-follower service instance retrieved above.
VM-CID: This is the CID of the failing former leader VM retrieved above.
For example:
$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
  delete-vm i-1db9ede6
- Orphan the disk of the failing former leader VM:
bosh -d service-instance_GUID orphan-disk DISK-CID
Where:
GUID: This is the GUID of the leader-follower service instance retrieved above.
DISK-CID: This is the disk CID of the failing former leader VM retrieved above.
For example:
$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
  orphan-disk b-1db9ede6
Orphaning a disk rather than deleting it preserves the disk for possible recovery. After performing recovery operations, you can reattach the disk to a VM. BOSH deletes orphaned disks after five days by default.
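The cleanup steps above can be sketched as a single function that reads both CIDs from the `--json` output and then deletes the VM and orphans its disk. The following is a minimal sketch. The names `first_row_value` and `cleanup_former_leader` are hypothetical. The JSON handling rests on two assumptions you should check against your own BOSH CLI output. First, `bosh instances --json` prints indented JSON in which the table's `Rows` entries are keyed by lowercased column names (`vm_cid`, `disk_cids`). Second, the former leader is the only failing instance with a single disk. It uses plain awk so that no extra JSON tool is required.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for "Clean up former Leader VM (Optional)".
# Assumes indented `bosh instances --json` output whose "Rows" entries use
# lowercased column keys (vm_cid, disk_cids), and a single failing instance.

# Print the value of key $1 from the first entry after "Rows" in JSON on stdin.
# Keys that appear earlier (for example in "Header") are ignored.
first_row_value() {
  awk -v key="\"$1\"" '
    index($0, "\"Rows\"") { in_rows = 1; next }
    in_rows && index($0, key) {
      line = $0
      sub(/^[^:]*: *"/, "", line)  # drop everything up to the value opening quote
      sub(/".*$/, "", line)        # drop the closing quote and the rest
      print line
      exit
    }
  '
}

cleanup_former_leader() {
  local deployment="$1" vm_cid disk_cid

  bosh -n update-resurrection off || return 1

  vm_cid=$(bosh -d "$deployment" instances --details --failing \
    --column="VM CID" --json | first_row_value vm_cid)
  disk_cid=$(bosh -d "$deployment" instances --details --failing \
    --column="Disk CIDs" --json | first_row_value disk_cids)
  if [ -z "$vm_cid" ] || [ -z "$disk_cid" ]; then
    echo "no failing instance found in $deployment" >&2
    return 1
  fi

  # Delete the VM but keep its disk as an orphan for possible recovery.
  bosh -n -d "$deployment" delete-vm "$vm_cid" || return 1
  bosh -n -d "$deployment" orphan-disk "$disk_cid"
}
```

The function refuses to continue when either CID comes back empty. This guards against deleting a VM while its disk CID is unknown.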
Configure the new Follower
To start the former leader VM again and configure it as the new follower:
- Re-create the former leader VM by running:
bosh -d service-instance_GUID \
  recreate \
  mysql/INDEX
Where:
GUID: This is the GUID of the leader-follower service instance retrieved above.
INDEX: This is the index of the former leader VM that you are re-creating.
For example:
$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
  recreate \
  mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0
- Set the former leader VM as a follower using the same values as previously shown:
bosh -d service-instance_GUID \
  run-errand configure-leader-follower \
  --instance=mysql/INDEX
For example:
$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
  run-errand configure-leader-follower \
  --instance=mysql/ca0ed8b5-7590-4cde-bba8-7ca2935f2bd0
- Use the BOSH CLI to run the inspect errand, using the same GUID as previously shown:
bosh -d service-instance_GUID \
  run-errand inspect
For example:
$ bosh -d service-instance_82ddc607-710a-404e-b1b8-a7e3ea7ec063 \
  run-errand inspect
If the output displays one instance marked Role: leader and another instance marked Role: follower, then leader-follower replication and high availability are resumed. The deployment should be in its original, working state. If you want to, you can turn resurrection back on by running bosh update-resurrection on.
Unbind and rebind the app
To fail their apps over to the new leader VM, your developers must unbind and rebind their apps to the leader-follower service instance.
If you have BOSH DNS enabled in Tanzu Operations Manager, you do not need to unbind and rebind your app to a leader-follower service instance to fail over the app. The operator activates BOSH DNS in BOSH Director > BOSH DNS Config.
If a developer rebinds an app to the Tanzu for MySQL service after unbinding, they must also rebind any existing custom schemas to the app. When you rebind an app, stored code, programs, and triggers break. For more information about binding custom schemas, see Use custom schemas.
To unbind and rebind your app:
- Unbind the app from the leader-follower service instance by running:
cf unbind-service APP-NAME SERVICE-INSTANCE-NAME
Where:
APP-NAME: This is the name of the app bound to the leader-follower service instance.
SERVICE-INSTANCE-NAME: This is the name of the leader-follower service instance.
- Rebind the app to the leader-follower service instance by running:
cf bind-service APP-NAME SERVICE-INSTANCE-NAME
- Restage the app by running:
cf restage APP-NAME
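Developers who fail over several apps might wrap the three cf commands above in a small function. The following is a minimal sketch using a hypothetical function name, `rebind_app`. It runs the commands in the documented order and stops at the first one that fails, so an app is never restaged without its binding. It does not rebind custom schemas, which still need to be handled separately as described earlier in this section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unbind, rebind, and restage one app.
# Stops at the first failing cf command; custom schemas are not handled here.
rebind_app() {
  local app="$1" service="$2"
  cf unbind-service "$app" "$service" || return 1
  cf bind-service "$app" "$service" || return 1
  cf restage "$app"
}
```

For example, `rebind_app my-app my-lf-instance` would rebind a hypothetical app named my-app to the my-lf-instance service instance used in the earlier examples.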