Learn about the basic instructions for troubleshooting on-demand VMware SQL with MySQL for Tanzu Application Service.
For information about temporary VMware Tanzu for MySQL service interruptions, see Service interruptions.
Troubleshoot errors
This section provides information on how to troubleshoot specific errors or error messages.
Common services errors
The following errors occur in multiple services:
- Failed installation
- Cannot create or delete service instances
- Broker request timeouts
- Instance does not exist
- Cannot bind to or unbind from service instances
- Cannot connect to a service instance
- Upgrade all service instances errand fails
- Missing logs and metrics
- MySQL Load is high with large number of CredHub encryption keys
Failed Installation |
|
---|---|
Symptom | VMware SQL with MySQL for TAS fails to install. |
Cause | Reasons for a failed installation include:
|
Solution | To troubleshoot:
|
Cannot Create or Delete Service Instances |
|
---|---|
Symptom | If developers report errors such as:
Instance provisioning failed: There was a problem completing your request. Contact your operations team providing the following information: service: redis-acceptance, service-instance-guid: ae9e232c-0bd5-4684-af27-1b08b0c70089, broker-request-id: 63da3a35-24aa-4183-aec6-db8294506bac, task-id: 442, operation: create |
Cause | Reasons include:
|
Solution | To troubleshoot:
|
Broker Request Timeouts | |
---|---|
Symptom | If developers report errors such as:
Server error, status code: 504, error code: 10001, message: The request to the service broker timed out: https://BROKER-URL/v2/service_instances/e34046d3-2379-40d0-a318-d54fc7a5b13f/service_bindings/aa635a3b-ef6d-41c3-a23f-55752f3f651b |
Cause | Cloud Foundry might not be connected to the service broker, or there might be a large number of queued tasks. |
Solution | To troubleshoot:
|
Instance Does Not Exist |
|
---|---|
Symptom | If developers report errors such as:
Server error, status code: 502, error code: 10001, message: Service broker error: instance does not exist |
Cause | The instance might have been deleted. |
Solution | To troubleshoot:
If the BOSH deployment is not found, it has been deleted from BOSH. Contact Support for further assistance. |
Cannot Bind to or Unbind from Service Instances |
|
---|---|
Symptom | If developers report errors such as:
Server error, status code: 502, error code: 10001, message: Service broker error: There was a problem completing your request. Please contact your operations team providing the following information: service: example-service, service-instance-guid: 8d69de6c-88c6-4283-b8bc-1c46103714e2, broker-request-id: 15f4f87e-200a-4b1a-b76c-1c4b6597c2e1, operation: bind |
Cause | This might be due to authentication or network errors. |
Solution | To find out the exact issue with the binding process:
|
Cannot Connect to a Service Instance |
|
---|---|
Symptom | Developers report that their app cannot use service instances that they have successfully created and bound. |
Cause | The error might originate from the service or be network related. |
Solution | To solve this issue, ask the user to send application logs that show the connection error.
If the error originates from the service, then follow VMware SQL with MySQL for TAS-specific instructions.
If the issue appears to be network-related, then:
|
Service instances can also become temporarily inaccessible during upgrades and VM or network failures. See Service interruptions for more information.
Upgrade All Service Instances Errand Fails |
|
---|---|
Symptom | The upgrade-all-service-instances errand fails. |
Cause | There might be a problem with a particular instance. |
Solution | To troubleshoot:
|
Missing Logs and Metrics |
|
---|---|
Symptom | No logs are being emitted by the on-demand broker. |
Cause | Syslog might not be configured correctly, or you might have network access issues. |
Solution | To troubleshoot:
|
MySQL Load is High with Large Number of CredHub Encryption Keys |
||
---|---|---|
Symptom | MySQL load is high | Slow CredHub queries |
Cause | Large number of CredHub encryption keys | |
Solution | To troubleshoot:
|
Leader-Follower Service Instance Errors
This section provides solutions for the following errors:
- Unable to determine leader and follower
- Both leader and follower instances are writable
- Both leader and follower instances are read-only
Unable to Determine Leader and Follower |
|
---|---|
Symptom | This problem happens when the configure-leader-follower errand fails because it cannot determine the VM roles. The configure-leader-follower errand exits with code 1 and the errand logs contain the following:
$ Unable to determine leader and follower based on transaction history. |
Cause | Something has happened to the instances, such as a failure or manual intervention. As a result, there is not enough information available to determine the correct state and topology without operator intervention to resolve the issue. |
Solution | Use the inspect errand to determine which instance can be the leader. Then, using the orchestration errands and backup/restore, you can put the service instance into a safe topology, and then rerun the configure-leader-follower errand. The following example shows one outcome that the inspect errand can return:
|
Both Leader and Follower Instances Are Writable |
|
---|---|
Symptom | This problem happens when the configure-leader-follower errand fails because both VMs are writable and the VMs might hold differing data. The configure-leader-follower errand exits with code 1 and the errand logs contain the following:
$ Both mysql instances are writable. Please ensure no divergent data and set one instance to read-only mode. |
Cause | VMware SQL with MySQL for TAS tries to ensure that there is only one writable instance of the leader-follower pair at any given time. However, in certain situations, such as network partitions or manual intervention outside of the provided bosh errands, both instances can be writable. The service instances remain in this state until an operator resolves the issue to ensure that the correct instance is promoted and reduce the potential for data divergence. |
Solution |
|
Both Leader and Follower Instances Are Read-Only |
|
---|---|
Symptom | Developers report that apps cannot write to the database. In a leader-follower topology, the leader VM is writable and the follower VM is read-only. However, if both VMs are read-only, apps cannot write to the database. |
Cause | This problem happens if the leader VM fails and the BOSH Resurrector is activated. When the leader is resurrected, it is set as read-only. |
Solution |
|
Inoperable app and database errors
This section provides a solution for the following errors:
Persistent Disk is Full |
|
---|---|
Symptom | Developers report that read, write, and Cloud Foundry Command-Line Interface (cf CLI) operations do not work, and apps become inoperable.
Developers cannot upgrade to a larger VMware SQL with MySQL for TAS service plan to free up disk space. |
Cause | This problem happens if your persistent disk is full.
When you use the BOSH CLI to target your deployment, you see that instances are at 100% persistent disk usage. Available disk space can be increased by deleting log files. After deleting logs, you can then upgrade to a larger VMware SQL with MySQL for TAS service plan. You can also turn off binary logging before developers do large data uploads or if their databases have a high transaction volume. |
Solution |
To resolve this issue, do one of the following:
|
Cannot Access Database Table |
|
---|---|
Symptom | When you query an existing table, you see an error similar to
the following:
ERROR 1146 (42S02): Table 'mysql.foobar' doesn't exist |
Cause | This error occurs if you created an uppercase table name and then activated lowercase table names. You activate lowercase table names either by:
|
Solution | To resolve this issue:
|
Highly available cluster errors
This section provides solutions for the following errors:
- Unresponsive Node in a Highly Available Cluster
- Many Replication Errors in Logs for Highly Available Clusters
Unresponsive Node in a Highly Available Cluster |
|
---|---|
Symptom |
A client connected to a VMware SQL with MySQL for TAS cluster node reports the following error:
WSREP has not yet prepared this node for application use
Some clients might instead return the following:
unknown error |
Cause | If the client is connected to a VMware SQL with MySQL for TAS cluster node and that node loses connection to the rest of the cluster, the node stops accepting writes. If the connection to this node is made through the proxy, the proxy automatically re-routes further connections to a different node. |
Solution | A node can become unresponsive for a number of reasons. For solutions, see the following:
|
Many Replication Errors in Logs for Highly Available Clusters |
|
---|---|
Symptom | You see many replication errors in the MySQL logs, like the following:
160318 9:25:16 [Warning] WSREP: RBR event 1 Query apply warning: 1, 16992456
160318 9:25:16 [Warning] WSREP: Ignoring error for TO isolated action: source: abcd1234-abcd-1234-abcd-1234abcd1234 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 246804 trx_id: -1 seqnos (l: 865022, g: 16992456, s: 16992455, d: 16992455, ts: 2530660989030983)
160318 9:25:16 [ERROR] Slave SQL: Error 'Duplicate column name 'number'' on query. Default database: 'cf_0123456_1234_abcd_1234_abcd1234abcd'. Query: 'ALTER TABLE ...' |
Cause | This problem happens when there are errors in SQL statements. |
Solution | For solutions for the replication errors in MySQL log files, see the following table:
ALTER TABLE
or persistent disk or memory issues, you can ignore the replication errors.
|
Failed backups
If an automated backup or a backup initiated from the ApplicationDataBackupRestore (adbr) plug-in fails, verify that port 2345 from TAS for VMs to the ODB component is open.
Automated Backups or adbr Plug-in Backups Fail |
|
---|---|
Symptom | The following are true:
|
Cause | Port 2345, which allows communication between the TAS for VMs and ODB components, is closed. |
Solution | Open port 2345 from the TAS for VMs component to the ODB component. See Required networking rules for VMware SQL with MySQL for TAS in On-Demand Networking. |
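Before changing firewall rules, it can help to confirm whether the port is reachable at all. The following is a minimal sketch of such a check, assuming you run it from a VM on the TAS for VMs network and that you know the address of the ODB component. The function name and the host value you pass in are illustrative and are not part of the product.

```python
import socket


def port_is_open(host: str, port: int = 2345, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the TCP handshake and raises OSError
        # (refused, unreachable, timed out) if it cannot connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_is_open("ODB-HOST")` returns `False` when a firewall drops or rejects the connection, where `ODB-HOST` is a placeholder for the address of your ODB component.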
Troubleshoot components
This section provides guidance on checking for and fixing issues in on-demand service components.
BOSH problems
Large BOSH queue
On-demand service brokers add tasks to the BOSH request queue, which can back up and cause delays under heavy load.
An app developer who requests a new VMware SQL with MySQL for TAS instance sees create in progress in the Cloud Foundry Command Line Interface (cf CLI) until BOSH processes the queued request.
Tanzu Operations Manager currently deploys two BOSH workers to process its queue. Users of future versions of Tanzu Operations Manager can configure the number of BOSH workers.
Configuration
Service instances in failing state
The VM or Disk type that you configured in the plan page of the tile in Tanzu Operations Manager might not be large enough for the Tanzu SQL for VMs service instance to start. See tile-specific guidance on resource requirements.
Authentication
UAA changes
If you have rotated any UAA user credentials, you might see authentication issues in the service broker logs.
To resolve this, redeploy the Tanzu SQL for VMs tile in Tanzu Operations Manager. This provides the broker with the latest configuration.
You must ensure that any changes to UAA credentials are reflected in the Tanzu Operations Manager credentials tab of the VMware Tanzu Application Service for VMs tile.
Networking
Common issues with networking include:
Issue | Solution |
---|---|
Latency when connecting to the VMware SQL with MySQL for TAS service instance to create or delete a binding. | Try again or improve network performance. |
Firewall rules are blocking connections from the Tanzu SQL for VMs service broker to the service instance. | Open the Tanzu SQL for VMs tile in Tanzu Operations Manager and check the two networks configured in the Networks pane. Ensure that these networks allow access to each other. |
Firewall rules are blocking connections from the service network to the BOSH director network. | Ensure that service instances can access the Director so that the BOSH agents can report in. |
Apps cannot access the service network. | Configure Cloud Foundry application security groups to allow runtime access to the service network. |
Problems accessing BOSH’s UAA or the BOSH director. | Follow network troubleshooting and check that the BOSH director is online. |
Validate service broker connectivity to service instances
To validate connectivity, do the following:
- View the BOSH deployment name for your service broker by running:
bosh deployments
- SSH into the Tanzu SQL for VMs service broker by running:
bosh -d DEPLOYMENT-NAME ssh
- If no BOSH task-id appears in the error message, look in the broker log using the broker-request-id from the task.
Validate app access to service instance
Use cf ssh to access the app container, then try connecting to the VMware SQL with MySQL for TAS service instance using the binding included in the VCAP_SERVICES environment variable.
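Reading the binding out of VCAP_SERVICES by hand is error-prone, so the following sketch shows one way to extract the credentials for a named service instance. It assumes the standard Cloud Foundry layout, in which VCAP_SERVICES is a JSON object that maps each service label to a list of bindings, and each binding has a `name` and a `credentials` object. The keys inside `credentials` depend on the service and are not assumed here.

```python
import json


def mysql_credentials(vcap_services_json: str, instance_name: str) -> dict:
    """Return the credentials object for the binding named instance_name."""
    services = json.loads(vcap_services_json)
    # Each top-level key is a service label; its value is a list of bindings.
    for bindings in services.values():
        for binding in bindings:
            if binding.get("name") == instance_name:
                return binding.get("credentials", {})
    raise KeyError(f"no binding named {instance_name!r} in VCAP_SERVICES")
```

Inside the app container you might call it as `mysql_credentials(os.environ["VCAP_SERVICES"], "MY-SERVICE")` after importing `os`, then use the returned values with a MySQL client to test the connection.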
Quotas
Plan quota issues
If developers report errors such as:
Message: Service broker error: The quota for this service plan has been exceeded.
Please contact your Operator for help.
- Check your current plan quota.
- Increase the plan quota.
  - Log in to Tanzu Operations Manager.
  - Reconfigure the quota on the plan page.
  - Deploy the tile.
- Find who is using the plan quota and take the appropriate action.
Global quota issues
If developers report errors such as:
Message: Service broker error: The quota for this service has been exceeded.
Please contact your Operator for help.
- Check your current global quota.
- Increase the global quota.
  - Log in to Tanzu Operations Manager.
  - Reconfigure the quota on the on-demand settings page.
  - Deploy the tile.
- Find out who is using the quota and take the appropriate action.
Failing jobs and unhealthy instances
To determine whether there is an issue with the VMware SQL with MySQL for TAS deployment:
- Inspect the VMs by running:
bosh -d service-instance_GUID vms --vitals
- For additional information, run:
bosh -d service-instance_GUID instances --ps --vitals
If the VM is failing, follow the service-specific information. Any unadvised corrective actions (such as running bosh restart on a VM) can cause issues in the service instance.
A failing process or failing VM might come back automatically after a temporary service outage. See VM process failure and VM failure.
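When a deployment has many VMs, scanning the table output by eye is slow. The following sketch filters machine-readable output for anything that is not running. It assumes you add the BOSH CLI global `--json` flag to the `instances --ps` command and that the output contains a `Tables` list whose `Rows` carry `instance`, `process`, and `process_state` keys. Treat those key names as assumptions and check them against the output of your CLI version.

```python
import json


def unhealthy_rows(bosh_json: str) -> list:
    """Return (instance, process, state) tuples for rows that are not running."""
    data = json.loads(bosh_json)
    bad = []
    # "Tables" and "Rows" may be null in the output, hence the "or []".
    for table in data.get("Tables") or []:
        for row in table.get("Rows") or []:
            state = row.get("process_state", "")
            if state != "running":
                bad.append((row.get("instance", ""), row.get("process", ""), state))
    return bad
```

An empty result means every reported VM and process is in the running state. It does not replace the service-specific guidance for a VM that is failing.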
AZ or region failure
Failures at the IaaS level, such as Availability Zone (AZ) or region failures, can interrupt service and require manual restoration. See AZ failure and Region failure.
Techniques for troubleshooting
This section provides instructions on interacting with the on-demand service broker and on-demand service instance BOSH deployments, and on performing general maintenance and housekeeping tasks.
Parse a Cloud Foundry error message
Failed operations (create, update, bind, unbind, delete) result in an error message.
You can retrieve the error message later by running the cf CLI command cf service INSTANCE-NAME.
$ cf service myservice
Service instance: myservice
Service: super-db
Bound apps:
Tags:
Plan: dedicated-vm
Description: Dedicated Instance
Documentation url:
Dashboard:
Last Operation
Status: create failed
Message: Instance provisioning failed: There was a problem completing your request.
Please contact your operations team providing the following information:
service: redis-acceptance,
service-instance-guid: ae9e232c-0bd5-4684-af27-1b08b0c70089,
broker-request-id: 63da3a35-24aa-4183-aec6-db8294506bac,
task-id: 442,
operation: create
Started: 2017-03-13T10:16:55Z
Updated: 2017-03-13T10:17:58Z
Use the information in the Message field to debug further. Provide this information to Support when filing a ticket.
The task-id field maps to the BOSH task ID. For more information on a failed BOSH task, run bosh task TASK-ID.
The broker-request-id field maps to the portion of the On-Demand Broker log containing the failed step. Access the broker log through your syslog aggregator, or access BOSH logs for the broker by typing bosh logs broker 0.
If you have more than one broker instance, repeat this process for each instance.
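If you handle many of these reports, you can pull the identifiers out of the message text programmatically. The following is a minimal sketch that extracts the fields shown in the example message. The function name is illustrative, and the field list is taken from the example above, so other brokers might emit different fields.

```python
import re

# Field names as they appear in the example Message text.
FIELDS = ("service", "service-instance-guid", "broker-request-id", "task-id", "operation")


def parse_broker_error(message: str) -> dict:
    """Extract the 'field: value' pairs that the broker embeds in an error message."""
    found = {}
    for field in FIELDS:
        # The lookbehind stops "service:" from matching inside a longer name;
        # the value runs up to the next comma or whitespace.
        match = re.search(r"(?<![\w-])" + re.escape(field) + r":\s*([^\s,]+)", message)
        if match:
            found[field] = match.group(1)
    return found
```

For the example message, the result includes `task-id` with the value `442`, which you can pass to bosh task, and `broker-request-id`, which you can search for in the broker log. Fields that are absent from a message, such as `task-id` in a bind error, are omitted from the result.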
Access broker and instance logs and VMs
Before following these procedures, log in to the cf CLI and the BOSH CLI.
Access Broker Logs and VMs
You can access logs using Tanzu Operations Manager by clicking on the Logs tab in the tile and downloading the broker logs.
To access logs using the BOSH CLI, do the following:
- Identify the on-demand broker (ODB) deployment by running the following command:
bosh deployments
- View VMs in the deployment by running the following command:
bosh -d DEPLOYMENT-NAME instances
- SSH onto the VM by running the following command:
bosh -d DEPLOYMENT-NAME ssh
- Download the broker logs by running the following command:
bosh -d DEPLOYMENT-NAME logs
The archive generated by BOSH includes the following logs:
Log Name | Description |
---|---|
broker.stdout.log | Requests to the on-demand broker and the actions the broker performs while orchestrating the request (e.g. generating a manifest and calling BOSH). Start here when troubleshooting. |
bpm.log | Control script logs for starting and stopping the on-demand broker. |
post-start.stderr.log | Errors that occur during post-start verification. |
post-start.stdout.log | Post-start verification. |
drain.stderr.log | Errors that occur while running the drain script. |
Access service instance logs and VMs
- To target an individual service instance deployment, retrieve the GUID of your service instance with the following cf CLI command:
cf service MY-SERVICE --guid
- To view VMs in the deployment, run the following command:
bosh -d service-instance_GUID instances
- To SSH into a VM, run the following command:
bosh -d service-instance_GUID ssh
- To download the instance logs, run the following command:
bosh -d service-instance_GUID logs
Run service broker errands to manage brokers and instances
From the BOSH CLI, you can run service broker errands that manage the service brokers and perform mass operations on the service instances that the brokers created. These service broker errands include:
- register-broker registers a broker with the Cloud Controller and lists it in the Marketplace.
- deregister-broker deregisters a broker with the Cloud Controller and removes it from the Marketplace.
- upgrade-all-service-instances upgrades existing instances of a service to its latest installed version.
- delete-all-service-instances deletes all instances of a service.
- orphan-deployments detects “orphan” instances that are running on BOSH but not registered with the Cloud Controller.
To run an errand, run the following command:
bosh -d DEPLOYMENT-NAME run-errand ERRAND-NAME
For example:
bosh -d my-deployment run-errand deregister-broker
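If you script errand runs, a small guard against mistyped errand names can prevent accidental invocations. The following sketch builds the argument list for the command shown above and rejects names that are not in the list of errands in this section. The function name is illustrative, and the sketch only constructs the command. It does not run it.

```python
# Errand names listed in this section.
KNOWN_ERRANDS = (
    "register-broker",
    "deregister-broker",
    "upgrade-all-service-instances",
    "delete-all-service-instances",
    "orphan-deployments",
)


def errand_command(deployment: str, errand: str) -> list:
    """Build the argv list for 'bosh -d DEPLOYMENT-NAME run-errand ERRAND-NAME'."""
    if errand not in KNOWN_ERRANDS:
        raise ValueError(f"unknown errand: {errand!r}")
    return ["bosh", "-d", deployment, "run-errand", errand]
```

You could pass the returned list to `subprocess.run` from a machine where the BOSH CLI is installed and logged in.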
Register broker
The register-broker errand does the following:
- Registers the service broker with Cloud Controller.
- Activates service access for any plans that are activated on the tile.
- Deactivates service access for any plans that are deactivated on the tile.
- Does nothing for any plans that are set to manual on the tile.
You can run this errand whenever the broker is redeployed with new catalog metadata to update the Marketplace.
Plans with deactivated service access are only visible to admin Cloud Foundry users. Non-admin Cloud Foundry users, including Org Managers and Space Managers, cannot see these plans.
Deregister broker
This errand deregisters a broker from Cloud Foundry.
The errand does the following:
- Deletes the service broker from Cloud Controller
- Fails if there are any service instances, with or without bindings
Use the Delete All Service Instances errand to delete any existing service instances.
To run the errand, run the following command:
bosh -d DEPLOYMENT-NAME run-errand deregister-broker
Upgrade all service instances
The upgrade-all-service-instances errand does the following:
- Collects all of the service instances that the on-demand broker has registered.
- Issues an upgrade command and deploys a new manifest to the on-demand broker for each service instance.
- Adds to a retry list any instances that have ongoing BOSH tasks at the time of upgrade.
- Retries any instances in the retry list until all instances are upgraded.
When you make changes to the plan configuration, the errand upgrades all the VMware SQL with MySQL for TAS service instances to the latest version of the plan.
If any instance fails to upgrade, the errand fails immediately. This prevents systemic problems from spreading to the rest of your service instances.
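The retry and fail-fast behavior described above can be modeled in a few lines. The following is an illustrative sketch of that control flow only, not the errand's actual implementation. The `try_upgrade` callable and its `"ok"`, `"busy"`, and other return values are hypothetical stand-ins for the broker's upgrade call, and `max_rounds` is an assumption added so that the sketch always terminates.

```python
from collections import deque


def upgrade_all(instances, try_upgrade, max_rounds=10):
    """Upgrade every instance, retrying busy ones and stopping at the first failure."""
    pending = deque(instances)
    upgraded = []
    for _ in range(max_rounds):
        retry = deque()
        while pending:
            guid = pending.popleft()
            result = try_upgrade(guid)
            if result == "ok":
                upgraded.append(guid)
            elif result == "busy":
                # An ongoing BOSH task blocks the upgrade; try again next round.
                retry.append(guid)
            else:
                # Fail immediately so a systemic problem does not spread.
                raise RuntimeError(f"upgrade failed for {guid}: {result}")
        if not retry:
            return upgraded
        pending = retry
    raise RuntimeError(f"instances still busy after {max_rounds} rounds: {list(pending)}")
```

The model shows why one failing instance stops the whole run: instances that come later in the list are never attempted, which is why you need to fix the failing instance before rerunning the errand.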
Delete all service instances
This errand uses the Cloud Controller API to delete all instances of your broker’s service offering in every Cloud Foundry org and space. It only deletes instances the Cloud Controller knows about. It does not delete orphan BOSH deployments.
Orphan BOSH deployments do not correspond to a known service instance. While rare, orphan deployments can occur. Use the orphan-deployments errand to identify them.
The delete-all-service-instances errand does the following:
- Unbinds all apps from the service instances.
- Deletes all service instances sequentially. Each service instance deletion includes:
  - Running any pre-delete errands
  - Deleting the BOSH deployment of the service instance
  - Removing any ODB-managed secrets from BOSH CredHub
  - Checking for instance deletion failure, which results in the errand failing immediately
- Determines whether any instances have been created while the errand was running. If new instances are detected, the errand returns an error. In this case, VMware recommends running the errand again.
Use extreme caution when running this errand. Use it only when you want to totally destroy all of the on-demand service instances in an environment.
To run the errand, run the following command:
bosh -d DEPLOYMENT-NAME run-errand delete-all-service-instances
Detect orphaned service instances
A service instance is defined as “orphaned” when the BOSH deployment for the instance is still running, but the service is no longer registered in Cloud Foundry.
The orphan-deployments errand collates a list of service deployments that have no matching service instances in Cloud Foundry and returns the list to the operator. It is then up to the operator to remove the orphaned BOSH deployments.
To run the errand, run the following command:
bosh -d DEPLOYMENT-NAME run-errand orphan-deployments
If orphan deployments exist, the errand script does the following:
- Exits with exit code 10
- Outputs a list of deployment names under a [stdout] header
- Provides a detailed error message under a [stderr] header
For example:
[stdout]
[{"deployment_name":"service-instance_80e3c5a7-80be-49f0-8512-44840f3c4d1b"}]
[stderr]
Orphan BOSH deployments detected with no corresponding service instance in Cloud Foundry. Before deleting any deployment it is recommended to verify the service instance no longer exists in Cloud Foundry and any data is safe to delete.
Errand 'orphan-deployments' completed with error (exit code 10)
These details are also available through the BOSH /tasks/ API endpoint for use in scripting:
$ curl 'https://bosh-user:bosh-password@bosh-url:25555/tasks/task-id/output?type=result' | jq .
{
  "exit_code": 10,
  "stdout": "[{\"deployment_name\":\"service-instance_80e3c5a7-80be-49f0-8512-44840f3c4d1b\"}]\n",
  "stderr": "Orphan BOSH deployments detected with no corresponding service instance in Cloud Foundry. Before deleting any deployment it is recommended to verify the service instance no longer exists in Cloud Foundry and any data is safe to delete.\n",
  "logs": {
    "blobstore_id": "d830c4bf-8086-4bc2-8c1d-54d3a3c6d88d"
  }
}
If no orphan deployments exist, the errand script does the following:
- Exits with exit code 0
- Outputs an empty list of deployments under [stdout]
- Outputs None under [stderr]
For example:
[stdout]
[]
[stderr]
None
Errand 'orphan-deployments' completed successfully (exit code 0)
If the errand encounters an error while running, the errand script does the following:
- Exits with exit code 1
- Leaves stdout empty
- Outputs any error messages under stderr
To clean up orphaned instances, run the following command on each instance:
bosh -d service-instance_SERVICE-INSTANCE-GUID delete-deployment
Running this command might leave IaaS resources in an unusable state.
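For scripting, the three exit codes described above map naturally onto a small parser. The following sketch interprets the JSON result returned by the BOSH tasks endpoint and returns the orphan deployment names. It assumes the result has the `exit_code`, `stdout`, and `stderr` fields shown in the example, and the function name is illustrative.

```python
import json


def orphan_deployments(task_result_json: str) -> list:
    """Return orphan deployment names from an orphan-deployments task result."""
    result = json.loads(task_result_json)
    code = result.get("exit_code")
    if code == 0:
        # Success: no orphan deployments were found.
        return []
    if code != 10:
        # Any other exit code means the errand itself hit an error.
        raise RuntimeError(result.get("stderr", "").strip() or "errand failed")
    # Exit code 10: stdout holds a JSON list of {"deployment_name": ...} objects.
    return [item["deployment_name"] for item in json.loads(result["stdout"])]
```

The sketch deliberately stops at listing names. Verify that each service instance no longer exists in Cloud Foundry, and that its data is safe to delete, before you delete any deployment.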
View resource saturation and scaling
To view usage statistics for any service, do the following:
- Run the following command:
bosh -d DEPLOYMENT-NAME vms --vitals
- To view process-level information, run:
bosh -d DEPLOYMENT-NAME instances --ps
Identify apps using a service instance
To identify which apps are using a specific service instance from the name of the BOSH deployment:
- Take the deployment name and strip the service-instance_ prefix, leaving you with the GUID.
- Log in to CF as an admin.
- Obtain a list of all service bindings by running the following:
cf curl /v2/service_instances/GUID/service_bindings
- The output from the curl gives you a list of resources, with each item referencing a service binding, which contains the APP-URL. To find the name, org, and space for the app, run the following:
  - cf curl APP-URL and record the app name under entity.name.
  - cf curl SPACE-URL to obtain the space, using the entity.space_url from the curl. Record the space name under entity.name.
  - cf curl ORGANIZATION-URL to obtain the org, using the entity.organization_url from the curl. Record the organization name under entity.name.
When you run cf curl, ensure that you query all pages, because the responses are limited to a certain number of bindings per page. The default is 50. To find the next page, curl the value under next_url.
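The prefix stripping and page walking described in this section can be sketched as follows. The `fetch` parameter is a hypothetical callable that you supply, for example a wrapper that runs cf curl for a URL and parses the JSON output. The sketch assumes each page has a `resources` list and a `next_url` value that is empty or null on the last page.

```python
def deployment_guid(deployment_name: str, prefix: str = "service-instance_") -> str:
    """Strip the service-instance_ prefix from a BOSH deployment name."""
    if not deployment_name.startswith(prefix):
        raise ValueError(f"not a service instance deployment: {deployment_name!r}")
    return deployment_name[len(prefix):]


def all_resources(first_url: str, fetch) -> list:
    """Collect resources from every page by following next_url."""
    resources = []
    url = first_url
    while url:
        page = fetch(url)  # fetch(url) must return the parsed JSON page as a dict
        resources.extend(page.get("resources", []))
        url = page.get("next_url")
    return resources
```

For example, you might start from `"/v2/service_instances/" + deployment_guid(name) + "/service_bindings"` and pass the result to `all_resources` so that no binding beyond the first page is missed.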
Monitor quota saturation and service instance count
Quota saturation and total number of service instances are available through ODB metrics emitted to Loggregator. The metric names are shown in the following table:
Metric Name | Description |
---|---|
on-demand-broker/SERVICE-NAME-MARKETPLACE/quota_remaining | Global quota remaining for all instances across all plans |
on-demand-broker/SERVICE-NAME-MARKETPLACE/PLAN-NAME/quota_remaining | Quota remaining for a particular plan |
on-demand-broker/SERVICE-NAME-MARKETPLACE/total_instances | Total instances created across all plans |
on-demand-broker/SERVICE-NAME-MARKETPLACE/PLAN-NAME/total_instances | Total instances created for a given plan |
Quota metrics are not emitted if no quota has been set.
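If you consume these metrics in a script, it helps to separate global metrics from plan-level ones. The following sketch splits a metric name of the form shown in the table into its parts. It assumes that service and plan names do not themselves contain a slash and tolerates a leading slash in case your metrics pipeline adds one. The function name is illustrative.

```python
def parse_odb_metric(name: str) -> dict:
    """Split an ODB metric name into service, optional plan, and metric."""
    parts = name.lstrip("/").split("/")
    # Three parts means a global metric; four parts means a per-plan metric.
    if parts[0] != "on-demand-broker" or len(parts) not in (3, 4):
        raise ValueError(f"not an on-demand broker metric: {name!r}")
    return {
        "service": parts[1],
        "plan": parts[2] if len(parts) == 4 else None,
        "metric": parts[-1],
    }
```

A result whose `plan` is `None` refers to the global quota or the total instance count across all plans.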
Techniques for troubleshooting highly available clusters
If your cluster is experiencing downtime or in a degraded state, VMware recommends gathering information to diagnose the type of failure the cluster is experiencing with the following workflow:
- Consult solutions for common errors. See Highly Available Cluster Troubleshooting Errors.
- Use mysql-diag to view a summary of the network, disk, and replication state of each cluster node. Depending on the output from mysql-diag, you might recover your cluster with the following troubleshooting techniques:
  - To force a node to rejoin the cluster, see Force a node to rejoin a highly available cluster manually.
  - To recreate a corrupted VM, see Recreate a corrupted VM in a highly available cluster.
  - To check if replication is working, see Check replication in a highly available cluster.
  For more information about mysql-diag, see Running mysql-diag.
- Run bosh logs targeting each of the VMs in your VMware SQL with MySQL for TAS cluster, proxies, and jumpbox to retrieve the VM logs. You must run bosh logs before attempting recovery because any failures in the recovery procedure can result in logs being lost or made inaccessible. For more information, see the Downloading logs section.
- If you are uncertain about the recovery steps to take, submit a ticket through Support. When you submit a ticket, provide the following information:
  - mysql-diag output: A summary of the network, disk, and replication state. The Running mysql-diag topic explains how to run mysql-diag.
  - Downloaded logs: Logs from your VMware SQL with MySQL for TAS cluster, proxies, and jumpbox VM. The Downloading logs section explains how to obtain these.
  - Deployment environment: The environment that VMware SQL with MySQL for TAS is running in, such as VMware Tanzu Application Service for VMs or a service tile.
  - Version numbers: The versions of the installed Ops Manager, TAS for VMs, and VMware SQL with MySQL for TAS.
Do not attempt to resolve cluster issues by reconfiguring the cluster, such as changing the number of nodes or networks. Follow only the diagnosis steps in this document. If you are unsure how to proceed, contact Support.
Force a node to rejoin a highly available cluster manually
If a detached node fails to rejoin the cluster after a configured grace period, you can manually force the node to rejoin the cluster. This procedure removes all the data on the node, forces the node to join the cluster, and creates a new copy of the cluster data on the node.
If you manually force a node to rejoin the cluster, data stored on the local node is lost. Do not force nodes to rejoin the cluster if you want to preserve unsynchronized data. Only do this procedure with the assistance of Support.
Before following this procedure, try to bootstrap the cluster. For more information, see Bootstrapping.
To manually force a node to rejoin the cluster, do the following:
- SSH into the node by following the procedure in SSH into the BOSH Director VM.
- Become root by running:
sudo su
- Shut down the mysqld process on the node by running:
monit stop galera-init
- Remove the unsynchronized data on the node by running:
rm -rf /var/vcap/store/pxc-mysql
- Prepare the node before restarting by running:
/var/vcap/jobs/pxc-mysql/bin/pre-start
- Restart the mysqld process by running:
monit start galera-init
Recreate a corrupted VM in a highly available cluster
To re-create a corrupted VM:
- Log in to the BOSH Director VM by doing the following procedures:
  - Gather the information needed to log in to the BOSH Director VM by doing the procedure in Gather Credential and IP Address Information.
  - Log in to the Tanzu Operations Manager VM by doing the procedure in Log in to the Tanzu Operations Manager VM with SSH.
  - Log in to the BOSH Director VM by doing the procedure in SSH Into the BOSH Director VM.
- Identify and re-create the unresponsive node with bosh cloudcheck by doing the procedure in BOSH Cloud Check and running Recreate VM using last known apply spec.
Recreating a node clears the logs. Ensure the node is completely down before recreating it.
Only recreate one node. Do not recreate the entire cluster. If more than one node is down, contact Support.
Check replication status in a highly available cluster
If you see stale data in your cluster, you can check whether replication is functioning normally.
To check the replication status, do the following:
- To log in to the BOSH Director VM, do the following:
  - Gather the information needed to log in to the BOSH Director VM by using the procedure in [Gather credential and IP Address information](https://techdocs.broadcom.com/us/en/vmware-tanzu/platform/tanzu-operations-manager/3-0/tanzu-ops-manager/install-trouble-advanced.html).
  - Log in to the Ops Manager VM by doing the procedure in [Log in to the Ops Manager VM with SSH](https://techdocs.broadcom.com/us/en/vmware-tanzu/platform/tanzu-operations-manager/3-0/tanzu-ops-manager/install-ssh-login.html).
- Create a dummy database in the first node by running:
mysql -h FIRST-NODE-IP-ADDRESS \
  -u YOUR-IDENTITY \
  -p -e "create database verify_healthy;"
Where:
  - FIRST-NODE-IP-ADDRESS is the IP address of the first node you recorded in step 1.
  - YOUR-IDENTITY is the value of identity that you recorded in step 1.
- Create a dummy table in the dummy database by running:
mysql -h FIRST-NODE-IP-ADDRESS \
  -u YOUR-IDENTITY \
  -p -D verify_healthy \
  -e "create table dummy_table (id int not null primary key auto_increment, info text) engine='innodb';"
- Insert data into the dummy table by running:
mysql -h FIRST-NODE-IP-ADDRESS \
  -u YOUR-IDENTITY \
  -p -D verify_healthy \
  -e "insert into dummy_table(info) values ('dummy data'),('more dummy data'),('even more dummy data');"
- Query the table and verify that the three rows of dummy data exist on the first node by running:
mysql -h FIRST-NODE-IP-ADDRESS \
  -u YOUR-IDENTITY \
  -p -D verify_healthy \
  -e "select * from dummy_table;"
When prompted for a password, provide the `password` value recorded in step 1. The previous command returns output similar to the following:
+----+----------------------+
| id | info                 |
+----+----------------------+
|  4 | dummy data           |
|  7 | more dummy data      |
| 10 | even more dummy data |
+----+----------------------+
- Verify that the other nodes contain the same dummy data by doing the following for each of the remaining MySQL server IP addresses:
  - Query the dummy table by running:
    mysql -h NEXT-NODE-IP-ADDRESS \
      -u YOUR-IDENTITY \
      -p -D verify_healthy \
      -e "select * from dummy_table;"
    When prompted for a password, provide the `password` value recorded in step 1.
  - Verify that the node returns the same three rows of dummy data as the other nodes, with output similar to the following:
    +----+----------------------+
    | id | info                 |
    +----+----------------------+
    |  4 | dummy data           |
    |  7 | more dummy data      |
    | 10 | even more dummy data |
    +----+----------------------+
- If the MySQL server instances do not all return the same result, contact Support before proceeding further or making any changes to your deployment.
If every MySQL server instance returns the same result, you can safely proceed to scale down your cluster to a single node.
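The per-node comparison above can also be scripted. The following is a minimal sketch, assuming the verify_healthy database and dummy_table from the steps above already exist. The function name `check_replication` is hypothetical. The sketch also assumes the password is supplied without a prompt, for example through the MYSQL_PWD environment variable or a client option file, and it adds `order by id` so that the row order is the same on every node.

```shell
# Compare the dummy table on every node against the first node listed.
# Usage: check_replication USER NODE-IP [NODE-IP...]
# Returns 0 when all nodes match, 1 on a mismatch, 2 if a query fails.
check_replication() {
  local user="$1"
  shift
  local query='select * from dummy_table order by id;'
  local reference="" have_reference=0 rows ip status=0
  for ip in "$@"; do
    # -N drops the header row and -B prints plain tab-separated rows.
    rows="$(mysql -h "$ip" -u "$user" -D verify_healthy -N -B -e "$query")" || return 2
    if [ "$have_reference" -eq 0 ]; then
      reference="$rows"
      have_reference=1
    elif [ "$rows" != "$reference" ]; then
      echo "MISMATCH: $ip differs from $1"
      status=1
    fi
  done
  [ "$status" -eq 0 ] && echo "OK: all nodes returned the same rows"
  return "$status"
}
```

For example, check_replication YOUR-IDENTITY FIRST-NODE-IP-ADDRESS NEXT-NODE-IP-ADDRESS compares the second node with the first. A MISMATCH line corresponds to the case where you contact Support before making changes.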
Tools for Troubleshooting
The troubleshooting techniques use these tools.
Downloading logs
The following are steps to gather logs from your MySQL cluster nodes, MySQL proxies, and, with highly available clusters, the jumpbox VM.
- From Tanzu Operations Manager, open your BOSH Director tile > Credentials tab.
- In the Bosh Commandline Credentials row, click Link to Credential. A short plaintext file opens.
- From the plaintext file, record the values listed:
  - BOSH_CLIENT
  - BOSH_CLIENT_SECRET
  - BOSH_CA_CERT
  - BOSH_ENVIRONMENT
- From the BOSH CLI, run bosh deployments and record the name of the BOSH deployment that deployed VMware Tanzu for MySQL.
- SSH into your Tanzu Operations Manager VM. For information about how to do this, see Gather Credential and IP Address Information and SSH into Tanzu Operations Manager.
- Set local environment variables to the same BOSH variable values that you recorded earlier, including BOSH_DEPLOYMENT for the deployment name you recorded above. For example:
export BOSH_CLIENT=ops_manager \
  BOSH_CLIENT_SECRET=a123bc-E_4Ke3fb-gImbl3xw4a7meW0rY \
  BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate \
  BOSH_ENVIRONMENT=10.0.0.5 \
  BOSH_DEPLOYMENT=pivotal-mysql-14c4
If you connect to your BOSH Director through a gateway, you also need to set the variables BOSH_GW_HOST, BOSH_GW_USER, and BOSH_GW_PRIVATE_KEY.
- Use the bosh logs command to retrieve logs for any instances in your deployment that are named database or prefixed with mysql (such as mysql-jumpbox). The following lines show one way to perform this:
tempdir="$(mktemp -d -t MYSQLLOGS-XXXXXX)"
echo Saving logfiles to "${tempdir}"
for node in $(bosh instances --column="Instance" | grep -E "(database|mysql.*)/"); do
  echo -e "\nDownloading logs for: ${node}"
  bosh logs --dir="${tempdir}" ${node}
done
# Archive from inside the temporary directory so that only the downloaded logs are bundled.
(cd "${tempdir}" && tar czf mysql-logs.tar.gz ./*)
echo Bundled logfiles are in "${tempdir}/mysql-logs.tar.gz"
For more information, see the bosh logs documentation.
- Download the retrieved logfiles to your local laptop for inspection. You can use bosh scp from your local workstation to retrieve files from a BOSH VM. For more information, see the bosh scp documentation.
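Before running bosh logs in the steps above, it can help to confirm that every required variable is set. The following is a minimal sketch. The function name `missing_bosh_vars` is hypothetical, and the variable list is taken from the steps above. The gateway variables are not checked because they are only needed in some environments.

```shell
# Print the names of required BOSH variables that are unset or empty.
# Returns 0 when nothing is missing, 1 otherwise.
missing_bosh_vars() {
  local name value missing=""
  for name in BOSH_CLIENT BOSH_CLIENT_SECRET BOSH_CA_CERT BOSH_ENVIRONMENT BOSH_DEPLOYMENT; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="${missing:+$missing }$name"
    fi
  done
  echo "$missing"
  [ -z "$missing" ]
}
```

If the function prints any names, export those variables before retrieving the logs.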
mysql-diag
mysql-diag outputs the current status of a highly available (HA) MySQL cluster in VMware Tanzu for MySQL and suggests recovery actions if the cluster fails.
For more information, see Running mysql-diag.
Knowledge Base (Community)
Find the answer to your question and browse product discussions and solutions by searching the VMware Tanzu Knowledge Base.
File a support ticket
You can file a ticket with Support. Be sure to provide the error message from cf service YOUR-SERVICE-INSTANCE.
To expedite troubleshooting, provide your service broker logs and your service instance logs. If your cf service YOUR-SERVICE-INSTANCE output includes a task-id, provide the BOSH task output.
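One way to collect the BOSH task output is to pull the task-id out of the error message and pass it to bosh task. The following is a minimal sketch, assuming the message contains the text "task-id: NUMBER" as in the example errors earlier in this topic. The function name `extract_task_id` is hypothetical.

```shell
# Print the numeric task-id found in an error message, or nothing if absent.
extract_task_id() {
  printf '%s\n' "$1" | sed -n 's/.*task-id: \([0-9][0-9]*\).*/\1/p'
}
```

With the BOSH environment variables set, running bosh task with the extracted number shows the output of that task, which you can attach to the ticket.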