Where to find Logs
The installer logs provide vital information about what went wrong. Access the installer log file, logs_installer.log, in the root folder from which the installer command was run.
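For example, from the directory where you ran the installer, you can review the most recent entries or search for errors in that log file (standard tail and grep usage; adjust the line count as needed):
tail -n 100 logs_installer.log
grep -i error logs_installer.log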
Service Status
Check the status of all packages being installed by running the following commands.
kubectl -n tanzusm get packageInstalls
Describe a particular package install:
kubectl -n tanzusm describe packageInstalls <package name>
List all apps with the kapp command:
kapp ls -n tanzusm --column Name
List app and resource status
kapp inspect -a <app name listed in above command> -n tanzusm
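To quickly spot packages that have not finished reconciling, the following sketch filters out packages whose status already reads "Reconcile succeeded". It assumes the same tanzusm namespace and the kapp-controller PackageInstall status field friendlyDescription; adjust the field name if it differs in your version:
kubectl -n tanzusm get packageinstalls \
  -o custom-columns=NAME:.metadata.name,DESCRIPTION:.status.friendlyDescription \
  | grep -v "Reconcile succeeded"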
Prometheus Data Collection
Prometheus data can offer insights into the RED metrics of the TPSM services. The following steps describe how to collect this data.
Prerequisite: jq (https://jqlang.github.io/jq/download/). jq is usually available on the jump box; if it is not, install it and make it executable.
To create a dump of Prometheus data:
- Export the path of the cluster kubeconfig file to the KUBECONFIG environment variable
export KUBECONFIG=<cluster kube config file location full path>
- Find out the Prometheus pod name and store it in a variable
TP_PROMETHEUS_POD="$(kubectl get pods -l app.kubernetes.io/part-of=prometheus -l app.kubernetes.io/component=server -n tanzusm -o json | jq -r '.items[0].metadata.name')"
- Port forward the Prometheus admin API to invoke the Prometheus data snapshot creation from localhost
kubectl port-forward pods/$TP_PROMETHEUS_POD 9090:9090 -n tanzusm
- Open a new terminal, export the KUBECONFIG environment variable (same as step 1), and run the following commands
- Create a Prometheus data snapshot in the Prometheus server and store the snapshot name in a variable
TP_PROMETHEUS_SNAPSHOT_NAME=$(curl -X POST -s http://localhost:9090/prometheus/api/v1/admin/tsdb/snapshot | jq -r .data.name)
- Create a local directory into which to copy the snapshot from the cluster
mkdir -p prometheus-data/$TP_PROMETHEUS_SNAPSHOT_NAME
- Find out the Prometheus pod name again and store it in a variable (the variable from the earlier step is not available in the new terminal)
TP_PROMETHEUS_POD="$(kubectl get pods -l app.kubernetes.io/part-of=prometheus -l app.kubernetes.io/component=server -n tanzusm -o json | jq -r '.items[0].metadata.name')"
- Copy the snapshot to the local directory
kubectl cp -n tanzusm $TP_PROMETHEUS_POD:/bitnami/prometheus/data/snapshots/$TP_PROMETHEUS_SNAPSHOT_NAME prometheus-data/$TP_PROMETHEUS_SNAPSHOT_NAME
- Create a tar archive of the local directory and upload it
tar -czvf prometheus-$TP_PROMETHEUS_SNAPSHOT_NAME.tar.gz prometheus-data/$TP_PROMETHEUS_SNAPSHOT_NAME
- Go back to the previous terminal session and stop the port forwarding by pressing Ctrl+C.
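If you prefer to run the collection end to end, the following sketch combines the steps above into a single script. It assumes jq is installed, KUBECONFIG is already exported, and that running the port-forward in the background (instead of in a second terminal) is acceptable in your environment:
#!/usr/bin/env bash
set -euo pipefail
NS=tanzusm
# Find the Prometheus server pod
TP_PROMETHEUS_POD="$(kubectl get pods -l app.kubernetes.io/part-of=prometheus -l app.kubernetes.io/component=server -n "$NS" -o json | jq -r '.items[0].metadata.name')"
# Port-forward the Prometheus admin API in the background
kubectl port-forward "pods/$TP_PROMETHEUS_POD" 9090:9090 -n "$NS" &
PF_PID=$!
sleep 5   # give the port-forward a moment to establish
# Trigger a TSDB snapshot and capture its name
TP_PROMETHEUS_SNAPSHOT_NAME=$(curl -X POST -s http://localhost:9090/prometheus/api/v1/admin/tsdb/snapshot | jq -r .data.name)
# Copy the snapshot locally and archive it
mkdir -p "prometheus-data/$TP_PROMETHEUS_SNAPSHOT_NAME"
kubectl cp -n "$NS" "$TP_PROMETHEUS_POD:/bitnami/prometheus/data/snapshots/$TP_PROMETHEUS_SNAPSHOT_NAME" "prometheus-data/$TP_PROMETHEUS_SNAPSHOT_NAME"
tar -czvf "prometheus-$TP_PROMETHEUS_SNAPSHOT_NAME.tar.gz" "prometheus-data/$TP_PROMETHEUS_SNAPSHOT_NAME"
# Stop the port-forward
kill "$PF_PID"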
Troubleshooting common issues
Use the following table to troubleshoot common issues.
Issue | Description | Solution |
---|---|---|
After restore, pods for a few of the services are stuck in CrashLoopBackOff after sufficiently waiting for reconciliation | After restoring and waiting sufficiently for reconciliation, the pods of a few of the services remain stuck in CrashLoopBackOff. | Delete the pods and kick the kapp app that encapsulates those pods: kubectl -n tanzusm delete pods followed by kctrl package installed kick -i -n tanzusm |
Velero storage backend misconfiguration | Symptom: Backups are not stored correctly, or storage errors occur. Possible causes: incorrect bucket names or paths; misconfigured storage provider credentials. | Check the storage configuration: ensure that Velero's credentials have access to the storage backend. |
Pods going into the Evicted state | Symptom: A few of the pods are observed in the Evicted state. Possible cause: health of the nodes. | This issue is due to cluster capacity. Run kubectl get nodes to get the status of the nodes and check their health. Perform the necessary actions: increase the number of nodes in the cluster and ensure that the overall cluster capacity is sufficient for the installation. |
Velero command fails | Symptom: A velero backup create or velero restore command fails. | Check the connectivity to the object store from within the workload cluster: kubectl get pods -n velero and kubectl logs -n velero -f. Check that none of the node agent and controller pods in the velero namespace are in an error state: kubectl get pods -n velero |
After startup, the application dashboard, which shows vulnerabilities, takes time to show data | Symptom: Due to multiple reconcilers in the system, the application takes time to sync. | Wait for one or two hours after installation, because reconciliation of background information takes time. |
Kafka coordinator issue and consumers are not able to connect to Kafka | Symptom: Due to an incorrect start/stop sequence, Kafka goes into an inconsistent state. | Run the clean_kafka.sh script mentioned in the Post Restore Action section. |
After adding or updating the certificate, the TPSM installer command fails | Symptom: This happens because the certificate content is not formatted as a YAML string. | The certificate in the `config.yaml` file must be provided as a YAML literal block scalar; see the example after this table. |
After updating the certificate, the user is not able to log in | Symptom: Sometimes Carvel prevents the Stakater Reloader from restarting the deployments. | Restart the following services after the certificate is updated: kubectl rollout restart deployment/graphql-stitching-service deployment/uaa deployment/ucp-core-controllers -n tanzusm |
After updating the certificate, the user is not able to log in | Symptom: Sometimes the user updates the TLS certificate public key but does not update the private key. | Logs from the contour-envoy pod show: "Failed to load private key from , Cause: error:0b000074:X.509 certificate routines:OPENSSL_internal:KEY_VALUES_MISMATCH" code=13 connection=76 context=xds node_id=contour-envoy-775df8f468-rh7xj node_version=v1.28.2 response_nonce=62 version_info=. Update the TLS private key so that it matches the updated certificate. |
SeaweedFS reports a "no writable space" error | Symptom: E1126 08:10:06.993985 s3api_object_handlers_put.go:161 upload to filer error: rpc error: code = Unknown desc = failed to find writable volumes for collection:daedalus replication:000 ttl: error: No more writable volumes! | |
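For the certificate formatting issue above, provide the certificate in `config.yaml` as a YAML literal block scalar, for example (the certificate body shown here is a placeholder):
certificate: |
  -----BEGIN CERTIFICATE-----
  ....
  ....
  -----END CERTIFICATE-----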
Installer command-line help
To get help with the installer commands, use the --help flag. For example:
tanzu-sm-installer --help
You can also use this flag with sub-commands. For example:
tanzu-sm-installer install --help
Installer sample commands
tanzu-sm-installer verify \
-f config.yaml \
-u "${ARTIFACTORY_USER}:${ARTIFACTORY_API_TOKEN}" \
-r ${DOCKER_REGISTRY}/hub-self-managed/${TANZU_SM_VERSION}/repo \
--kubeconfig ${KUBECONFIG}
tanzu-sm-installer install \
-f config.yaml \
-u "${ARTIFACTORY_USER}:${ARTIFACTORY_API_TOKEN}" \
-r ${DOCKER_REGISTRY}/hub-self-managed/${TANZU_SM_VERSION}/repo \
--yes
tanzu-sm-installer post-verify \
--kubeconfig ${KUBECONFIG}
tanzu-sm-installer push collectors \
-a "${REGISTRY_USERNAME}:${REGISTRY_PASSWORD}" \
-r "${REGISTRY_ENDPOINT}" \
-f tanzusm-collector.tar -s
tanzu-sm-installer push tanzu-plugins \
-u "${REGISTRY_USERNAME}:${REGISTRY_PASSWORD}" \
-r "${REGISTRY_ENDPOINT}/${REPO_PATH}" \
-i tanzu-bundle/tpsm-plugin-bundle.tar.gz
tanzu-sm-installer push tmc-extensions \
-a "${REGISTRY_USERNAME}:${REGISTRY_PASSWORD}" \
-r "${REGISTRY_ENDPOINT}/${REPO_PATH}" \
-f agent-images.tar
tanzu-sm-installer log \
--kubeconfig ${KUBECONFIG}
tanzu-sm-installer reset \
--kubeconfig ${KUBECONFIG} -p
tanzu-sm-installer velero install \
--provider aws \
--image harbor.tanzu.io:8443/library/velero:v1.14.1 \
--plugins harbor.tanzu.io:8443/library/velero/velero-plugin-for-aws:v1.10.0 \
--bucket <BUCKET_NAME> \
--secret-file <PATH_TO_CREDENTIAL_FILE> \
--use-volume-snapshots=false \
--features=EnableCSI \
--use-node-agent \
--backup-location-config region=<OBJECT_STORAGE_SERVICE_REGION>,s3ForcePathStyle="true",s3Url=<OBJECT_STORAGE_SERVICE_PATH>
tanzu-sm-installer velero backup create <KAPP_ARTIFACT_BACKUP_NAME> \
--snapshot-move-data \
--include-resources=apps.kappctrl.k14s.io,packageinstalls.packaging.carvel.dev \
--include-namespaces tanzusm
tanzu-sm-installer velero restore create <SECRET_RESTORE_NAME> \
--snapshot-move-data \
--from-backup <FULL_BACKUP_NAME> \
--include-resources=secrets
tanzu-sm-installer velero schedule create kapp-backup-schedule \
--snapshot-move-data \
--schedule="@every 4h" \
--include-resources=apps.kappctrl.k14s.io,packageinstalls.packaging.carvel.dev \
--include-namespaces tanzusm
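The --secret-file argument in the velero install example above points to an object storage credentials file. A minimal sketch, assuming an S3-compatible object store and the AWS-style shared credentials format read by the Velero AWS plugin (the key values are placeholders):
[default]
aws_access_key_id = <ACCESS_KEY_ID>
aws_secret_access_key = <SECRET_ACCESS_KEY>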
Create an installation log bundle
To create an installation log bundle, run the following command:
tanzu-sm-installer log
By default, this command creates the log bundle in /tmp/tanzusm.
The following table describes the options of the tanzu-sm-installer log command.
Short name | Full name | Description | Default value |
---|---|---|---|
-h | --help | Help for the log command | NA |
-k | --kubeconfig | Absolute path of the kubeconfig file that connects to the Kubernetes cluster on which Tanzu Platform is running | NA |
-n | --namespace | Name of the namespace in which Tanzu Platform is running | tanzusm |
-o | --outdir | Location in which to create the log bundle if not the default location | /tmp/tanzusm |
-w | --workdir | Working directory for Crashd and Starlark | /tmp/tanzusm/work |