Tanzu Platform 10.0

Keeping your Tanzu Platform Self-Managed deployment up and running

Last Updated March 03, 2025

Monitor your Tanzu Platform Self-Managed deployment with Prometheus and Grafana

After you have deployed Tanzu Platform Self-Managed, you can use Prometheus with Grafana to monitor your deployment.

The Prometheus server is installed, by default, with Tanzu Platform Self-Managed. It runs in the tanzusm namespace. The procedure below explains how to deploy Grafana.


In an air-gapped environment, you must first download the Grafana images and upload them to your local image registry. Then download the Helm charts, update their image references to point to your registry, and install from the local copies.
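The offline preparation might look like the following sketch. The image tag and registry.example.com are assumptions for illustration; check the chart's values for the complete image list.

```shell
# Sketch only: the image tag and registry.example.com are illustrative.
# On a connected machine, pull the chart and the Grafana image.
helm pull oci://registry-1.docker.io/bitnamicharts/grafana --destination ./charts
docker pull docker.io/bitnami/grafana:latest
docker tag docker.io/bitnami/grafana:latest registry.example.com/bitnami/grafana:latest

# After transferring the artifacts into the air-gapped environment,
# push the image to your local registry and install from the local tarball,
# pointing the chart at your registry (Bitnami charts honor global.imageRegistry).
docker push registry.example.com/bitnami/grafana:latest
helm install grafana ./charts/grafana-*.tgz \
  --values grafana_values.yaml \
  --set global.imageRegistry=registry.example.com \
  --namespace monitoring
```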

Deploying Grafana

This procedure describes how to deploy Grafana to the cluster that is hosting your Tanzu Platform Self-Managed deployment.

  1. Create a namespace monitoring to deploy Grafana into, so that it can be easily uninstalled later.
    kubectl create namespace monitoring
    
  2. Get the Prometheus server IP.
    kubectl get service prometheus-server -n tanzusm -o jsonpath='{.spec.clusterIP}'
    
  3. Create grafana_values.yaml for Helm, replacing <PROMETHEUS-IP> with the output from the previous step.
    datasources:
      secretDefinition:
        apiVersion: 1
        datasources:
          - name: Prometheus
            type: prometheus
            access: proxy
            orgId: 1
            url: http://<PROMETHEUS-IP>/prometheus
            version: 1
            editable: true
            isDefault: true
          - name: Alertmanager
            uid: alertmanager
            type: alertmanager
            access: proxy
            orgId: 1
            url: http://<PROMETHEUS-IP>:9093
            version: 1
            editable: true
    
  4. Install the Grafana Helm chart. (You might need to install Helm first.)
    helm install grafana --values grafana_values.yaml --namespace monitoring oci://registry-1.docker.io/bitnamicharts/grafana
    
  5. Create a service to access Grafana.
    kubectl expose deployment grafana --port=80 --target-port=3000 --name=grafana-ext --type=LoadBalancer -n monitoring
    
  6. Get the external IP of the service.
    kubectl get service grafana-ext -n monitoring -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
    
  7. Get the credentials. The username is admin. Fetch the password by running the following command.
    echo "Password: $(kubectl get secret grafana-admin --namespace monitoring -o jsonpath="{.data.GF_SECURITY_ADMIN_PASSWORD}" | base64 -d)"
    
  8. Log in to Grafana with the credentials from the previous step, and then import the provided dashboard, which can be found at /data/additional-resources/content/grafana/tpsm_grafana_dashboard.json.
  9. To get the cluster-level data, install kube-state-metrics and node-exporter.
    helm install kube-state-metrics --namespace monitoring oci://registry-1.docker.io/bitnamicharts/kube-state-metrics
    helm install node-exporter --namespace monitoring oci://registry-1.docker.io/bitnamicharts/node-exporter
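If your cluster cannot provision LoadBalancer IPs, a port-forward is an alternative for quick access. The service name and port below assume the Bitnami chart defaults:

```shell
# Forward local port 3000 to the Grafana service created by the chart,
# then browse to http://localhost:3000.
kubectl port-forward service/grafana 3000:3000 -n monitoring
```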
    

Dynamically reconfigure your Tanzu Platform Self-Managed deployment

Scale business services
Tanzu Platform Self-Managed supports the following profiles to meet your scaling requirements.

  • evaluation
  • foundation
  • regular
  • enterprise

To change the profile for your deployment:

  1. Make sure your cluster has sufficient resources to support the profile.
  2. Update the profile setting in your config.yaml.
  3. Re-run the installer.
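For example, assuming the profile is set with a top-level profile key in config.yaml (check your existing file for the exact key name used by your installer version), the change might look like this:

```yaml
# Illustrative fragment of config.yaml; the key name is an assumption.
profile: enterprise
```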

Manage certificates

Internal certificates are renewed by cert-manager.

To rotate certificates:

  1. Update your certificates.
  2. Re-run the installer.
  3. After the installer completes, run the following command to restart the Tanzu Platform services.
    kubectl rollout restart deployment/graphql-stitching-service deployment/uaa deployment/ucp-core-controllers -n tanzusm
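You can confirm that the restarts completed before resuming normal use, for example:

```shell
# Wait for each restarted deployment to report a successful rollout.
kubectl rollout status deployment/graphql-stitching-service -n tanzusm
kubectl rollout status deployment/uaa -n tanzusm
kubectl rollout status deployment/ucp-core-controllers -n tanzusm
```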
    

Back up and restore your Tanzu Platform stack

Tanzu Platform Self-Managed uses Velero with CSI Snapshot-based backup and restore capabilities.

The procedures in this section assume you have already installed Velero on your cluster as described in Set up Velero for your cluster.

Back up your Tanzu Platform stack

Use the commands shown in this section to create backups.


Tanzu Platform Self-Managed creates a full backup by default. Make sure your object storage has sufficient capacity for both your manual and your scheduled backups.
Tanzu Platform Self-Managed retains backups indefinitely by default. Set a ttl on your backups to enforce your retention policy.
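For example, Velero accepts a --ttl flag when a backup is created; the 30-day value below is illustrative:

```shell
# Keep this backup for 30 days (720h); Velero garbage-collects it afterward.
tanzu-sm-installer velero backup create <FULL_BACKUP_NAME> --ttl 720h0m0s --snapshot-move-data --include-namespaces tanzusm
```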

Create a backup

tanzu-sm-installer velero backup create <KAPP_ARTIFACT_BACKUP_NAME> --include-resources=apps.kappctrl.k14s.io,packageinstalls.packaging.carvel.dev --snapshot-move-data --include-namespaces tanzusm
tanzu-sm-installer velero backup create <FULL_BACKUP_NAME> --snapshot-move-data --include-namespaces tanzusm --exclude-resources certificaterequests.cert-manager.io

Create a backup schedule

tanzu-sm-installer velero schedule create kapp-backup-schedule --schedule="@every 4h" --snapshot-move-data --include-resources=apps.kappctrl.k14s.io,packageinstalls.packaging.carvel.dev --include-namespaces tanzusm
tanzu-sm-installer velero schedule create full-backup-schedule --schedule="@every 4h" --snapshot-move-data --include-namespaces tanzusm --exclude-resources certificaterequests.cert-manager.io

Restore your Tanzu Platform stack from a backup

If the cluster is deleted, or an upgrade fails and leaves the environment in an inconsistent state, you can restore the previous working state.

Use the commands shown in this section to restore backups.

Before you begin

  • Verify that the backup you want to restore exists, and validate its status.
    tanzu-sm-installer velero backup get
    
  • Make sure no residual pods, deployments, stateful sets, secrets, configmaps, postgres endpoints, persistent volume claims, or persistent volumes remain in the tanzusm namespace.
    Use the following kubectl commands to check for residual resources in the cluster.
    kubectl get pods -n tanzusm
    kubectl get pkgi -n tanzusm
    kubectl get app -n tanzusm
    kubectl get pvc -n tanzusm
    kubectl get pv -n tanzusm
    kubectl get crd -n tanzusm
    kubectl get cr -n tanzusm
    kubectl get postgresendpoint -n tanzusm
    kubectl get cm -n tanzusm
    
  • Make sure static and dynamically allocated resources, such as the IP range and hostname, are not in use.

Procedure

  1. Reset the tanzusm namespace using the tanzusm installer.
    tanzu-sm-installer reset --kubeconfig=<KUBECONFIG_FILE> -p 
    
  2. Restore the secrets from the backup.
    tanzu-sm-installer velero restore create <SECRET_RESTORE_NAME> --from-backup <FULL_BACKUP_NAME> --include-resources=secrets
    
  3. Restore the kapp artifact.
    tanzu-sm-installer velero restore create <KAPP_RESTORE_NAME> --from-backup <KAPP_ARTIFACT_BACKUP_NAME> 
    
  4. Restore other artifacts, if any.

    tanzu-sm-installer velero restore create <FULL_RESTORE_NAME> --from-backup <FULL_BACKUP_NAME>
    


    After performing a restore, restart the Kafka services to avoid intermittent issues related to leader election among the Kafka brokers.

  5. After the restore has completed successfully, wait 30 minutes to allow all resources and services to restart.
  6. When the 30 minutes have elapsed, run the following script.
    clean_kafka.sh

    #!/bin/bash
    set -x
    NAMESPACE="tanzusm"

    echo "Cleaning up Kafka PVCs and PVs in namespace $NAMESPACE..."

    # Pause reconciliation so kapp-controller does not recreate resources mid-cleanup.
    kctrl package installed pause -i ops-kafka -n "$NAMESPACE"

    # Scale the brokers down so their volumes can be released.
    kubectl -n "$NAMESPACE" scale statefulset ops-kafka --replicas=0
    sleep 10

    # PVs are cluster-scoped; find the ones claimed by the Kafka data PVCs.
    kubectl get pv --no-headers -o custom-columns="NAME:.metadata.name,CLAIM:.spec.claimRef.name" | grep -i "kafka-datadir-ops-kafka" | while read -r PV_NAME PVC_NAME; do

      echo "PV: $PV_NAME, PVC: $PVC_NAME"
      # Start the deletes in the background; they block until the finalizers are cleared.
      kubectl delete pvc "$PVC_NAME" -n "$NAMESPACE" --force &
      kubectl delete pv "$PV_NAME" --force &
      # Clear the finalizers so the pending deletes can complete.
      kubectl patch pv "$PV_NAME" -p '{"metadata":{"finalizers":[]}}' --type='merge'
      kubectl patch pvc "$PVC_NAME" -p '{"metadata":{"finalizers":[]}}' --type='merge' -n "$NAMESPACE"
    done

    # Scale the brokers back up so fresh volumes are provisioned.
    kubectl -n "$NAMESPACE" scale statefulset ops-kafka --replicas=3
    sleep 10

    # Trigger an immediate reconcile of the package.
    kctrl package installed kick -i ops-kafka -n "$NAMESPACE"
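After the script finishes, you can verify that Kafka recovered. The resource names below match those used in the script:

```shell
# All Kafka broker pods should become Ready again.
kubectl rollout status statefulset/ops-kafka -n tanzusm
# The package should return to a successful reconcile state.
kctrl package installed get -i ops-kafka -n tanzusm
```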