Tanzu Platform 10.0

Keeping your Tanzu Platform Self-Managed deployment up and running

Last Updated March 03, 2025

Monitor your Tanzu Platform Self-Managed deployment with Prometheus and Grafana

After you have deployed Tanzu Platform Self-Managed, you can use Prometheus with Grafana to monitor your deployment.

The Prometheus server is installed, by default, with Tanzu Platform Self-Managed. It runs in the tanzusm namespace. The procedure below explains how to deploy Grafana.


In an air-gapped environment, you must first download the Grafana images and upload them to your local image registry. Then download the Helm charts, update their image references to point to your registry, and install from the local copies.
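The offline preparation might look like the following sketch. The image tag and registry.example.com are assumptions for illustration; check the chart's values for the complete image list.

```shell
# Sketch only: the image tag and registry.example.com are illustrative.
# On a connected machine, pull the chart and the Grafana image.
helm pull oci://registry-1.docker.io/bitnamicharts/grafana --destination ./charts
docker pull docker.io/bitnami/grafana:latest
docker tag docker.io/bitnami/grafana:latest registry.example.com/bitnami/grafana:latest

# After transferring the artifacts into the air-gapped environment,
# push the image to your local registry and install from the local tarball,
# pointing the chart at your registry (Bitnami charts honor global.imageRegistry).
docker push registry.example.com/bitnami/grafana:latest
helm install grafana ./charts/grafana-*.tgz \
  --values grafana_values.yaml \
  --set global.imageRegistry=registry.example.com \
  --namespace monitoring
```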

Deploying Grafana

This procedure describes how to deploy Grafana to the cluster that is hosting your Tanzu Platform Self-Managed deployment.

  1. Create a namespace monitoring to deploy Grafana into, so that it can be easily uninstalled later.
    kubectl create namespace monitoring
    
  2. Get the Prometheus server IP.
    kubectl get service prometheus-server -n tanzusm -o jsonpath='{.spec.clusterIP}'
    
  3. Create grafana_values.yaml for Helm, replacing <PROMETHEUS-IP> with the output from the previous step.
    datasources:
      secretDefinition:
        apiVersion: 1
        datasources:
          - name: Prometheus
            type: prometheus
            access: proxy
            orgId: 1
            url: http://<PROMETHEUS-IP>/prometheus
            version: 1
            editable: true
            isDefault: true
          - name: Alertmanager
            uid: alertmanager
            type: alertmanager
            access: proxy
            orgId: 1
            url: http://<PROMETHEUS-IP>:9093
            version: 1
            editable: true
    
  4. Install the Grafana Helm chart. (You might need to install Helm first.)
    helm install grafana --values grafana_values.yaml --namespace monitoring oci://registry-1.docker.io/bitnamicharts/grafana
    
  5. Create a service to access Grafana.
    kubectl expose deployment grafana --port=80 --target-port=3000 --name=grafana-ext --type=LoadBalancer -n monitoring
    
  6. Get the external IP of the service.
    kubectl get service grafana-ext -n monitoring -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
    
  7. Get the credentials. The username is admin. Fetch the password by running the following command.
    echo "Password: $(kubectl get secret grafana-admin --namespace monitoring -o jsonpath="{.data.GF_SECURITY_ADMIN_PASSWORD}" | base64 -d)"
    
  8. Log in to Grafana with the credentials from the previous step, and then import the provided dashboard, which can be found at /data/additional-resources/content/grafana/tpsm_grafana_dashboard.json.
  9. To get the cluster-level data, install kube-state-metrics and node-exporter.
    helm install kube-state-metrics --namespace monitoring oci://registry-1.docker.io/bitnamicharts/kube-state-metrics
    helm install node-exporter --namespace monitoring oci://registry-1.docker.io/bitnamicharts/node-exporter
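If your cluster cannot provision LoadBalancer IPs, a port-forward is an alternative for quick access. The service name and port below assume the Bitnami chart defaults:

```shell
# Forward local port 3000 to the Grafana service created by the chart,
# then browse to http://localhost:3000.
kubectl port-forward service/grafana 3000:3000 -n monitoring
```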
    

Dynamically reconfigure your Tanzu Platform Self-Managed deployment

Scale business services
Tanzu Platform Self-Managed supports the following profiles to meet your scaling requirements.

  • evaluation
  • foundation
  • regular
  • enterprise

To change the profile for your deployment:

  1. Make sure your cluster has sufficient resources to support the profile.
  2. Update the profile setting in your config.yaml.
  3. Re-run the installer.
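For example, assuming the profile is set with a top-level profile key in config.yaml (check your existing file for the exact key name used by your installer version), the change might look like this:

```yaml
# Illustrative fragment of config.yaml; the key name is an assumption.
profile: enterprise
```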

Manage certificates

Internal certificates are renewed by cert-manager.

To rotate certificates:

  1. Update your certificates.
  2. Re-run the installer.
  3. After the installer completes, run the following command to restart the Tanzu Platform services.
    kubectl rollout restart deployment/graphql-stitching-service deployment/uaa deployment/ucp-core-controllers -n tanzusm
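You can confirm that the restarts completed before resuming normal use, for example:

```shell
# Wait for each restarted deployment to report a successful rollout.
kubectl rollout status deployment/graphql-stitching-service -n tanzusm
kubectl rollout status deployment/uaa -n tanzusm
kubectl rollout status deployment/ucp-core-controllers -n tanzusm
```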
    

Back up and restore your Tanzu Platform stack

Tanzu Platform Self-Managed uses Velero with CSI Snapshot-based backup and restore capabilities.

The procedures in this section assume you have already installed Velero on your cluster as described in Set up Velero for your cluster.

Back up your Tanzu Platform stack

Use the commands shown in this section to create backups.


Tanzu Platform Self-Managed creates a full backup by default. Make sure your object storage has sufficient capacity for both your manual and your scheduled backups.
Tanzu Platform Self-Managed retains backups indefinitely by default. Set a ttl on your backups to enforce your retention policy.
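For example, Velero accepts a --ttl flag when a backup is created; the 30-day value below is illustrative:

```shell
# Keep this backup for 30 days (720h); Velero garbage-collects it afterward.
tanzu-sm-installer velero backup create <FULL_BACKUP_NAME> --ttl 720h0m0s --snapshot-move-data --include-namespaces tanzusm
```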

Create a backup

tanzu-sm-installer velero backup create <KAPP_ARTIFACT_BACKUP_NAME> --include-resources=apps.kappctrl.k14s.io,packageinstalls.packaging.carvel.dev --snapshot-move-data --include-namespaces tanzusm
tanzu-sm-installer velero backup create <FULL_BACKUP_NAME> --snapshot-move-data --include-namespaces tanzusm --exclude-resources certificaterequests.cert-manager.io

Create a backup schedule

tanzu-sm-installer velero schedule create kapp-backup-schedule --schedule="@every 4h" --snapshot-move-data --include-resources=apps.kappctrl.k14s.io,packageinstalls.packaging.carvel.dev --include-namespaces tanzusm
tanzu-sm-installer velero schedule create full-backup-schedule --schedule="@every 4h" --snapshot-move-data --include-namespaces tanzusm --exclude-resources certificaterequests.cert-manager.io

Restore your Tanzu Platform stack from a backup

If the cluster is deleted, or an upgrade fails and leaves the environment in an inconsistent state, you can restore the previous working state.

Use the commands shown in this section to restore backups.

Before you begin

  • Verify that the backup you want to restore exists, and validate its status.
    tanzu-sm-installer velero backup get
    
  • Make sure no residual pods, deployments, stateful sets, secrets, configmaps, postgres endpoints, persistent volume claims, or persistent volumes remain in the tanzusm namespace.
    Use the following kubectl commands to check for residual resources in the cluster.
    kubectl get pods -n tanzusm
    kubectl get pkgi -n tanzusm
    kubectl get app -n tanzusm
    kubectl get pvc -n tanzusm
    kubectl get pv -n tanzusm
    kubectl get crd -n tanzusm
    kubectl get cr -n tanzusm
    kubectl get postgresendpoint -n tanzusm
    kubectl get cm -n tanzusm
    
  • Make sure static and dynamically allocated resources, such as the IP range and hostname, are not in use.

Procedure

  1. Reset the tanzusm namespace using the tanzusm installer.
    tanzu-sm-installer reset --kubeconfig=<KUBECONFIG_FILE> -p 
    
  2. Restore the secrets from the backup.
    tanzu-sm-installer velero restore create <SECRET_RESTORE_NAME> --from-backup <FULL_BACKUP_NAME> --include-resources=secrets
    
  3. Restore the kapp artifact.
    tanzu-sm-installer velero restore create <KAPP_RESTORE_NAME> --from-backup <KAPP_ARTIFACT_BACKUP_NAME> 
    
  4. Restore other artifacts, if any.

    tanzu-sm-installer velero restore create <FULL_RESTORE_NAME> --from-backup <FULL_BACKUP_NAME>
    


    After performing a restore, restart the Kafka services to avoid intermittent issues related to leader election among the Kafka brokers.

  5. After the restore has completed successfully, wait 30 minutes to allow all resources and services to restart.
  6. When the 30 minutes have elapsed, run the following script.
    clean_kafka.sh

    #!/bin/bash
    set -x
    NAMESPACE="tanzusm"

    echo "Cleaning up Kafka PVCs and PVs in namespace $NAMESPACE..."

    # Pause reconciliation so kapp-controller does not recreate resources mid-cleanup.
    kctrl package installed pause -i ops-kafka -n "$NAMESPACE"

    # Scale the brokers down so their volumes can be released.
    kubectl -n "$NAMESPACE" scale statefulset ops-kafka --replicas=0
    sleep 10

    # PVs are cluster-scoped; find the ones claimed by the Kafka data PVCs.
    kubectl get pv --no-headers -o custom-columns="NAME:.metadata.name,CLAIM:.spec.claimRef.name" | grep -i "kafka-datadir-ops-kafka" | while read -r PV_NAME PVC_NAME; do

      echo "PV: $PV_NAME, PVC: $PVC_NAME"
      # Start the deletes in the background; they block until the finalizers are cleared.
      kubectl delete pvc "$PVC_NAME" -n "$NAMESPACE" --force &
      kubectl delete pv "$PV_NAME" --force &
      # Clear the finalizers so the pending deletes can complete.
      kubectl patch pv "$PV_NAME" -p '{"metadata":{"finalizers":[]}}' --type='merge'
      kubectl patch pvc "$PVC_NAME" -p '{"metadata":{"finalizers":[]}}' --type='merge' -n "$NAMESPACE"
    done

    # Scale the brokers back up so fresh volumes are provisioned.
    kubectl -n "$NAMESPACE" scale statefulset ops-kafka --replicas=3
    sleep 10

    # Trigger an immediate reconcile of the package.
    kctrl package installed kick -i ops-kafka -n "$NAMESPACE"
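After the script finishes, you can verify that Kafka recovered. The resource names below match those used in the script:

```shell
# All Kafka broker pods should become Ready again.
kubectl rollout status statefulset/ops-kafka -n tanzusm
# The package should return to a successful reconcile state.
kctrl package installed get -i ops-kafka -n tanzusm
```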