Tanzu Platform 10.0

Configure horizontal autoscaling of an application

Last Updated March 03, 2025

This topic explains how to configure horizontal autoscaling for a Kubernetes application. This automatically creates or removes instances of the application based on current values of metrics such as CPU and memory consumption.

Configuring horizontal autoscaling allows you to set the minimum and maximum numbers of instances (scaling bounds) and to specify either average use thresholds or utilization thresholds for CPU and memory use.

Autoscaling of an application is deactivated by default until you configure it for that application, as described in this topic. When autoscaling is deactivated, the application can only be scaled manually by running the tanzu app scale command, which can set a fixed number of instances and the requested amount of CPU and memory. For more information about manual scaling, see Scale applications.

Autoscaling cannot create new application instances if doing so would exceed the Space’s resource allocation. In that case, you must increase the amount of resources allocated to the Space.
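
As a rough planning aid, the check below estimates whether a chosen maximum instance count could fit within a Space's allocation. It is a minimal sketch that assumes the allocation can be treated as one flat CPU budget in millicores per Space replica, which this topic does not confirm. The function name fits_allocation and all sample values are hypothetical.

```shell
# Hypothetical helper: succeeds (exit 0) when max_instances * cpu_request_m
# stays within an assumed CPU budget. All values are in millicores.
# Assumption: the Space allocation behaves like a single flat CPU budget.
fits_allocation() {
  max_instances="$1"; cpu_request_m="$2"; allocated_m="$3"
  [ $((max_instances * cpu_request_m)) -le "$allocated_m" ]
}

# Example: 10 instances requesting 300m each need 3000m in total.
if fits_allocation 10 300 2500; then
  echo "fits"
else
  echo "increase the Space allocation or lower the maximum"
fi
```

The same arithmetic applies to memory if you substitute memory requests and an assumed memory budget.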

Before you begin

Before you can configure horizontal autoscaling for an application, you must meet these prerequisites:

  • Ensure that the Space requires the horizontal-autoscaling.tanzu.vmware.com Capability by using a Profile that requires the Capability.

  • Ensure that metrics are available on the Kubernetes clusters where your Space replicas are scheduled. Metrics are available if the clusters include a Metrics server. If a cluster does not have a Metrics server, add one by installing the Kubernetes Metrics Server Capability on the cluster’s cluster group.

Activate or change horizontal autoscaling

To activate or change horizontal autoscaling, specify:

  • A maximum number of instances your application is allowed to have per Space replica.
  • At least one scaling threshold.
  • (Optional) A minimum number of instances. The default is 1.

The following sections describe how to configure these thresholds.

Overview of autoscale thresholds

The two types of thresholds are average use and utilization. Each type of threshold can be applied to the two types of resources, which are CPU and memory.

A CPU average use threshold applies to the average CPU use across all application instances in a Space replica. A CPU utilization threshold applies to CPU utilization, which is the ratio, expressed as a percentage, of the current CPU use across all application instances in a Space replica to the requested CPU value configured for the application. Memory thresholds work the same way.
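
To make the two definitions concrete, the following sketch computes both values from per-instance CPU readings. It assumes that utilization is derived from the average per-instance use divided by the requested value, as in the Kubernetes Horizontal Pod Autoscaler; this topic does not spell out the exact calculation. The function names and sample numbers are hypothetical, values are in millicores, and integer division drops any fraction.

```shell
# average_use USAGE...
# Average CPU use across instances, in millicores (integer division).
average_use() {
  total=0; count=0
  for usage in "$@"; do
    total=$((total + usage))
    count=$((count + 1))
  done
  echo $((total / count))
}

# utilization REQUEST USAGE...
# Average use as a percentage of the requested CPU value.
utilization() {
  request="$1"; shift
  avg=$(average_use "$@")
  echo $((avg * 100 / request))
}

average_use 150 250 260        # prints 220
utilization 400 150 250 260    # prints 55
```

With three instances using 150m, 250m, and 260m, the average use is 220m. Against a request of 400m, that is a utilization of 55%.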

For example, a CPU average use threshold of 200m means that an application automatically scales up when its average CPU use exceeds 200 millicores. A memory utilization threshold of 85% means that the application scales up when it uses more than 85% of the memory it has requested. To edit the amount of requested resources for an application, see Update application CPU and memory.
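
The sketch below illustrates how a threshold and the scaling bounds might combine into an instance count. It assumes the standard Kubernetes Horizontal Pod Autoscaler calculation, which is the ceiling of current instances multiplied by the current metric and divided by the threshold, and it ignores tolerance and stabilization behavior. This topic does not state which algorithm Tanzu Platform uses, and desired_instances is a hypothetical name.

```shell
# desired_instances CURRENT METRIC THRESHOLD MAX [MIN]
# METRIC and THRESHOLD must share a unit (for example millicores or percent).
# Assumption: the Kubernetes HPA formula applies; MIN defaults to 1.
desired_instances() {
  current="$1"; metric="$2"; threshold="$3"; max="$4"; min="${5:-1}"
  # Integer ceiling of current * metric / threshold.
  want=$(( (current * metric + threshold - 1) / threshold ))
  # Clamp the result to the configured scaling bounds.
  if [ "$want" -lt "$min" ]; then want="$min"; fi
  if [ "$want" -gt "$max" ]; then want="$max"; fi
  echo "$want"
}

desired_instances 4 300 200 10    # prints 6: average use of 300m against a 200m threshold
desired_instances 4 900 200 10    # prints 10: capped by the maximum
desired_instances 4 0 200 10 2    # prints 2: held at the minimum
```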

Because the two threshold types are different representations of the same measurement, you can configure at most one CPU threshold and at most one memory threshold at a time.

In other words, you can configure an average use threshold for CPU or a utilization threshold for CPU, but not both, and an average use threshold for memory or a utilization threshold for memory, but not both.
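
This rule can be expressed as a small pre-flight check on the threshold options for a new configuration. It is an illustrative sketch only: check_thresholds is a hypothetical helper that is not part of the Tanzu CLI, and it does not account for thresholds that are already configured on the app.

```shell
# check_thresholds OPTION...
# Succeeds when the options name at most one CPU threshold, at most one
# memory threshold, and at least one threshold overall.
check_thresholds() {
  cpu=0; mem=0
  for opt in "$@"; do
    case "$opt" in
      --cpu-average-value=*|--cpu-utilization=*)       cpu=$((cpu + 1)) ;;
      --memory-average-value=*|--memory-utilization=*) mem=$((mem + 1)) ;;
    esac
  done
  [ "$cpu" -le 1 ] && [ "$mem" -le 1 ] && [ $((cpu + mem)) -ge 1 ]
}

check_thresholds --cpu-average-value=200m --memory-utilization=85% && echo "valid"
check_thresholds --cpu-average-value=200m --cpu-utilization=80% || echo "two CPU thresholds"
```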

Configure autoscaling

Tanzu CLI-based steps

You can use Tanzu CLI commands to activate or update the autoscaling configuration for an app.

The configuration options are:

  • --min and --max
  • --cpu-average-value and --memory-average-value
  • --cpu-utilization and --memory-utilization

The tanzu app autoscale command includes an interactive mode. When you run tanzu app autoscale <APP-NAME>, the command prompts you to provide a new value for each configurable autoscaling option.

  1. Ensure that your Project and Space are set correctly by running:

    tanzu project use PROJECT-NAME
    tanzu space use SPACE-NAME
    
  2. To activate horizontal autoscaling, run the tanzu app autoscale command with the --max option and at least one threshold option. After you set --max, you do not need to specify it again when you update the autoscaling configuration, unless you want to change the maximum number of instances. For example, you can run:

    tanzu app autoscale <APP-NAME> --max=10 --cpu-average-value=200m
    

    Where <APP-NAME> is the ContainerApp name of your application.

  3. You can make multiple changes at the same time. For example, to remove an existing autoscaling threshold and add a different one, run:

    tanzu app autoscale <APP-NAME> --cpu-average-value- --memory-utilization=90%
    

    This example removes the CPU average use threshold that was previously configured and replaces it with a memory utilization threshold. You must have at least one threshold configured at all times. The CLI enforces a valid configuration.

    By using these CLI options you can edit, add, and remove the different possible scaling thresholds and the scaling bounds.

  4. View the autoscaling configuration and current metric values by running:

    tanzu app get <APP-NAME>
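
Taken together, the steps above could be scripted roughly as follows. The sketch uses only the commands and options shown in this topic. The function names run and autoscale_steps, the DRY_RUN switch, and the sample names are hypothetical. With DRY_RUN=1, the commands are printed rather than run, so you can review them first.

```shell
# run COMMAND...
# Prints the command when DRY_RUN=1; otherwise executes it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# autoscale_steps PROJECT SPACE APP MAX CPU-AVERAGE-VALUE
autoscale_steps() {
  run tanzu project use "$1"     # step 1: select the Project
  run tanzu space use "$2"       # step 1: select the Space
  run tanzu app autoscale "$3" "--max=$4" "--cpu-average-value=$5"   # step 2
  run tanzu app get "$3"         # step 4: view configuration and metrics
}

# Preview the commands without calling the Tanzu CLI.
DRY_RUN=1
autoscale_steps my-project my-space my-app 10 200m
```

Unset DRY_RUN, or set it to 0, to run the commands against the current Tanzu CLI context.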
    

Fix horizontal autoscaling errors

Tanzu Platform is designed to help you avoid issues with the configuration of horizontal autoscaling. However, autoscaling errors can still occur in rare cases, such as when resources are modified manually. The tanzu app list command highlights applications that have autoscaling errors and directs you to the tanzu app get <APP-NAME> command.

When autoscaling is active for an app, the tanzu app get <APP-NAME> command shows a section titled Autoscaling Details with a Status subsection. If there are autoscaling errors, the Status subsection shows the error details to help you fix them.

Deactivate horizontal autoscaling

Horizontal autoscaling can be deactivated at any time by manually setting the number of instances an application can have.

To deactivate horizontal autoscaling, see Update the application instance count.

Deactivating horizontal autoscaling forces the app to scale up or down to reach the fixed number of instances you have set.