This topic explains how to configure horizontal autoscaling for a Kubernetes application. This automatically creates or removes instances of the application based on current values of metrics such as CPU and memory consumption or received HTTP requests.
Configuring horizontal autoscaling allows you to set the minimum and maximum numbers of instances (scaling bounds) and to specify either average use thresholds or utilization thresholds for CPU and memory. It also allows you to specify an average requests-per-second threshold for received HTTP traffic.
Autoscaling of an application is deactivated by default until you configure it for that application, as described in this topic. When autoscaling is deactivated, the application can only be scaled manually by running the `tanzu app scale` command, which can set a fixed number of instances and the requested amount of CPU and memory. For more information about manual scaling, see Scale applications.

Autoscaling fails to create new app instances if those new instances would exceed the Space's resource allocation. In this case, you must increase the amount of resources allocated to the Space.
Before you begin
Before you can configure horizontal autoscaling for an application, you must meet these prerequisites:
- Ensure that the Space requires the Horizontal Autoscaling Capability by using a Profile that requires the Capability.
- Install the Prometheus Operator package on the cluster groups where the Space is to be scheduled. Then require the Prometheus Operator Capability from the Space to ensure that the Space is scheduled on one of these clusters.
- To use HTTP requests per second metrics, ensure that the Space requires the Observability Capability by using a Profile that requires the Capability.
- To use HTTP autoscaling, you must have built your application using the `build` plug-in v0.17.1 or later.
- To use CPU and memory metrics, ensure that these metrics are available on the Kubernetes clusters where your Space replicas are scheduled. These metrics are available if the clusters include a metrics server. If a cluster does not have a metrics server, add one by installing the Kubernetes Metrics Server Capability on the cluster's cluster group.
VMware recommends that you enable a network policy implementation for your Kubernetes distribution. Tanzu Platform uses the policy to further secure the system when using the Prometheus operator for autoscaling.
Activate or change horizontal autoscaling
To activate or change horizontal autoscaling, specify:
- A maximum number of instances your application is allowed to have per Space replica.
- At least one scaling threshold.
- (Optional) A minimum number of instances. The default is `1`.
The following sections describe how to configure these thresholds.
Overview of autoscale thresholds
The two types of thresholds are average use and utilization. Each type of threshold can be applied to the two types of resources, which are CPU and memory.
A CPU average use threshold is the average CPU use across all app instances in a Space replica. CPU utilization is the ratio (expressed as a percentage) of the current CPU use across all app instances in a Space replica to the configured requested CPU value of the application. The same is true for memory thresholds.
For example, a CPU average use threshold of 200m means that an application automatically scales up when its average use of CPU passes 200 millicores, while a memory utilization of 85% means that autoscaling happens when the application starts using more than 85% of the amount of memory it has requested. To edit the amount of requested resources for an application, see Update application CPU and memory.
Because both types of thresholds are simply different representations of the same concept, you can configure only one CPU threshold and one memory threshold at the same time.
In other words, you can configure an average use threshold for CPU or a utilization threshold for CPU, but not both, and an average use threshold for memory or a utilization threshold for memory, but not both.
Finally, you can also configure autoscaling based on an average threshold of received HTTP requests per second.
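As a concrete illustration of how the two threshold types relate, the short sketch below converts a utilization threshold into the equivalent average use value. The 250m request and 85% threshold are made-up values for illustration only:

```python
# Convert a utilization threshold (a percentage of the requested resource)
# into the equivalent average-use threshold, per the definitions above.
# "m" denotes millicores; all values here are illustrative.

def utilization_to_average(requested_millicores: float, utilization_pct: float) -> float:
    """Average use (in millicores) at which a utilization threshold trips."""
    return requested_millicores * utilization_pct / 100

# An app that requests 250m of CPU with an 85% utilization threshold
# scales up once average CPU use across its instances exceeds this value.
print(utilization_to_average(250, 85))  # → 212.5
```

This is why only one CPU threshold and one memory threshold can be configured at a time: given the app's requested resources, each utilization threshold corresponds to exactly one average use value.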
Configure autoscaling
You can use Tanzu CLI commands to activate or update the autoscaling configuration for an app. The configuration options are:

- `--min` and `--max`
- `--cpu-average-value` and `--memory-average-value`
- `--cpu-utilization` and `--memory-utilization`
- `--http-requests-per-second`
The `tanzu app autoscale` command includes an interactive mode. When you run `tanzu app autoscale APP-NAME`, the command prompts you to provide a new value for each configurable autoscaling option.

- Ensure that your Project and Space are set correctly by running:

  ```
  tanzu project use PROJECT-NAME
  tanzu space use SPACE-NAME
  ```
- To activate horizontal autoscaling, run the `tanzu app autoscale` command with the `--max` option and at least one autoscaling option. After setting `--max`, you don't need to specify it again when updating the autoscaling configuration unless you want to change the maximum number of instances. For example, you can run:

  ```
  tanzu app autoscale APP-NAME --max=10 --cpu-average-value=200m
  ```

  Where `APP-NAME` is the `ContainerApp` name of your application.

- You can make multiple changes at the same time. For example, to remove an existing autoscaling threshold and add a different one, run:

  ```
  tanzu app autoscale APP-NAME --cpu-average-value- --memory-utilization=90%
  ```

  This example removes the previously configured CPU average use threshold and adds a memory utilization threshold. You must have at least one threshold configured at all times; the CLI enforces a valid configuration. With these options you can edit, add, and remove the scaling thresholds and the scaling bounds.
- View the autoscaling configuration and current metric values by running:

  ```
  tanzu app get APP-NAME
  ```
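For intuition about how the thresholds and scaling bounds interact, the sketch below applies the standard Kubernetes Horizontal Pod Autoscaler calculation: the desired instance count scales with the ratio of the observed metric to its target, then is clamped to the configured bounds. This is a generic illustration of the Kubernetes formula, not Tanzu-specific code, and the metric values are made up:

```python
import math

def desired_instances(current: int, current_metric: float, target: float,
                      min_instances: int, max_instances: int) -> int:
    """Standard Kubernetes HPA formula: scale proportionally to the ratio of
    the observed metric to its target, then clamp to the scaling bounds."""
    desired = math.ceil(current * current_metric / target)
    return max(min_instances, min(desired, max_instances))

# 3 instances averaging 300m CPU against a 200m average-value threshold:
# ceil(3 * 300 / 200) = 5 instances, within --min=1 and --max=10.
print(desired_instances(3, 300, 200, 1, 10))  # → 5
```

Note that the clamping step is why a correct `--max` matters: even under heavy load, the instance count never exceeds the configured maximum.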
Fix horizontal autoscaling errors
Tanzu Platform makes it as easy as possible to avoid issues with the configuration of horizontal autoscaling. However, autoscaling errors are still possible in rare cases, such as when resources are modified manually. The `tanzu app list` command highlights applications that have autoscaling errors and guides you to the `tanzu app get APP-NAME` command.

When autoscaling is active for an app, the `tanzu app get APP-NAME` command shows a section titled `Autoscaling Details` with a `Status` subsection. If there are autoscaling errors, the `Status` subsection shows the details of the errors to help you fix them. In some cases, a scaling error applies only to a subset of an application's instances. In such cases, the `tanzu app instance get INSTANCE-ID` command shows error messages for that app instance.
Deactivate horizontal autoscaling
Horizontal autoscaling can be deactivated at any time by manually setting a fixed number of instances for the application.
To deactivate horizontal autoscaling, see Update the app instance count.
Deactivating horizontal autoscaling forces the app to scale up or down to reach the fixed number of instances you have set.