Monitoring your system metrics using Automation Config

Automation Config exposes several system metrics that can be used for monitoring and diagnostics. These metrics are available in graphical form on the Automation Config user interface dashboard and in machine-readable form through the /metrics HTTP endpoint.
Going forward, Automation Config is no longer included in the Aria Automation suite of products. The new name of this product is VMware Tanzu Salt, and it is available as part of the VMware Tanzu Platform suite of products. See Using and Managing Tanzu Salt for more information.
For more information about visualizing reports in the Automation Config user interface using the Dashboard, see Dashboard Reports.

Machine-readable metrics

Automation Config exports system metrics in the OpenMetrics text-based format. This format is directly consumable by Prometheus and other monitoring and alerting tools.
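As a rough illustration of what this text-based format looks like to a consumer, the following Python sketch parses a few sample lines into (name, labels, value) tuples. The sample text is invented for illustration: the metric names come from the Metric descriptions table in this document, but the values and the master_id label value are hypothetical. The parser handles only simple sample lines, so treat it as a sketch rather than a complete OpenMetrics implementation.

```python
import re

# Hypothetical sample of OpenMetrics-style output; the values are invented.
SAMPLE = """\
# TYPE sse_minions gauge
sse_minions 42
# TYPE sse_minions_present gauge
sse_minions_present{master_id="master1"} 40
# EOF
"""

# metric_name, optional {labels}, then the sample value
LINE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# label_name="label value" pairs inside the braces
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, one per sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, TYPE/HELP metadata, and the EOF marker
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore lines this simple parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


for name, labels, value in parse_metrics(SAMPLE):
    print(name, labels, value)
```

In practice, Prometheus or another monitoring tool does this parsing for you; the sketch is only meant to show the shape of the data.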

Automation Config metrics configuration

Configuration for system metrics collection consists of the following settings in the /etc/raas/raas configuration file. Default values are shown.
    # System metrics settings
    metrics:
      enabled: true               # If True, enable the collection of system metrics
      prometheus: false           # If True, enable the Prometheus endpoint at /metrics
      prometheus_username:        # Static username for retrieving /metrics
      prometheus_password:        # Static password for retrieving /metrics
      snapshot_interval: 60       # How often to record snapshot metrics, in seconds
      max_query_timedelta: 86400  # Maximum timedelta for a single call to get_system_metrics, in seconds
      keep: 30                    # How long to retain metrics data, in days
The following settings control the handling of machine-readable system metrics:
  • To disable metrics collection, set enabled: false. Note that this also disables the Automation Config built-in dashboard.
  • To enable the export of machine-readable metrics from the /metrics HTTP endpoint, set prometheus: true. This setting does not affect the Automation Config built-in dashboard.
  • To access the /metrics HTTP endpoint using HTTP Basic Authentication, set your credentials in plaintext using the prometheus_username and prometheus_password variables. To authenticate using encrypted credentials or environment variables, leave these variables blank. See Configuring Prometheus to connect to Automation Config for more information about authentication methods.
The other settings shown above relate to the Automation Config built-in dashboard and do not affect the collection or reporting of machine-readable system metrics. In particular, the snapshot_interval setting determines how often, in seconds, metrics are recorded for display on the dashboard, and the keep setting determines how long, in days, metrics data is kept in the database before being trimmed.
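To illustrate the client side of the /metrics settings described above, the following Python sketch builds an HTTP Basic Authentication request for the endpoint using only the standard library. The host, port, and credentials are placeholders, and the helper names build_metrics_request and fetch_metrics are hypothetical. It assumes that the Prometheus endpoint is enabled and that credentials have been configured.

```python
import base64
import urllib.request


def build_metrics_request(base_url, username, password):
    """Build a GET request for /metrics with an HTTP Basic Authentication header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(base_url.rstrip("/") + "/metrics")
    request.add_header("Authorization", f"Basic {token}")
    return request


def fetch_metrics(base_url, username, password, timeout=10):
    """Return the metrics text; raises urllib.error.URLError if the call fails."""
    request = build_metrics_request(base_url, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Placeholder host and credentials; substitute the values for your environment.
example = build_metrics_request("http://localhost:8080", "metrics", "example-password")
print(example.full_url)
```

Calling fetch_metrics with the same arguments would perform the actual request against a running server. Over plain HTTP the Basic Authentication header is only encoded, not encrypted, so a production setup would normally use HTTPS.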
Although the /metrics HTTP endpoint is the recommended way to gather machine-readable metrics data from Automation Config, you can use the API (RaaS) to retrieve the data presented on the built-in dashboard. The stats.get_system_metrics() API call lets you query metrics data by metric name, source, and date range. The configuration item max_query_timedelta limits how much data Automation Config returns from a single API call. To get metrics data from a longer time span, make multiple API calls with different start and end dates.
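One way to plan those multiple calls is to split the overall date range into windows no longer than max_query_timedelta. The Python sketch below shows only that splitting step; the helper name split_range is hypothetical, and the parameters of stats.get_system_metrics() are not described in this document, so the API call itself is left as a comment.

```python
from datetime import datetime, timedelta

MAX_QUERY_TIMEDELTA = 86400  # seconds; the default shown in the configuration above


def split_range(start, end, max_seconds=MAX_QUERY_TIMEDELTA):
    """Yield (window_start, window_end) pairs that cover the span from start
    to end, each window no longer than max_seconds."""
    if max_seconds <= 0:
        raise ValueError("max_seconds must be positive")
    step = timedelta(seconds=max_seconds)
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        yield cursor, window_end
        cursor = window_end


windows = list(split_range(datetime(2024, 1, 1), datetime(2024, 1, 3, 12)))
for window_start, window_end in windows:
    # One stats.get_system_metrics() call would be issued per window here.
    print(window_start.isoformat(), "->", window_end.isoformat())
```

With the default limit of 86400 seconds, the 2.5-day range in this example is covered by three windows: two full days and one half day.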

Salt Master metrics configuration

The availability of some metrics depends on the configuration of the Salt masters connected to Automation Config:
  • Metrics on Salt events and job returns will be accurate only if the sseapi returner is configured on the Salt masters.
  • Automation Config will collect low-level function runtime information from Salt masters that have master_stats: true set in their configuration. This option is disabled by default. See master_stats in the Salt documentation for details.
  • Metrics on Salt job states will be accurate only if the job completion engine is enabled in the Salt Master Plugin:
      engines:
        - jobcompletion: {}
    This engine is enabled in the Salt Master Plugin default configuration.

Configuring Prometheus to connect to Automation Config

You can authenticate Automation Config with Prometheus using one of these methods:
  • To store your credentials in an encrypted bundle in /etc/raas/raas.secconf, run raas save_creds from the command-line interface (CLI). The CLI then prompts you for your Postgres, Redis, and Prometheus credentials. You can also pass your credentials through the CLI using this syntax, replacing the example text with your credentials:
      raas save_creds 'postgres={"username": "root", "password": "salt"} redis={"username": "default", "password": "redis123"} prometheus={"username": "metrics", "password": "prometheus"}'
    See Securing credentials in your SaltStack Enterprise configuration for more information.
  • To store your credentials in environment variables for use with container images, use the variables PROMETHEUS_USERNAME and PROMETHEUS_PASSWORD.
  • To store your credentials in plaintext, set the prometheus_username and prometheus_password variables in the /etc/raas/raas configuration file, as noted in the section above. These credentials are stored only in the /etc/raas/raas configuration file, are not associated with any Automation Config account, and cannot be used to authenticate to Automation Config other than for accessing the /metrics HTTP endpoint.
If you use all three authentication methods, Automation Config prioritizes the encrypted credentials stored in /etc/raas/raas.secconf first, then the environment variables, and then the plaintext credentials stored in /etc/raas/raas. Setting credentials in plaintext in the configuration file does not prevent the encrypted credentials or environment variables from being used. All three options must be blank (unused) in order to disable the endpoint.
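The precedence described above can be modeled with a short Python sketch. This is an illustration of the documented order only, not the product's implementation: the function name resolve_prometheus_credentials is hypothetical, and treating a source as "set" only when both its username and password are non-empty is an assumption.

```python
def resolve_prometheus_credentials(encrypted=None, environment=None, plaintext=None):
    """Return (source, username, password) from the highest-priority source
    that is set, or None if all three sources are blank.

    Each argument is either None or a (username, password) tuple.
    """
    candidates = [
        ("encrypted", encrypted),      # credentials saved in raas.secconf
        ("environment", environment),  # PROMETHEUS_USERNAME / PROMETHEUS_PASSWORD
        ("plaintext", plaintext),      # prometheus_username / prometheus_password
    ]
    for source, creds in candidates:
        # Assumption: a source counts as set only if both values are non-empty.
        if creds and creds[0] and creds[1]:
            return (source, creds[0], creds[1])
    return None


# Plaintext values are ignored here because the environment source is set.
print(resolve_prometheus_credentials(
    environment=("metrics", "from-env"),
    plaintext=("metrics", "from-file"),
))
```

The None result corresponds to the case where all three options are blank.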
You can enable a Prometheus server to scrape metrics from Automation Config by adding a scrape_configs job to the Prometheus configuration (typically prometheus.yml) for each API (RaaS) server instance you have. Use this example file as a guide, replacing the suggested variables with the variables for your environment:
    scrape_configs:
      - job_name: 'sse'
        metrics_path: '/metrics'
        scheme: 'http'
        static_configs:
          - targets: ['localhost:8080']
        basic_auth:
          username: prometheus
          password: metrics
For the username and password values, use the same credentials you stored in the encrypted configuration file, in environment variables, or in plaintext for Automation Config.
This configuration file is stored and managed on the Prometheus side, not on the Automation Config side.
See the Prometheus project documentation for more information on setting up scrape targets and other Prometheus configuration topics.
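If you run several API (RaaS) server instances, you might generate the scrape_configs section rather than write each job by hand. The Python sketch below renders one job per target as text. The function name render_scrape_configs, the job naming scheme (sse-1, sse-2, and so on), and the example hostnames are hypothetical. The sketch performs no YAML escaping, so it assumes the targets and credentials contain no special YAML characters.

```python
def render_scrape_configs(targets, username, password, scheme="http"):
    """Render a Prometheus scrape_configs section with one job per RaaS target.

    No YAML escaping is performed; values are inserted as-is.
    """
    lines = ["scrape_configs:"]
    for index, target in enumerate(targets, start=1):
        lines += [
            f"  - job_name: 'sse-{index}'",
            "    metrics_path: '/metrics'",
            f"    scheme: '{scheme}'",
            "    static_configs:",
            f"      - targets: ['{target}']",
            "    basic_auth:",
            f"      username: {username}",
            f"      password: {password}",
        ]
    return "\n".join(lines) + "\n"


# Placeholder hostnames and credentials; substitute your own.
print(render_scrape_configs(
    ["raas1.example.com:8080", "raas2.example.com:8080"],
    "metrics",
    "example-password",
))
```

The output can be merged into prometheus.yml on the Prometheus server.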

Metric descriptions

The machine-readable metrics that Automation Config exports fall into several categories:
| Category | Metric Name | Metric Type | Labels | Description |
|---|---|---|---|---|
| Salt master low-level metrics | salt_event_size_bytes | Histogram | master_id | Salt event size, in bytes |
| | salt_master_cmd_duration_seconds | Histogram | master_id, cmd | Salt master command duration, in seconds. Reported only if master_stats is configured on the Salt master. |
| Salt Master Plugin metrics | raas_master_commands_processed | Counter | master_id | SSE commands processed |
| | raas_master_master_grains_pushed | Counter | master_id | Salt master grain updates pushed to Automation Config |
| | raas_master_minion_keys_pushed | Counter | master_id | Minion key state updates pushed to Automation Config |
| | raas_master_minion_cached_pushed | Counter | master_id | Minion cache updates pushed to Automation Config |
| | raas_master_masterfs_pushed | Counter | master_id | MasterFS updates pushed to Automation Config |
| | raas_master_sseapi_engine_iteration_seconds | Histogram | master_id | API (RaaS) engine iteration duration, in seconds |
| Server metrics | redis_commands_executed | Counter | redis_instance | Redis commands executed (system cache) |
| | redis_memory_bytes | Gauge | redis_instance | Redis memory usage (system cache) |
| | celery_tasks_queued | Counter | raas_instance, task | Celery tasks queued (background jobs) |
| | celery_tasks_executed | Counter | raas_instance, task | Celery tasks executed (background jobs) |
| | celery_queue_length | Gauge | raas_instance | Celery queue length (background jobs waiting) |
| | raas_rpc_request_duration_seconds | Histogram | raas_instance | SSE RPC API call duration, in seconds |
| PostgreSQL metrics | postgres_connections | Gauge | postgres_instance | Postgres connections |
| | postgres_transactions | Counter | postgres_instance | Postgres transactions committed |
| | postgres_rows_read | Counter | postgres_instance | Postgres rows read |
| | postgres_rows_inserted | Counter | postgres_instance | Postgres rows inserted |
| | postgres_rows_updated | Counter | postgres_instance | Postgres rows updated |
| | postgres_rows_deleted | Counter | postgres_instance | Postgres rows deleted |
| System metrics | highstate_minions | Gauge | None | Number of minions that ran a highstate job |
| | highstate_minions_changed | Gauge | None | Number of minions that ran a highstate job resulting in one or more changes |
| | highstate_minions_succeeded | Gauge | None | Number of minions that ran a highstate job with no failures |
| | highstate_minion_duration_seconds | Gauge | None | Average per-minion duration for a highstate run |
| | highstate_states | Gauge | None | Number of unique states applied in highstate runs |
| | highstate_states_changed | Gauge | None | Number of states applied in highstate runs that resulted in one or more changes |
| | highstate_states_succeeded | Gauge | None | Number of states applied in highstate runs with no failures |
| | sse_jobs_in_progress | Counter | None | Automation Config jobs in progress |
| | sse_jobs_complete_all_successful | Counter | None | Automation Config jobs complete with all successful returns |
| | sse_jobs_complete_missing_returns | Counter | None | Automation Config jobs complete with one or more missing returns |
| | sse_jobs_complete_with_errors | Counter | None | Automation Config jobs complete with one or more errors |
| | sse_masters | Gauge | None | Total Salt masters in Automation Config |
| | sse_minion_grains_deleted | Counter | master_id | Number of minion grains deleted |
| | sse_minion_grains_indexing_duration_seconds | Counter | raas_instance | Minion grains indexing calculations duration, in seconds |
| | sse_minion_grains_saved | Counter | master_id | Number of minion grains saved |
| | sse_minion_target_match_calcs | Counter | raas_instance | Number of minion versus target group matching calculations |
| | sse_minion_target_match_duration_seconds | Counter | raas_instance | Minion versus target group matching calculations duration, in seconds |
| | sse_minions | Gauge | None | Total minions in Automation Config |
| | sse_minions_present | Gauge | master_id | Minions present within the configured time limit (raas_presence_expiration) |
| | sse_minions_lost | Gauge | master_id | Minions not present within the time limit |
| | sse_minions_unknown | Gauge | master_id | Unknown minions (never present) |
| | sse_users_authenticated | Gauge | None | Users authenticated to Automation Config |
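As a small worked example of combining these metrics, the Python sketch below derives a highstate success ratio from two of the gauges listed above. The function name highstate_success_ratio and the sample values are hypothetical. Reading the ratio as "the fraction of highstate minions with no failures" is an interpretation of the metric descriptions, not a calculation defined by the product.

```python
def highstate_success_ratio(gauges):
    """Return highstate_minions_succeeded / highstate_minions,
    or None when no minions ran a highstate job."""
    total = gauges.get("highstate_minions", 0)
    if total == 0:
        return None  # avoid dividing by zero when there is no highstate data
    return gauges.get("highstate_minions_succeeded", 0) / total


# Invented values for illustration only.
sample_gauges = {"highstate_minions": 20, "highstate_minions_succeeded": 18}
print(highstate_success_ratio(sample_gauges))
```

In a Prometheus deployment, the same ratio would more typically be expressed as a query or recording rule over the scraped gauges.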