Monitoring your system metrics using Automation Config

Automation Config exposes several system metrics that can be used for monitoring and diagnostics. These metrics are available in graphical form on the Automation Config user interface dashboard and in machine-readable form using the /metrics HTTP endpoint.

Going forward, Automation Config is no longer included in the Aria Automation suite of products. The new name of this product is VMware Tanzu Salt and this product is available as part of the VMware Tanzu Platform suite of products. See Using and Managing Tanzu Salt for more information.

For more information about visualizing reports in the Automation Config user interface using the Dashboard, see Dashboard Reports.

Machine-readable metrics

Automation Config
exports system metrics in OpenMetrics
text-based format. This format is directly consumable by Prometheus and other monitoring and alerting tools.
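
If you want to inspect the exported text without a Prometheus server, the sample lines can be read with a few lines of Python. The following is a minimal sketch, not a complete OpenMetrics parser: it assumes simple sample lines of the form name{label="value"} number, does not handle escaped quotes or closing braces inside label values, and ignores timestamps and exemplars. The payload in EXAMPLE is made up for illustration; only the metric and label names come from the table later in this document.

```python
import re

# One sample line looks like: name{label="value",...} 12.5
# Simplified on purpose: label values containing '}' or escaped quotes are not handled.
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment/metadata lines (# HELP, # TYPE, # EOF).
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


# Illustrative payload only; the values and the master ID are invented.
EXAMPLE = """\
# TYPE sse_minions gauge
sse_minions 42
# TYPE sse_minions_present gauge
sse_minions_present{master_id="master-1"} 40
# EOF
"""

for name, labels, value in parse_samples(EXAMPLE):
    print(name, labels, value)
```

For production use, an established client library or Prometheus itself is a safer choice than a hand-written parser.
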
Automation Config metrics configuration

Configuration for system metrics collection consists of the following settings in the /etc/raas/raas configuration file. Default values are shown.

    # System metrics settings
    metrics:
      enabled: true                # If True, enable the collection of system metrics
      prometheus: false            # If True, enable the Prometheus endpoint at /metrics
      prometheus_username:         # Static username for retrieving /metrics
      prometheus_password:         # Static password for retrieving /metrics
      snapshot_interval: 60        # How often to record snapshot metrics, in seconds
      max_query_timedelta: 86400   # Maximum timedelta for a single call to get_system_metrics, in seconds
      keep: 30                     # How long to retain metrics data, in days

The following settings control the handling
of machine-readable system metrics:
- To disable metrics collection, set enabled: false. Note that this also disables the Automation Config built-in dashboard.
- To enable the export of machine-readable metrics from the /metrics HTTP endpoint, set prometheus: true. This setting does not affect the Automation Config built-in dashboard.
- To access the /metrics HTTP endpoint using HTTP Basic Authentication, set your credentials in plaintext using the prometheus_username and prometheus_password variables. To authenticate using encrypted credentials or environment variables, leave these variables blank. See Configuring Prometheus to connect to Automation Config for more information about authentication methods.
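
To check that the endpoint is enabled and that your credentials are accepted, you can request /metrics directly. The following sketch uses only the Python standard library and sends an HTTP Basic Authorization header. The helper names are hypothetical, and the host, port, and credentials in the commented usage lines are placeholders borrowed from the examples in this document; substitute the values for your environment.

```python
import base64
import urllib.request


def build_metrics_request(base_url, username, password):
    """Build a GET request for /metrics with an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(base_url.rstrip("/") + "/metrics")
    request.add_header("Authorization", "Basic " + token)
    return request


def fetch_metrics(base_url, username, password, timeout=10):
    """Return the raw metrics text served by the endpoint."""
    request = build_metrics_request(base_url, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example usage (placeholder host and credentials):
# text = fetch_metrics("http://localhost:8080", "metrics", "prometheus")
# print(text[:500])
```

If the request is rejected, confirm that prometheus: true is set and that the credentials match the ones configured for the endpoint.
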
The other settings in the metrics configuration relate to the Automation Config built-in dashboard and do not affect the collection or reporting of machine-readable system metrics. In particular, the snapshot_interval setting determines how often, in seconds, metrics are recorded for display on the dashboard, and the keep setting determines how long, in days, metrics data is kept in the database before being trimmed.

Although the /metrics HTTP endpoint is the recommended way to gather machine-readable metrics data from Automation Config, you can use the API (RaaS) to retrieve the data presented on the built-in dashboard. The stats.get_system_metrics() API call lets you query metrics data by metric name, source, and date range. The configuration item max_query_timedelta limits how much data Automation Config will return from a single API call. To get metrics data from a longer time span, you can make multiple API calls with different start and end dates.

Salt Master metrics configuration

The availability of some metrics depends on the configuration of the Salt masters connected to Automation Config:

- Metrics on Salt events and job returns will be accurate only if the sseapi returner is configured on the Salt masters.
- Automation Config will collect low-level function runtime information from Salt masters that have master_stats: true set in their configuration. This option is disabled by default. See master_stats in the Salt documentation for details.
- Metrics on Salt job states will be accurate only if the job completion engine is enabled in the Salt Master Plugin:

      engines:
        - jobcompletion: {}

  This engine is enabled in the Salt Master Plugin default configuration.

Configuring Prometheus to connect to Automation Config

You can authenticate Automation Config with Prometheus using one of these methods:

- To store your credentials in an encrypted bundle in /etc/raas/raas.secconf, run raas save_creds from the command-line interface (CLI). The CLI then prompts you for your Postgres, Redis, and Prometheus credentials. You can also pass your credentials through the CLI using this syntax, replacing the example text with your credentials:

      raas save_creds 'postgres={"username": "root", "password": "salt"} redis={"username": "default", "password": "redis123"} prometheus={"username": "metrics", "password": "prometheus"}'

  See Securing credentials in your SaltStack Enterprise configuration for more information.
- To store your credentials in environment variables for use with container images, use the variables PROMETHEUS_USERNAME and PROMETHEUS_PASSWORD.
- To store your credentials in plaintext, set the prometheus_username and prometheus_password variables in the /etc/raas/raas configuration file, as noted in the section above. These credentials are stored only in the /etc/raas/raas configuration file, are not associated with any Automation Config account, and cannot be used to authenticate to Automation Config other than for accessing the /metrics HTTP endpoint.

If you use all three authentication methods, Automation Config prioritizes the encrypted credentials stored in /etc/raas/raas.secconf first, then the environment variables, and then the plaintext credentials stored in /etc/raas/raas. Setting credentials in plaintext in the configuration file does not prevent the encrypted credentials or environment variables from being used. All three options must be blank (unused) in order to disable the endpoint.

You can enable a Prometheus server to scrape metrics from Automation Config by adding a scrape_configs job to the Prometheus configuration (typically prometheus.yml) for each API (RaaS) server instance you have. Use this example file as a guide, replacing the suggested variables with the variables for your environment:

    scrape_configs:
      - job_name: 'sse'
        metrics_path: '/metrics'
        scheme: 'http'
        static_configs:
          - targets: ['localhost:8080']
        basic_auth:
          username: prometheus
          password: metrics

For the username and password values, use the same credentials you stored in either the encrypted configuration file or in plaintext for Automation Config.

This configuration file is stored and managed on the Prometheus side, not on the Automation Config side.

See the Prometheus project documentation for more information on setting up scrape targets and other Prometheus configuration topics.

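
The credential precedence described in this section (encrypted bundle first, then environment variables, then plaintext) can be pictured with a short sketch. This is an illustration of the documented ordering only, not the Automation Config implementation. The function name and the encrypted and plaintext arguments are hypothetical stand-ins for values already read from raas.secconf and /etc/raas/raas, and treating a source as set only when both the username and the password are non-empty is an assumption.

```python
import os


def resolve_prometheus_credentials(encrypted=None, environ=None, plaintext=None):
    """Pick credentials in the documented order: encrypted, environment, plaintext.

    `encrypted` and `plaintext` are hypothetical (username, password) tuples.
    Returns (source, username, password), or None when no source is set.
    """
    environ = os.environ if environ is None else environ
    env_pair = (environ.get("PROMETHEUS_USERNAME"), environ.get("PROMETHEUS_PASSWORD"))
    candidates = [
        ("encrypted", encrypted),
        ("environment", env_pair),
        ("plaintext", plaintext),
    ]
    for source, pair in candidates:
        # Assumption: a source counts as set only if both values are non-empty.
        if pair and pair[0] and pair[1]:
            return (source, pair[0], pair[1])
    return None


# With all three sources populated, the encrypted pair is chosen.
print(resolve_prometheus_credentials(
    encrypted=("enc-user", "enc-pass"),
    environ={"PROMETHEUS_USERNAME": "env-user", "PROMETHEUS_PASSWORD": "env-pass"},
    plaintext=("plain-user", "plain-pass"),
))
```

A None result corresponds to the case where all three options are blank, which is the condition this section gives for the endpoint being disabled.
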
Metric descriptions
The machine-readable metrics that Automation Config exports fall into several categories:

| Category | Metric Name | Metric Type | Labels | Description |
|---|---|---|---|---|
| Salt master low-level metrics | salt_event_size_bytes | Histogram | master_id | Salt event size, in bytes |
| | salt_master_cmd_duration_seconds | Histogram | master_id, cmd | Salt master command duration, in seconds. Reported only if master_stats is configured on the Salt master. |
| Salt Master Plugin metrics | raas_master_commands_processed | Counter | master_id | SSE commands processed |
| | raas_master_master_grains_pushed | Counter | master_id | Salt master grain updates pushed to Automation Config |
| | raas_master_minion_keys_pushed | Counter | master_id | Minion key state updates pushed to Automation Config |
| | raas_master_minion_cached_pushed | Counter | master_id | Minion cache updates pushed to Automation Config |
| | raas_master_masterfs_pushed | Counter | master_id | MasterFS updates pushed to Automation Config |
| | raas_master_sseapi_engine_iteration_seconds | Histogram | master_id | API (RaaS) engine iteration duration, in seconds |
| Server metrics | redis_commands_executed | Counter | redis_instance | Redis commands executed (system cache) |
| | redis_memory_bytes | Gauge | redis_instance | Redis memory usage (system cache) |
| | celery_tasks_queued | Counter | raas_instance, task | Celery tasks queued (background jobs) |
| | celery_tasks_executed | Counter | raas_instance, task | Celery tasks executed (background jobs) |
| | celery_queue_length | Gauge | raas_instance | Celery queue length (background jobs waiting) |
| | raas_rpc_request_duration_seconds | Histogram | raas_instance | SSE RPC API call duration, in seconds |
| PostgreSQL metrics | postgres_connections | Gauge | postgres_instance | Postgres connections |
| | postgres_transactions | Counter | postgres_instance | Postgres transactions committed |
| | postgres_rows_read | Counter | postgres_instance | Postgres rows read |
| | postgres_rows_inserted | Counter | postgres_instance | Postgres rows inserted |
| | postgres_rows_updated | Counter | postgres_instance | Postgres rows updated |
| | postgres_rows_deleted | Counter | postgres_instance | Postgres rows deleted |
| System metrics | highstate_minions | Gauge | None | Number of minions that ran a highstate job |
| | highstate_minions_changed | Gauge | None | Number of minions that ran a highstate job resulting in one or more changes |
| | highstate_minions_succeeded | Gauge | None | Number of minions that ran a highstate job with no failures |
| | highstate_minion_duration_seconds | Gauge | None | Average per-minion duration for a highstate run |
| | highstate_states | Gauge | None | Number of unique states applied in highstate runs |
| | highstate_states_changed | Gauge | None | Number of states applied in highstate runs that resulted in one or more changes |
| | highstate_states_succeeded | Gauge | None | Number of states applied in highstate runs with no failures |
| | sse_jobs_in_progress | Counter | None | Automation Config jobs in progress |
| | sse_jobs_complete_all_successful | Counter | None | Automation Config jobs complete with all successful returns |
| | sse_jobs_complete_missing_returns | Counter | None | Automation Config jobs complete with one or more missing returns |
| | sse_jobs_complete_with_errors | Counter | None | Automation Config jobs complete with one or more errors |
| | sse_masters | Gauge | None | Total Salt masters in Automation Config |
| | sse_minion_grains_deleted | Counter | master_id | Number of minion grains deleted |
| | sse_minion_grains_indexing_duration_seconds | Counter | raas_instance | Minion grains indexing calculations duration, in seconds |
| | sse_minion_grains_saved | Counter | master_id | Number of minion grains saved |
| | sse_minion_target_match_calcs | Counter | raas_instance | Number of minion versus target group matching calculations |
| | sse_minion_target_match_duration_seconds | Counter | raas_instance | Minion versus target group matching calculations duration, in seconds |
| | sse_minions | Gauge | None | Total minions in Automation Config |
| | sse_minions_present | Gauge | master_id | Minions present within the time limit configured by raas_presence_expiration |
| | sse_minions_lost | Gauge | master_id | Minions not present within the time limit |
| | sse_minions_unknown | Gauge | master_id | Unknown minions (never present) |
| | sse_users_authenticated | Gauge | None | Users authenticated to Automation Config |
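
The metric type affects how a value should be read. A Gauge such as sse_minions can be used as-is, while a Counter such as postgres_transactions is cumulative, so the useful figure is usually its rate of change between two scrapes. The following sketch shows that arithmetic under one assumption, labeled in the code: a counter that goes down is treated as having reset to zero, for example after a service restart. The function name and the sample numbers are illustrative. In Prometheus itself, the built-in rate() query function serves this purpose.

```python
def counter_rate(previous, current, elapsed_seconds):
    """Average per-second increase of a counter between two scrapes.

    Assumption: if the counter went down, the exporting process restarted and
    the counter reset to zero, so the current value is the increase since then.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    increase = current - previous if current >= previous else current
    return increase / elapsed_seconds


# Two illustrative scrapes of a counter taken 60 seconds apart.
print(counter_rate(1200.0, 1500.0, 60))  # 300 more events over 60 s
print(counter_rate(1500.0, 30.0, 60))    # counter reset between scrapes
```
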