Healthwatch for VMware Tanzu 2.3

Healthwatch Metrics

Last Updated January 22, 2025

This topic describes the metrics that Healthwatch Exporter for Tanzu Platform for Cloud Foundry and Healthwatch Exporter for VMware Tanzu Kubernetes Grid™ Integrated Edition (TKGI) generate.

Overview of Healthwatch Metrics

Healthwatch Exporter for Tanzu Platform for Cloud Foundry and Healthwatch Exporter for TKGI deploy metric exporter VMs to generate component metrics and Service Level Indicators (SLIs) related to the health of your Tanzu Platform for CF and TKGI deployments.

Each metric exporter VM exposes these metrics and SLIs on a Prometheus exposition endpoint, /metrics.

The Prometheus instance in your metrics monitoring system then scrapes the /metrics endpoint on each metric exporter VM and imports those metrics into your monitoring system. You can configure how often the Prometheus instance scrapes the /metrics endpoints in the Prometheus pane of the Healthwatch for VMware Tanzu tile. To configure the scrape interval for the Prometheus instance, see Configure Prometheus in Configuring Healthwatch.

The name of each metric is in PromQL format. For more information, see the Prometheus documentation.
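As a rough illustration of what a scrape of a /metrics endpoint returns, the following Python sketch parses a few lines of Prometheus exposition text into metric names, labels, and values. The sample text and its values are invented for the example, and the parser handles only simple lines without timestamps. It is not how the Prometheus instance itself parses scrapes.

```python
import re

# Invented sample in the Prometheus text exposition format.
# The numbers are placeholders, not real Healthwatch output.
SAMPLE = """\
# HELP bosh_sli_runs_total Illustrative help text.
# TYPE bosh_sli_runs_total counter
bosh_sli_runs_total{exported_job="bosh-health-exporter"} 144
bosh_sli_failures_total{exported_job="bosh-health-exporter"} 3
bosh_deployments_status 1
"""

# metric_name, optional {labels}, then the sample value.
LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair such as exported_job="bosh-health-exporter".
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Return a list of (name, labels, value) tuples, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or a HELP/TYPE comment
        match = LINE.match(line)
        if match is None:
            continue  # not a simple sample line; ignored in this sketch
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_exposition(SAMPLE):
        print(name, labels, value)
```

Each parsed name and label set corresponds to a selector such as bosh_sli_runs_total{exported_job="bosh-health-exporter"} in the tables in this topic.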


In a VMware Tanzu Operations Manager™ foundation, the BOSH Director manages the VMs that each tile deploys. If the BOSH Director fails or is not responsive, the VMs that the BOSH Director manages also fail.

Healthwatch Exporter for Tanzu Platform for Cloud Foundry and Healthwatch Exporter for TKGI deploy two VMs that continuously test the functionality of the BOSH Director: the BOSH health metric exporter VM and the BOSH deployment metric exporter VM.

BOSH Health Metric Exporter VM

The BOSH health metric exporter VM, bosh-health-exporter, creates a BOSH deployment called bosh-health every ten minutes. This BOSH deployment deploys another VM, bosh-health-check, that runs a suite of SLI tests to validate the functionality of the BOSH Director. After the SLI tests are complete, the BOSH health metric exporter VM collects the metrics from the bosh-health-check VM, then deletes the bosh-health deployment and the bosh-health-check VM.

The following table describes each metric the BOSH health metric exporter VM generates:

Metric Description
bosh_sli_duration_seconds_bucket{exported_job="bosh-health-exporter"} The number of seconds the BOSH health SLI test suite takes to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of BOSH health SLI test suite duration metrics.
bosh_sli_duration_seconds_count{exported_job="bosh-health-exporter"} The total number of duration metrics across all BOSH health SLI test suite duration metric buckets.
bosh_sli_duration_seconds_sum{exported_job="bosh-health-exporter"} The total value of the duration metrics across all BOSH health SLI test suite duration metric buckets.
bosh_sli_exporter_status{exported_job="bosh-health-exporter"} The health status of the BOSH health metric exporter VM. A value of 1 indicates that the BOSH health metric exporter VM is running and healthy.
bosh_sli_failures_total{exported_job="bosh-health-exporter"} The total number of times the BOSH health SLI test suite fails. A failed test suite is one in which any number of tests within the test suite fail.
bosh_sli_run_duration_seconds{exported_job="bosh-health-exporter"} The number of seconds a single BOSH health SLI test suite takes to run.
bosh_sli_runs_total{exported_job="bosh-health-exporter"} The total number of times the BOSH health SLI test suite runs. To see the failure rate of bosh_sli_runs_total{exported_job="bosh-health-exporter"}, divide the value of bosh_sli_failures_total{exported_job="bosh-health-exporter"} by the value of bosh_sli_runs_total{exported_job="bosh-health-exporter"}.
bosh_sli_task_duration_seconds_bucket{exported_job="bosh-health-exporter"} The number of seconds it takes a task within the BOSH health SLI test suite to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of task duration metrics.
bosh_sli_task_duration_seconds_count{exported_job="bosh-health-exporter"} The total number of duration metrics across all task duration metric buckets.
bosh_sli_task_duration_seconds_sum{exported_job="bosh-health-exporter"} The total value of the duration metrics across all task duration metric buckets.
bosh_sli_task_run_duration_seconds{exported_job="bosh-health-exporter",task="delete"} The number of seconds it takes the bosh delete-deployment command test to run.
bosh_sli_task_run_duration_seconds{exported_job="bosh-health-exporter",task="deploy"} The number of seconds it takes the bosh deploy command test to run.
bosh_sli_task_run_duration_seconds{exported_job="bosh-health-exporter",task="deployments"} The number of seconds it takes the bosh deployments command test to run.
bosh_sli_task_runs_total{exported_job="bosh-health-exporter"} The total number of times a task runs. To see the failure rate of bosh_sli_task_runs_total{exported_job="bosh-health-exporter"}, divide the value of bosh_sli_task_failures_total{exported_job="bosh-health-exporter"} by the value of bosh_sli_task_runs_total{exported_job="bosh-health-exporter"}.
bosh_sli_task_failures_total{exported_job="bosh-health-exporter",task="delete"} The total number of times the bosh delete-deployment command fails.
bosh_sli_task_failures_total{exported_job="bosh-health-exporter",task="deploy"} The total number of times the bosh deploy command fails.
bosh_sli_task_failures_total{exported_job="bosh-health-exporter",task="deployments"} The total number of times the bosh deployments command fails.
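The failure-rate division described in the table above can be sketched in Python as follows. The function name and the sample values are hypothetical. In practice the two inputs would be the current values of a failures counter and its matching runs counter, for example bosh_sli_failures_total and bosh_sli_runs_total for the same exported_job.

```python
def failure_rate(failures_total, runs_total):
    """Return failures divided by runs, or None if nothing has run yet.

    Both arguments are counter values, so the result is the failure
    rate over the lifetime of the counters.
    """
    if runs_total <= 0:
        return None  # avoid dividing by zero before the first run
    return failures_total / runs_total


if __name__ == "__main__":
    # Placeholder values, not real measurements.
    rate = failure_rate(failures_total=3, runs_total=144)
    print(f"failure rate: {rate:.2%}")
```

Because both metrics are counters, a rate over a recent window would use the increase of each counter over that window rather than the raw values.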

BOSH Deployment Metric Exporter VM

The BOSH deployment metric exporter VM, bosh-deployments-exporter, checks every 30 seconds whether any BOSH deployments other than the bosh-health deployment created by the BOSH health metric exporter VM are running.

The following table describes each metric the BOSH deployment metric exporter VM generates:

Metric Description
bosh_deployments_status Whether any BOSH deployments other than bosh-health are running. A value of 0 indicates that no other BOSH deployments are running on the BOSH Director. A value of 1 indicates that other BOSH deployments are running on the BOSH Director.
bosh_sli_duration_seconds_bucket{exported_job="bosh-deployments-exporter"} The number of seconds the BOSH deployment check takes to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of BOSH deployment check duration metrics.
bosh_sli_duration_seconds_count{exported_job="bosh-deployments-exporter"} The total number of duration metrics across all BOSH deployment check duration metric buckets.
bosh_sli_duration_seconds_sum{exported_job="bosh-deployments-exporter"} The total value of the duration metrics across all BOSH deployment check duration metric buckets.
bosh_sli_exporter_status{exported_job="bosh-deployments-exporter"} The health status of the BOSH deployment metric exporter VM. A value of 1 indicates that the BOSH deployment metric exporter VM is running and healthy.
bosh_sli_failures_total{exported_job="bosh-deployments-exporter"} The total number of times the BOSH deployment check fails.
bosh_sli_run_duration_seconds{exported_job="bosh-deployments-exporter"} The number of seconds a single BOSH deployment check takes to run.
bosh_sli_runs_total{exported_job="bosh-deployments-exporter"} The total number of times the BOSH deployment check runs. To see the failure rate of bosh_sli_runs_total{exported_job="bosh-deployments-exporter"}, divide the value of bosh_sli_failures_total{exported_job="bosh-deployments-exporter"} by the value of bosh_sli_runs_total{exported_job="bosh-deployments-exporter"}.
bosh_sli_task_duration_seconds_bucket{exported_job="bosh-deployments-exporter"} The number of seconds it takes a task within the BOSH deployment check to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of task duration metrics.
bosh_sli_task_duration_seconds_count{exported_job="bosh-deployments-exporter"} The total number of duration metrics across all task duration metric buckets.
bosh_sli_task_duration_seconds_sum{exported_job="bosh-deployments-exporter"} The total value of the duration metrics across all task duration metric buckets.
bosh_sli_task_run_duration_seconds{exported_job="bosh-deployments-exporter",task="tasks"} The number of seconds it takes the bosh tasks command test to run.
bosh_sli_task_runs_total{exported_job="bosh-deployments-exporter"} The total number of times a task runs. To see the failure rate of bosh_sli_task_runs_total{exported_job="bosh-deployments-exporter"}, divide the value of bosh_sli_task_failures_total{exported_job="bosh-deployments-exporter"} by the value of bosh_sli_task_runs_total{exported_job="bosh-deployments-exporter"}.
bosh_sli_task_failures_total{exported_job="bosh-deployments-exporter",task="tasks"} The total number of times the bosh tasks command fails.
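The _bucket, _count, and _sum metrics in the tables above follow the usual Prometheus histogram layout, in which each bucket counts the runs that finished within an upper bound (the le label) and the counts are cumulative. The following Python sketch, with invented bucket bounds and counts, shows how a mean and an approximate quantile could be derived from such values. The quantile uses linear interpolation within a bucket, which is similar in spirit to the PromQL histogram_quantile function but is not a reimplementation of it.

```python
import math


def mean_duration(duration_sum, duration_count):
    """Mean duration in seconds from the _sum and _count values."""
    if duration_count == 0:
        return None
    return duration_sum / duration_count


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative buckets.

    buckets is a list of (upper_bound_seconds, cumulative_count) pairs
    sorted by upper bound, ending with the math.inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return None
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                return prev_bound  # no finite width to interpolate over
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    # Hypothetical bucket bounds and counts for illustration only.
    buckets = [(30.0, 2), (60.0, 8), (120.0, 10), (math.inf, 10)]
    print(mean_duration(550.0, 10))
    print(estimate_quantile(0.5, buckets))
```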

Platform Metrics

Healthwatch Exporter for Tanzu Platform for Cloud Foundry and Healthwatch Exporter for TKGI deploy VMs that generate metrics regarding the health of several Tanzu Operations Manager and runtime components.

You can use the following platform metrics to calculate percent availability and error budgets:

Tanzu Platform for Cloud Foundry SLI Exporter VM

Developers create and manage apps on Tanzu Platform for Cloud Foundry using the Cloud Foundry Command Line Interface (cf CLI). Healthwatch Exporter for Tanzu Platform for Cloud Foundry deploys the Tanzu Platform for Cloud Foundry SLI exporter VM, pas-sli-exporter, which continuously tests the functionality of the cf CLI.

The following table describes each metric the Tanzu Platform for Cloud Foundry SLI exporter VM generates:

Metric Description
tas_sli_duration_seconds_bucket The number of seconds the Tanzu Platform for CF SLI test suite takes to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of Tanzu Platform for CF SLI test suite duration metrics.
tas_sli_duration_seconds_count The total number of duration metrics across all Tanzu Platform for CF SLI test suite duration metric buckets.
tas_sli_duration_seconds_sum The total value of the duration metrics across all Tanzu Platform for CF SLI test suite duration metric buckets.
tas_sli_exporter_status The health status of the Tanzu Platform for CF SLI exporter VM. A value of 1 indicates that the Tanzu Platform for CF SLI exporter VM is running and healthy.
tas_sli_failures_total The total number of times the Tanzu Platform for CF SLI test suite fails.
tas_sli_run_duration_seconds The number of seconds the Tanzu Platform for CF SLI test suite takes to run.
tas_sli_runs_total The total number of times the Tanzu Platform for CF SLI test suite runs. To see the failure rate of tas_sli_runs_total, divide the value of tas_sli_failures_total by the value of tas_sli_runs_total.
tas_sli_task_duration_seconds_bucket The number of seconds it takes a task within the Tanzu Platform for CF SLI test suite to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of task duration metrics.
tas_sli_task_duration_seconds_count The total number of duration metrics across all task duration metric buckets.
tas_sli_task_duration_seconds_sum The total value of the duration metrics across all task duration metric buckets.
tas_sli_task_run_duration_seconds{task="delete"} The number of seconds it takes the cf delete command test to run.
tas_sli_task_run_duration_seconds{task="login"} The number of seconds it takes the cf login command test to run.
tas_sli_task_run_duration_seconds{task="logs"} The number of seconds it takes the cf logs command test to run.
tas_sli_task_run_duration_seconds{task="push"} The number of seconds it takes the cf push command test to run.
tas_sli_task_run_duration_seconds{task="setEnv"} The number of seconds it takes the cf set-env command test to run.
tas_sli_task_run_duration_seconds{task="start"} The number of seconds it takes the cf start command test to run.
tas_sli_task_run_duration_seconds{task="stop"} The number of seconds it takes the cf stop command test to run.
tas_sli_task_runs_total The total number of times a task runs. To see the failure rate of tas_sli_task_runs_total, divide the value of tas_sli_task_failures_total by the value of tas_sli_task_runs_total.
tas_sli_task_failures_total{task="delete"} The total number of times the cf delete command fails.
tas_sli_task_failures_total{task="login"} The total number of times the cf login command fails.
tas_sli_task_failures_total{task="logs"} The total number of times the cf logs command fails.
tas_sli_task_failures_total{task="push"} The total number of times the cf push command fails.
tas_sli_task_failures_total{task="setEnv"} The total number of times the cf set-env command fails.
tas_sli_task_failures_total{task="start"} The total number of times the cf start command fails.
tas_sli_task_failures_total{task="stop"} The total number of times the cf stop command fails.
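One way to turn run and failure counters such as tas_sli_runs_total and tas_sli_failures_total into percent availability and an error budget is sketched below in Python. The formulas are a common convention rather than a definition taken from Healthwatch, and the service level objective (SLO) of 99.5 percent and the counter values are assumptions chosen for the example.

```python
def availability_percent(failures, runs):
    """Percentage of test suite runs that succeeded."""
    if runs <= 0:
        return None
    return 100.0 * (runs - failures) / runs


def error_budget_remaining(failures, runs, slo_percent):
    """Number of additional failed runs the SLO still tolerates.

    A negative result means the budget for these runs is exhausted.
    """
    allowed_failures = runs * (100.0 - slo_percent) / 100.0
    return allowed_failures - failures


if __name__ == "__main__":
    # Placeholder counter values and an assumed 99.5 percent SLO.
    print(availability_percent(failures=2, runs=1000))
    print(error_budget_remaining(failures=2, runs=1000, slo_percent=99.5))
```

The same calculation applies to the per-task counters if you pass the failure and run counts for a single task.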

TKGI SLI Exporter VM

Operators create and manage Kubernetes clusters using the TKGI Command Line Interface (TKGI CLI). Healthwatch Exporter for TKGI deploys the TKGI SLI exporter VM, pks-sli-exporter, which continuously tests the functionality of the TKGI CLI.

The following table describes each metric the TKGI SLI exporter VM generates:

Metric Description
tkgi_sli_duration_seconds_bucket The number of seconds the TKGI SLI test suite takes to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of TKGI SLI test suite duration metrics.
tkgi_sli_duration_seconds_count The total number of duration metrics across all TKGI SLI test suite duration metric buckets.
tkgi_sli_duration_seconds_sum The total value of the duration metrics across all TKGI SLI test suite duration metric buckets.
tkgi_sli_exporter_status The health status of the TKGI SLI exporter VM. A value of 1 indicates that the TKGI SLI exporter VM is running and healthy.
tkgi_sli_failures_total The total number of times the TKGI SLI test suite fails.
tkgi_sli_run_duration_seconds The number of seconds the TKGI SLI test suite takes to run.
tkgi_sli_runs_total The total number of times the TKGI SLI test suite runs. To see the failure rate of tkgi_sli_runs_total, divide the value of tkgi_sli_failures_total by the value of tkgi_sli_runs_total.
tkgi_sli_task_duration_seconds_bucket The number of seconds it takes a task within the TKGI SLI test suite to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of task duration metrics.
tkgi_sli_task_duration_seconds_count The total number of duration metrics across all task duration metric buckets.
tkgi_sli_task_duration_seconds_sum The total value of the duration metrics across all task duration metric buckets.
tkgi_sli_task_run_duration_seconds{task="clusters"} The number of seconds it takes the tkgi clusters command test to run.
tkgi_sli_task_run_duration_seconds{task="get-credentials"} The number of seconds it takes the tkgi get-credentials command test to run.
tkgi_sli_task_run_duration_seconds{task="login"} The number of seconds it takes the tkgi login command test to run.
tkgi_sli_task_run_duration_seconds{task="plans"} The number of seconds it takes the tkgi plans command test to run.
tkgi_sli_task_runs_total The total number of times a task runs. To see the failure rate of tkgi_sli_task_runs_total, divide the value of tkgi_sli_task_failures_total by the value of tkgi_sli_task_runs_total.
tkgi_sli_task_failures_total{task="clusters"} The total number of times the tkgi clusters command fails.
tkgi_sli_task_failures_total{task="get-credentials"} The total number of times the tkgi get-credentials command fails.
tkgi_sli_task_failures_total{task="login"} The total number of times the tkgi login command fails.
tkgi_sli_task_failures_total{task="plans"} The total number of times the tkgi plans command fails.

Certificate Expiration Metric Exporter VM

Healthwatch Exporter for Tanzu Platform for Cloud Foundry and Healthwatch Exporter for TKGI deploy the certificate expiration metric exporter VM, cert-expiration-exporter, which collects metrics that show when Tanzu Operations Manager certificates are due to expire. For more information, see Monitoring Certificate Expiration.

The following table describes the metric the certificate expiration metric exporter VM generates:

Metric Description
ssl_certificate_expiry_seconds{exported_instance=~"CERTIFICATE"} The time in seconds until a certificate expires, where CERTIFICATE is the name of the certificate.
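Because ssl_certificate_expiry_seconds reports the time remaining in seconds, a simple conversion shows which certificates are close to expiring. The following Python sketch assumes a 30-day warning threshold and uses made-up certificate names and values. Neither the threshold nor the names come from Healthwatch.

```python
SECONDS_PER_DAY = 86400


def expiring_soon(expiry_seconds_by_cert, threshold_days=30):
    """Return (certificate, days_left) pairs below the threshold.

    The result is sorted so that the certificate closest to expiry,
    or already expired (negative days), comes first.
    """
    soon = []
    for name, seconds in expiry_seconds_by_cert.items():
        days_left = seconds / SECONDS_PER_DAY
        if days_left < threshold_days:
            soon.append((name, days_left))
    return sorted(soon, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical certificate names and remaining lifetimes.
    sample = {
        "example-cert-a": 10 * SECONDS_PER_DAY,
        "example-cert-b": 90 * SECONDS_PER_DAY,
        "example-cert-c": -3600,
    }
    for name, days_left in expiring_soon(sample):
        print(f"{name}: {days_left:.1f} days left")
```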

Prometheus VM

In the Canary URLs pane of the Healthwatch tile, you configure target URLs to which the Blackbox Exporters in the Prometheus instance send canary tests. Testing a canary target URL allows you to gauge the overall health and accessibility of an app, runtime, or deployment.

On the Prometheus VM, tsdb, the Blackbox Exporter job, blackbox-exporter, generates canary test metrics.

The following table describes each metric the Blackbox Exporters in the Prometheus instance generate:

Metric Description
probe_dns_additional_rrs The number of entries in the additional resource record list of the DNS server for the canary target URL.
probe_dns_answer_rrs The number of entries in the answer resource record list of the DNS server for the canary target URL.
probe_dns_authority_rrs The number of entries in the authority resource record list of the DNS server for the canary target URL.
probe_dns_duration_seconds The duration of the canary test DNS request by phase.
probe_dns_lookup_time_seconds The number of seconds the canary test DNS lookup takes to complete.
probe_dns_serial The serial number of the DNS zone for your canary target URL.
probe_duration_seconds The number of seconds the canary test takes to complete.
probe_failed_due_to_regex Whether the canary test failed due to a regex error in the canary test configuration. A value of 0 indicates that the canary test did not fail due to a regex error. A value of 1 indicates that the canary test did fail due to a regex error.
probe_http_content_length The length of the HTTP content response from the canary target URL.
probe_http_duration_seconds The duration of the canary test HTTP request by phase, summed over all redirects.
probe_http_last_modified_timestamp_seconds The last-modified timestamp for the HTTP response header in Unix time.
probe_http_redirects The number of redirects the canary test goes through to reach the canary target URL.
probe_http_ssl Whether the canary test used TLS for the final redirect. A value of 0 indicates that the canary test did not use TLS for the final redirect. A value of 1 indicates that the canary test did use TLS for the final redirect.
probe_http_status_code The status code of the HTTP response from the canary target URL.
probe_http_uncompressed_body_length The length of the uncompressed response body.
probe_http_version The version of HTTP the canary test HTTP response uses.
probe_icmp_duration_seconds The duration of the canary test ICMP request by phase.
probe_icmp_reply_hop_limit The replied packet hop limit if the canary test protocol is IPv6, or the time-to-live count if the canary test protocol is IPv4.
probe_ip_addr_hash The hash of the IP address of the canary target URL.
probe_ip_protocol Whether the IP protocol of the canary test is IPv4 or IPv6.
probe_ssl_earliest_cert_expiry The earliest TLS certificate expiration for the canary test URL in Unix time.
probe_ssl_last_chain_expiry_timestamp_seconds The last TLS chain expiration for the canary test URL in Unix time.
probe_ssl_last_chain_info Information about the TLS leaf certificate for the canary test URL.
probe_success Whether the canary test succeeded or failed. A value of 0 indicates that the canary test failed. A value of 1 indicates that the canary test succeeded.
probe_tls_version_info The TLS version the canary test uses, or NaN when unknown.
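Several canary test metrics, such as probe_ssl_earliest_cert_expiry, report a point in time as Unix time rather than a duration, and probe_success reports a 0 or 1 per test. The following Python sketch shows one way to interpret both kinds of value. The function names, timestamps, and sample results are hypothetical.

```python
from datetime import datetime, timezone


def unix_to_iso(unix_seconds):
    """Render a Unix timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc).isoformat()


def days_until(expiry_unix_seconds, now_unix_seconds):
    """Days between now and an expiry timestamp, both in Unix time."""
    return (expiry_unix_seconds - now_unix_seconds) / 86400


def success_ratio(probe_success_samples):
    """Fraction of canary tests that succeeded, given 0 or 1 samples."""
    if not probe_success_samples:
        return None
    return sum(probe_success_samples) / len(probe_success_samples)


if __name__ == "__main__":
    # Placeholder timestamps and probe results.
    print(unix_to_iso(1_700_864_000))
    print(days_until(1_700_864_000, now_unix_seconds=1_700_000_000))
    print(success_ratio([1, 1, 0, 1]))
```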

SVM Forwarder VM - Platform Metrics

Super value metrics (SVMs) are composite metrics that the Prometheus instance in Healthwatch v2.2+ generates. The SVM Forwarder VM, svm-forwarder, then sends these metrics back into the Loggregator Firehose so third-party nozzles can send them to external destinations, such as a remote server or external aggregation service.

The SVM Forwarder VM sends SVMs related to platform metrics and Healthwatch component metrics to the Loggregator Firehose. For more information about SVMs related to Healthwatch component metrics, see SVM Forwarder VM - Healthwatch Component Metrics below.

The following table describes each platform metric the SVM Forwarder VM sends to the Loggregator Firehose:

Metric Description
Diego_AppsDomainSynced Whether Cloud Controller and Diego are in sync. A value of 0 indicates that Cloud Controller and Diego are not in sync. A value of 1 indicates that Cloud Controller and Diego are in sync.
Diego_AvailableFreeChunksDisk The available free chunks of disk across all Diego Cells.
Diego_AvailableFreeChunks The available free chunks of memory across all Diego Cells.
Diego_LRPsAdded_1H The rate of change in running app instances over a one-hour period.
Diego_TotalAvailableDiskCapacity_5M The remaining Diego Cell disk available across all Diego Cells over a five-minute period.
Diego_TotalAvailableMemoryCapacity_5M The remaining Diego Cell memory available across all Diego Cells over a five-minute period.
Diego_TotalPercentageAvailableContainerCapacity_5M The percentage of total available container capacity across all Diego Cells over a five-minute period.
Diego_TotalPercentageAvailableDiskCapacity_5M The percentage of total available disk across all Diego Cells over a five-minute period.
Diego_TotalPercentageAvailableMemoryCapacity_5M The percentage of total available memory across all Diego Cells over a five-minute period.
Doppler_MessagesAverage_1M The average Doppler message rate over a one-minute period.
Firehose_LossRate_1H The log transport loss rate over a one-hour period.
Firehose_LossRate_1M The log transport loss rate over a one-minute period.
SyslogAgent_LossRate_1M The Syslog Agent loss rate over a one-minute period.
SyslogDrain_RLP_LossRate_1M The Reverse Log Proxy loss rate over a one-minute period.
bosh_deployment Represents bosh_deployments_status from the BOSH deployment metric exporter VM, which indicates whether any BOSH deployments other than the one created by the BOSH health metric exporter VM are running. A value of 0 indicates that no other BOSH deployments are running on the BOSH Director. A value of 1 indicates that other BOSH deployments are running on the BOSH Director.
health_check_bosh_director_success Whether the BOSH SLI test suite that the BOSH health metric exporter VM ran succeeded or failed. A value of 0 indicates that the BOSH SLI test suite failed. A value of 1 indicates that the BOSH SLI test suite succeeded.
health_check_CanaryApp_available Whether the canary app is available. A value of 0 indicates that the canary app is unavailable. A value of 1 indicates that the canary app is available.
health_check_CanaryApp_responseTime The response time of the canary app in seconds.
health_check_cliCommand_delete Whether the cf delete command succeeds or fails. A value of 0 indicates that the cf delete command failed. A value of 1 indicates that the cf delete command succeeded.
health_check_cliCommand_login Whether the cf login command succeeds or fails. A value of 0 indicates that the cf login command failed. A value of 1 indicates that the cf login command succeeded.
health_check_cliCommand_logs Whether the cf logs command succeeds or fails. A value of 0 indicates that the cf logs command failed. A value of 1 indicates that the cf logs command succeeded.
health_check_cliCommand_probe_count The number of cf CLI health checks that Healthwatch completes in the measured time period.
health_check_cliCommand_pushTime The amount of time it takes the cf CLI to push an app.
health_check_cliCommand_push Whether the cf push command succeeds or fails. A value of 0 indicates that the cf push command failed. A value of 1 indicates that the cf push command succeeded.
health_check_cliCommand_start Whether the cf start command succeeds or fails. A value of 0 indicates that the cf start command failed. A value of 1 indicates that the cf start command succeeded.
health_check_cliCommand_stop Whether the cf stop command succeeds or fails. A value of 0 indicates that the cf stop command failed. A value of 1 indicates that the cf stop command succeeded.
health_check_cliCommand_success The overall success of the SLI tests that Healthwatch runs on the cf CLI.
uaa_throughput_rate The lifetime number of requests completed by the UAA VM, emitted per UAA instance in Tanzu Platform for Cloud Foundry. This number includes health checks.

Healthwatch Component Metrics

The following metrics exist for the purpose of monitoring the Healthwatch components:

TKGI Metric Exporter VM

Healthwatch Exporter for TKGI deploys a TKGI metric exporter VM, pks-exporter, that collects BOSH system metrics for TKGI and converts them to a Prometheus exposition format.

The following table describes each metric the TKGI metric exporter VM collects and converts:

Metric Description
healthwatch_boshExporter_ingressLatency_seconds_bucket The number of seconds the TKGI metric exporter VM takes to process a batch of Loggregator envelopes, grouped by latency. This metric is also called a bucket of ingress latency metrics.
healthwatch_boshExporter_ingressLatency_seconds_count The total number of metrics across all ingress latency metric buckets.
healthwatch_boshExporter_ingressLatency_seconds_sum The total value of the metrics across all ingress latency metric buckets.
healthwatch_boshExporter_ingress_envelopes The number of Loggregator envelopes the observability metrics agent on the TKGI metric exporter VM receives.
healthwatch_boshExporter_metricConversion_seconds_bucket The number of seconds the TKGI metric exporter VM takes to convert a BOSH metric to a Prometheus gauge, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of gauge conversion duration metrics.
healthwatch_boshExporter_metricConversion_seconds_count The total number of metrics across all gauge conversion duration metric buckets.
healthwatch_boshExporter_metricConversion_seconds_sum The total value of the metrics across all gauge conversion duration metric buckets.
healthwatch_boshExporter_status The health status of the TKGI metric exporter VM. A value of 0 indicates that the TKGI metric exporter VM is not responding. A value of 1 indicates that the TKGI metric exporter VM is running and healthy.

Healthwatch Exporter for Tanzu Platform for Cloud Foundry Metric Exporter VMs

Healthwatch Exporter for Tanzu Platform for Cloud Foundry deploys metric exporter VMs that collect metrics from the Loggregator Firehose and convert them into a Prometheus exposition format.

Each of the following metric exporter VMs collects and converts a single metric type from the Loggregator Firehose. The names of the metric exporter VMs correspond to the types of metrics they collect and convert:

The counter metric exporter VM, pas-exporter-counter, collects counter metrics from the Loggregator Firehose and converts them into a Prometheus exposition format.

The following table describes each metric the counter metric exporter VM collects and converts:

Metric Description
healthwatch_pasExporter_counterConversion_seconds The number of seconds the counter metric exporter VM takes to convert a Loggregator counter envelope to a Prometheus counter.
healthwatch_pasExporter_ingressLatency_seconds The number of seconds the counter metric exporter VM takes to process a batch of Loggregator counter envelopes.
healthwatch_pasExporter_ingress_envelopes The number of Loggregator counter envelopes the observability metrics agent on the counter metric exporter VM receives.
healthwatch_pasExporter_status The health status of the counter metric exporter VM. A value of 0 indicates that the counter metric exporter VM is not responding. A value of 1 indicates that the counter metric exporter VM is running and healthy.

The gauge metric exporter VM, pas-exporter-gauge, collects gauge metrics from the Loggregator Firehose and converts them into a Prometheus exposition format.

The following table describes each metric the gauge metric exporter VM collects and converts:

Metric Description
healthwatch_pasExporter_gaugeConversion_seconds The number of seconds the gauge metric exporter VM takes to convert a Loggregator gauge envelope to a Prometheus gauge.
healthwatch_pasExporter_ingressLatency_seconds The number of seconds the gauge metric exporter VM takes to process a batch of Loggregator gauge envelopes.
healthwatch_pasExporter_ingress_envelopes The number of Loggregator gauge envelopes the observability metrics agent on the gauge metric exporter VM receives.
healthwatch_pasExporter_status The health status of the gauge metric exporter VM. A value of 0 indicates that the gauge metric exporter VM is not responding. A value of 1 indicates that the gauge metric exporter VM is running and healthy.

Prometheus Exposition Endpoint

Most of the metric exporter VMs generate metrics concerning how the Prometheus instance interacts with the /metrics endpoint on each metric exporter VM.

The following table describes each metric the /metrics endpoint on each metric exporter VM generates:

Metric Description
healthwatch_prometheusExpositionLatency_seconds The number of seconds the metric exporter VM takes to render a Prometheus scrape page.
healthwatch_prometheusExposition_histogramMapConversion The number of seconds the metric exporter VM takes to convert histogram collection to a map.
healthwatch_prometheusExposition_metricMapConversion The number of seconds the metric exporter VM takes to convert metrics collection to a map.
healthwatch_prometheusExposition_metricSorting The number of seconds the metric exporter VM takes to sort metrics when rendering a Prometheus scrape page.

SVM Forwarder VM - Healthwatch Component Metrics

SVMs are composite metrics that the Prometheus instance in Healthwatch v2.2+ generates. The SVM Forwarder VM, svm-forwarder, then sends these metrics back into the Loggregator Firehose so third-party nozzles can send them to external destinations, such as a remote server or external aggregation service.

The SVM Forwarder VM sends SVMs related to platform metrics and Healthwatch component metrics to the Loggregator Firehose. For more information about SVMs related to platform metrics, see SVM Forwarder VM - Platform Metrics above.

The following table describes each Healthwatch component metric the SVM Forwarder VM sends to the Loggregator Firehose:

Metric Description
failed_scrapes_total The total number of failed scrapes for the target source_id.
last_total_attempted_scrapes The total number of attempted scrapes during the most recent round of scraping.
last_total_failed_scrapes The total number of failed scrapes during the most recent round of scraping.
last_total_scrape_duration The time in milliseconds to scrape all targets during the most recent round of scraping.
scrape_targets_total The total number of scrape targets identified from the configuration file for the Prometheus VM.
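The scrape metrics in the table above can be combined into a short summary of the most recent round of scraping. The following Python sketch is one possible way to do that. The function name, the returned keys, and the sample values are assumptions made for the example.

```python
def scrape_round_summary(attempted, failed, duration_ms):
    """Summarize one scrape round.

    attempted and failed correspond to last_total_attempted_scrapes and
    last_total_failed_scrapes. duration_ms corresponds to
    last_total_scrape_duration, which is reported in milliseconds.
    """
    failed_ratio = failed / attempted if attempted else None
    return {
        "failed_ratio": failed_ratio,
        "duration_seconds": duration_ms / 1000.0,
    }


if __name__ == "__main__":
    # Placeholder values for a single scrape round.
    print(scrape_round_summary(attempted=20, failed=1, duration_ms=1500))
```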