Troubleshoot Problems with a Host System

Use the Troubleshooting tabs to identify the root cause of problems that alert recommendations or simple analysis do not resolve.

To troubleshoot the symptoms of the capacity problems that are occurring on the cluster and host system, and to determine when those problems occurred, use the Troubleshooting tabs to investigate.

From the left menu, click Environment, and then click Object Browser > vSphere Hosts and Clusters and select the object. For example, USA-Cluster.

Click the Alerts tab and review the symptoms.

The Symptoms tab displays the symptoms that triggered on the selected cluster. You notice that several critical symptoms exist.

Cluster Compute Resource Time Remaining with committed projects is critically low

Cluster Compute Resource Time Remaining is critically low

Capacity remaining is critically low

Investigate the critical symptoms.

Point to each critical symptom to identify the metric used.

To view only the symptoms that affect the cluster, enter cluster in the quick filter text box.

When you point to Cluster Compute Resource Time Remaining is critically low, the metric Capacity|Time Remaining appears. You notice that its value is less than or equal to zero, which caused the capacity symptom to trigger and generate an alert on the USA-Cluster.
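The trigger condition described above amounts to a simple comparison of a metric value against a threshold. The following Python sketch is illustrative only: the `Symptom` structure, the `is_triggered` function, and the sample values are assumptions made for this example and are not part of the VMware Aria Operations API.

```python
from dataclasses import dataclass


@dataclass
class Symptom:
    """Hypothetical stand-in for a symptom definition; not a product API type."""
    name: str
    metric_key: str
    threshold: float  # the symptom triggers when the metric value <= threshold


def is_triggered(symptom: Symptom, metric_value: float) -> bool:
    """Return True when the metric value is at or below the symptom threshold."""
    return metric_value <= symptom.threshold


time_remaining_low = Symptom(
    name="Cluster Compute Resource Time Remaining is critically low",
    metric_key="Capacity|Time Remaining",
    threshold=0.0,
)

# Sample metric values are invented for illustration.
for value in (12.0, 0.0, -3.0):
    print(value, is_triggered(time_remaining_low, value))
```

In this sketch, a Time Remaining value of zero or less marks the symptom as triggered, which mirrors the behavior you observe on the USA-Cluster.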

Click the Events > Timeline tab to review the triggered symptoms, alerts, and events that occurred on the USA-Cluster over time, and identify when the problems occurred.

Click the calendar and select Last 7 Days as the range.

Several events appear in red.

Point to each event to view the details.

To display the events that occurred on the cluster's data center, click View From, and select Datacenter.

Warning events for the data center appear in yellow.

Point to the warning events.

You notice that a hard threshold violation occurred on the data center late in the evening. The hard threshold violation shows that the Badge|Workload metric value was under the acceptable value, and that the violation triggered.
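If you export or collect event records outside the user interface, the same narrowing by time range and object type can be approximated in a few lines. The following Python sketch assumes a hypothetical list of event dictionaries. The field names, timestamps, and messages are invented for illustration and do not reflect a product export format.

```python
from datetime import datetime, timedelta

# Hypothetical event records; every field name and value here is an assumption.
events = [
    {"time": datetime(2024, 5, 6, 23, 10), "object_type": "Datacenter",
     "severity": "warning", "message": "Hard threshold violation on Badge|Workload"},
    {"time": datetime(2024, 5, 3, 9, 30), "object_type": "Cluster",
     "severity": "critical", "message": "Capacity remaining is critically low"},
    {"time": datetime(2024, 4, 20, 14, 0), "object_type": "Cluster",
     "severity": "critical", "message": "Capacity remaining is critically low"},
]


def events_in_range(records, end, days=7, object_type=None):
    """Return records from the last `days` days, optionally for one object type."""
    start = end - timedelta(days=days)
    return [
        r for r in records
        if start <= r["time"] <= end
        and (object_type is None or r["object_type"] == object_type)
    ]


now = datetime(2024, 5, 7, 12, 0)  # fixed reference time for a repeatable example
for record in events_in_range(events, now, object_type="Datacenter"):
    print(record["time"], record["severity"], record["message"])
```

Passing a fixed `now` value keeps the example repeatable. In practice you would pass the current time.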

To view the affected child objects, click View From and select Host System.

Click the Events tab to examine the changes that occurred on the USA-Cluster, and determine whether a change occurred that contributed to the root cause of the alert or other problems with the cluster.

Review the graph.

By reviewing the graph, you can determine whether a recurring event has caused the errors. Each event indicates that the guest file system is out of disk space. The affected objects appear in the pane following the graph.

Click each red triangle to identify the affected object and highlight it in that pane.
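Checking for a recurring event, as described above, comes down to counting how often the same message appears and listing the objects that reported it. The following Python sketch works on hypothetical (object, message) pairs. The object names and the `recurring_messages` and `affected_objects` helpers are assumptions for illustration, not product functions.

```python
from collections import Counter

# Hypothetical (object name, event message) pairs; the names are invented.
events = [
    ("vm-01", "Guest file system is out of disk space"),
    ("vm-02", "Guest file system is out of disk space"),
    ("vm-01", "Guest file system is out of disk space"),
    ("vm-03", "Configuration changed"),
]


def recurring_messages(records, min_count=2):
    """Return messages that occur at least `min_count` times, with their counts."""
    counts = Counter(message for _, message in records)
    return {msg: n for msg, n in counts.items() if n >= min_count}


def affected_objects(records, message):
    """Return the sorted, de-duplicated objects that reported `message`."""
    return sorted({obj for obj, msg in records if msg == message})


for message, count in recurring_messages(events).items():
    print(f"{message}: {count} occurrences on {affected_objects(events, message)}")
```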

Click the Capacity tab to evaluate details of capacity and time remaining.

Click the All Metrics tab to evaluate the objects in their context in the environment topology to help identify the possible cause of a problem.

In the top view, select USA-Cluster.

In the metrics pane, expand All Metrics > Capacity Analytics Generated and double-click Capacity Remaining (%). The Capacity Remaining (%) calculation appears in the right pane.

In the metrics pane, expand All Metrics > Badge and double-click Workload (%). The Workload (%) calculation appears in the right pane.

On the toolbar, click Date Controls and select Last 7 Days.

The metric chart indicates that the capacity for the cluster remained at a steady level for the past week, but that the Badge|Workload (%) calculation displays workload extremes.
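The contrast between a steady metric and one with extremes can also be quantified. The following Python sketch compares the spread (maximum minus minimum) of two series. The daily samples are invented for illustration and are not taken from a real cluster, and `spread` is a hypothetical helper, not a product function.

```python
from statistics import mean


def spread(values):
    """Return the difference between the largest and smallest sample."""
    return max(values) - min(values)


# Invented daily samples for a seven-day window (assumptions, not real data).
capacity_remaining = [4.0, 4.1, 3.9, 4.0, 4.0, 3.8, 4.0]
workload = [35.0, 110.0, 42.0, 125.0, 38.0, 118.0, 40.0]

for name, series in (
    ("Capacity Remaining (%)", capacity_remaining),
    ("Workload (%)", workload),
):
    print(f"{name}: mean={mean(series):.1f}, spread={spread(series):.1f}")
```

A small spread corresponds to the steady capacity line in the chart, while a large spread corresponds to the workload extremes.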

You have analyzed the symptoms, timeline, events, and metrics related to the problems on your cluster. Through your analysis, you have determined that the heavy workload on the cluster has caused the cluster to start running out of capacity.

Examine the Details views and heat maps to interpret the properties, metrics, and alerts. Also look for trends and spikes that occur in the resources for your objects, the distributions of resources across your objects, and data maps. You can examine the use of various object types across your objects. See Examine the Environment Details.

VMware Aria Operations 8.17.1