Troubleshoot Problems with a Host System
Use the Troubleshooting tabs to identify the root cause of problems that
alert recommendations or simple analysis do not resolve.
To troubleshoot the symptoms of the capacity problems that are occurring on
the cluster and host system, and to determine when those problems occurred,
use the Troubleshooting tabs to investigate the memory problem.
- From the left menu, click Environment, and then click Object Browser > vSphere Hosts and Clusters and select the object, for example, USA-Cluster.
- Click the Alerts tab and review the symptoms. The Symptoms tab displays the symptoms that triggered on the selected cluster. You notice that several critical symptoms exist.
  - Cluster Compute Resource Time Remaining with committed projects is critically low
  - Cluster Compute Resource Time Remaining is critically low
  - Capacity remaining is critically low
- Investigate the critical symptoms.
  - Point to each critical symptom to identify the metric used.
  - To view only the symptoms that affect the cluster, enter cluster in the quick filter text box. When you point to Cluster Compute Resource Time Remaining is critically low, the metric Capacity|Time Remaining appears. You notice that its value is less than or equal to zero, which caused the capacity symptom to trigger and generate an alert on the USA-Cluster.
- Click the Events > Timeline tab to review the triggered symptoms, alerts, and events that occurred on the USA-Cluster over time, and identify when the problems occurred.
  - Click the calendar and select Last 7 Days as the range. Several events appear in red.
  - Point to each event to view the details.
  - To display the events that occurred on the cluster's data center, click View From, and select Datacenter. Warning events for the data center appear in yellow.
  - Point to the warning events. You notice that a hard threshold violation occurred on the data center late in the evening. The hard threshold violation shows that the Badge|Workload metric value was under the acceptable value, and that the violation triggered.
  - To view the affected child objects, click View From and select Host System.
- Click the Events tab to examine the changes that occurred on the USA-Cluster, and determine whether a change occurred that contributed to the root cause of the alert or other problems with the cluster.
  - Review the graph. By reviewing the graph, you can determine whether a recurring event has caused the errors. Each event indicates that the guest file system is out of disk space. The affected objects appear in the pane following the graph.
  - Click each red triangle to identify the affected object and highlight it in that pane.
- Click the Capacity tab to evaluate details of capacity and time remaining.
- Click the All Metrics tab to evaluate the objects in their context in the environment topology to help identify the possible cause of a problem.
  - In the top view, select USA-Cluster.
  - In the metrics pane, expand the metric tree and double-click Capacity Remaining (%). The Capacity Remaining (%) calculation appears on the right pane.
  - In the metrics pane, expand the metric tree and double-click Workload (%). The Workload (%) calculation appears on the right pane.
  - On the toolbar, click Date Controls and select Last 7 Days. The metric chart indicates that the capacity for the cluster remained at a steady level for the past week, but that the Badge|Workload (%) calculation displays workload extremes.
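The symptom and timeline checks in the steps above can be approximated outside the user interface. The following is a minimal Python sketch, not the product's implementation: the `Event` class, its field names, the severity labels, and the sample data (including the object names `host-01`, `host-02`, and `datacenter-01`) are all hypothetical. It only mirrors two ideas from the procedure: a Capacity|Time Remaining value less than or equal to zero triggers the critical symptom, and the timeline is restricted to the last seven days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    """One timeline entry. All field names are hypothetical."""
    timestamp: datetime
    object_name: str
    severity: str  # "critical" (red) or "warning" (yellow); labels are assumed
    message: str


def time_remaining_is_critical(time_remaining: float) -> bool:
    """Condition from the walkthrough: a value <= 0 triggers the symptom."""
    return time_remaining <= 0


def events_in_last_days(events, now, days=7):
    """Keep events inside [now - days, now], like the Last 7 Days range."""
    start = now - timedelta(days=days)
    return [e for e in events if start <= e.timestamp <= now]


def affected_objects(events, severity="critical"):
    """Sorted, de-duplicated names of objects that have events of this severity."""
    return sorted({e.object_name for e in events if e.severity == severity})


# Illustrative data only; not taken from a real environment.
now = datetime(2024, 1, 8, 12, 0)
sample_events = [
    Event(now - timedelta(days=1), "host-01", "critical",
          "Guest file system out of disk space"),
    Event(now - timedelta(days=3), "host-02", "critical",
          "Guest file system out of disk space"),
    Event(now - timedelta(days=2), "datacenter-01", "warning",
          "Hard threshold violation"),
    Event(now - timedelta(days=10), "host-01", "critical",
          "Guest file system out of disk space"),  # outside the 7-day window
]

recent = events_in_last_days(sample_events, now)
print(time_remaining_is_critical(0))
print(affected_objects(recent))
print(affected_objects(recent, severity="warning"))
```

Grouping the recent events by object corresponds to clicking each red triangle to see which object it affects. In a real environment the events would come from the product rather than from a hard-coded list.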
You have analyzed the
symptoms, timeline, events, and metrics related to the problems on your
cluster. Through your analysis, you have determined that the heavy workload on
the cluster has caused the cluster to start running out of capacity.
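The conclusion above rests on comparing two metric series over the same week: one that stays flat and one that swings to extremes. As a rough illustration, the Python sketch below classifies a series as steady when its spread stays within a tolerance and lists the samples that reach a limit. The sample values, the 5-point tolerance, and the 100% limit are assumptions chosen for the example, not values defined by the product.

```python
def spread(values):
    """Difference between the highest and lowest sample in the series."""
    return max(values) - min(values)


def is_steady(values, tolerance=5.0):
    """True when the series varies by no more than `tolerance` points (assumed)."""
    return spread(values) <= tolerance


def peak_indexes(values, limit=100.0):
    """Positions of samples at or above `limit` (an assumed threshold)."""
    return [i for i, v in enumerate(values) if v >= limit]


# Hypothetical daily samples for the last 7 days.
capacity_remaining_pct = [12.0, 11.5, 12.0, 11.0, 11.5, 12.0, 11.5]
workload_pct = [45.0, 110.0, 38.0, 125.0, 50.0, 118.0, 42.0]

print(is_steady(capacity_remaining_pct))  # capacity stays flat
print(is_steady(workload_pct))            # workload swings widely
print(peak_indexes(workload_pct))         # days on which workload peaked
```

A flat capacity series next to a workload series with repeated peaks matches the pattern described in the metric chart: the pressure comes from workload spikes rather than from a sudden loss of capacity.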
Examine the Details views and heat maps to interpret
the properties, metrics, and alerts. Also look for trends and spikes that occur
in the resources for your objects, the distributions of resources across your
objects, and data maps. You can examine the use of various object types across your
objects. See Examine the Environment Details.