Troubleshoot Problems with a Host System

Use the Troubleshooting tabs to identify the root cause of problems that alert recommendations or simple analysis do not resolve.
To troubleshoot the symptoms of the capacity problems occurring on the cluster and host system, and to determine when those problems occurred, use the Troubleshooting tabs to investigate the memory problem.
  1. From the left menu, click Environment, and then click Object Browser > vSphere Hosts and Clusters and select the object, for example, USA-Cluster.
  2. Click the Alerts tab and review the symptoms.
    The Symptoms tab displays the symptoms that triggered on the selected cluster. You notice that several critical symptoms exist:
    • Cluster Compute Resource Time Remaining with committed projects is critically low
    • Cluster Compute Resource Time Remaining is critically low
    • Capacity remaining is critically low
  3. Investigate the critical symptoms.
    1. Point to each critical symptom to identify the metric used.
    2. To view only the symptoms that affect the cluster, enter cluster in the quick filter text box.
      When you point to Cluster Compute Resource Time Remaining is critically low, the metric Capacity|Time Remaining appears. You notice that its value is less than or equal to zero, which caused the capacity symptom to trigger and generate an alert on the USA-Cluster.
  4. Click the Events > Timeline tab to review the triggered symptoms, alerts, and events that occurred on the USA-Cluster over time, and to identify when the problems occurred.
    1. Click the calendar and select Last 7 Days as the range.
      Several events appear in red.
    2. Point to each event to view the details.
    3. To display the events that occurred on the cluster's data center, click View From, and select Datacenter.
      Warning events for the data center appear in yellow.
    4. Point to the warning events.
      You notice that a hard threshold violation occurred on the data center late in the evening. The violation shows that the Badge|Workload metric value was under the acceptable value, which triggered the violation.
    5. To view the affected child objects, click View From and select Host System.
  5. Click the Events tab to examine the changes that occurred on the USA-Cluster and to determine whether a change contributed to the root cause of the alert or to other problems with the cluster.
    1. Review the graph.
      The graph shows whether a recurring event has caused the errors. Each event indicates that the guest file system is out of disk space. The affected objects appear in the pane below the graph.
    2. Click each red triangle to identify the affected object and highlight it in that pane.
  6. Click the Capacity tab to evaluate details of capacity and time remaining.
  7. Click the All Metrics tab to evaluate the objects in the context of the environment topology and help identify the possible cause of a problem.
    1. In the top view, select USA-Cluster.
    2. In the metrics pane, expand All Metrics > Capacity Analytics Generated and double-click Capacity Remaining (%).
      The Capacity Remaining (%) calculation appears in the right pane.
    3. In the metrics pane, expand All Metrics > Badge and double-click Workload (%).
      The Workload (%) calculation appears in the right pane.
    4. On the toolbar, click Date Controls and select Last 7 Days.
      The metric chart indicates that the capacity for the cluster remained at a steady level for the past week, but that the Badge|Workload (%) calculation displays workload extremes.
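The symptom check in step 3 can be modeled as a simple threshold comparison. The following Python sketch is hypothetical: the Symptom class, the quick_filter helper, and the case-insensitive substring matching rule are illustrative assumptions, not the product's API or its actual filter logic. Only the symptom names, the Capacity|Time Remaining metric, and the "less than or equal to zero" condition come from the steps above; the numeric inputs passed to is_triggered are made-up sample values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symptom:
    """Hypothetical model of a metric-based symptom definition."""
    name: str
    metric: str
    threshold: float

    def is_triggered(self, value: float) -> bool:
        # Assumed condition from step 3: the symptom triggers when the
        # metric value is less than or equal to the threshold.
        return value <= self.threshold


def quick_filter(names: list[str], text: str) -> list[str]:
    # Hypothetical helper: assumes the quick filter does a
    # case-insensitive substring match on the symptom name.
    needle = text.lower()
    return [n for n in names if needle in n.lower()]


time_remaining = Symptom(
    name="Cluster Compute Resource Time Remaining is critically low",
    metric="Capacity|Time Remaining",
    threshold=0.0,
)

# Symptom names listed on the Symptoms tab in step 2.
triggered = [
    "Cluster Compute Resource Time Remaining with committed projects is critically low",
    "Cluster Compute Resource Time Remaining is critically low",
    "Capacity remaining is critically low",
]

# Filtering on "cluster" keeps only the two cluster-specific symptoms.
print(quick_filter(triggered, "cluster"))

# A Time Remaining value of zero or below triggers the symptom.
print(time_remaining.is_triggered(0.0))   # True
print(time_remaining.is_triggered(12.5))  # False (sample value)
```

Modeling the condition as `value <= threshold` mirrors the observation in step 3 that a Time Remaining value at or below zero raised the alert; real symptom definitions may use other operators, wait cycles, or severities.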
You have analyzed the symptoms, timeline, events, and metrics related to the problems on your cluster. Through your analysis, you have determined that the heavy workload on the cluster has caused the cluster to start running out of capacity.
Examine the Details views and heat maps to interpret the properties, metrics, and alerts. Also, look for trends and spikes that occur in the resources for your objects, the distributions of resources across your objects, and data maps. You can examine the use of various object types across your objects. See Examine the Environment Details.