Scenario: Troubleshooting overloaded systems using the deployment health topology view

This topic describes using the deployment health topology view to identify and fix an overloaded system in your environment.

About this task

This scenario involves identifying health issues from the deployment health topology view, assessing the root cause, and correlating that assessment with additional data before resolving the problem and verifying the fix. The example described here involves an overloaded collector, but the same process applies to other cases.

Procedure

  1. On a central manager, navigate to Manage > System View > Deployment Health Topology.
  2. Review the deployment topology and assess the overall health of systems in the environment.
    At a high level, a no health issues icon indicates a healthy system, while a low severity issues or high severity issues icon indicates a system with health issues.
  3. If a system shows a low severity issues or high severity issues status icon, click its node to view an overlay with additional health information.
  4. Use the information presented on the node overlay to begin diagnosing any health problems. For example, a collector with high or medium severity statuses for /var disk usage, Restarts, Analyzer queue, and Logger queue indicates that the collector is overloaded.
  5. After initially assessing health issues from the deployment health topology view, try to correlate your findings with additional data. For example, if you suspect that a system is overloaded, begin monitoring the traffic for that system.
  6. When you are confident that you have diagnosed the underlying health issues, take corrective actions. In the example of an overloaded system, you could establish enterprise load balancing or reassign S-TAPs to another collector.
    Typically, this set of symptoms would not occur if enterprise load balancing were already configured and in use.
  7. After you take corrective actions, the status of the node on the deployment health topology view is updated at the next refresh of unit utilization and central manager buffer usage monitor data. The refresh interval depends on your schedule for processing unit utilization data.
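
The diagnostic rule in step 4 can be expressed as a small check. This is only an illustrative sketch: it does not call any product API, the function and variable names are hypothetical, and it assumes that you have copied the severity values from the node overlay by hand. It also assumes the strictest reading of step 4, in which all four indicators must be at medium or high severity.

```python
from typing import Mapping

# Indicator labels taken from step 4; how they are stored here is an assumption.
OVERLOAD_INDICATORS = ("/var disk usage", "Restarts", "Analyzer queue", "Logger queue")

# Severity values treated as elevated (assumed spelling, compared case-insensitively).
ELEVATED = {"medium", "high"}


def looks_overloaded(statuses: Mapping[str, str]) -> bool:
    """Return True when every overload indicator is at medium or high severity.

    A missing indicator counts as not elevated, so incomplete data never
    produces a positive result.
    """
    return all(
        statuses.get(name, "").strip().lower() in ELEVATED
        for name in OVERLOAD_INDICATORS
    )


# Example: values transcribed from a collector's node overlay (made-up data).
collector_overlay = {
    "/var disk usage": "High",
    "Restarts": "Medium",
    "Analyzer queue": "High",
    "Logger queue": "High",
}

if looks_overloaded(collector_overlay):
    print("Collector is likely overloaded; correlate with traffic data (step 5).")
```

A positive result is only a starting point. As step 5 describes, confirm it against additional data before taking corrective action.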