Deployment health dashboard

Learn more about how the deployment health dashboard presents data from your entire Guardium® deployment.

Data availability

Several factors influence the availability and latency of health data and how that data is displayed on the deployment health dashboard. The following table summarizes the data that is included on the dashboard, along with trigger criteria, data latency, and purge information.

Table 1. Summary of deployment health dashboard data
Data source	Information type	Trigger criteria	Data latency	Data purge interval
Analyze limits	Information such as MySQL connections, HTTP GUI connections, and Tomcat open handlers.	Not applicable	Updated every 5 - 10 minutes.	Data is purged after 14 days. The purge interval is configurable by using the CLI command store purge object age.
Correlation alerts	Triggered correlation alerts	An alert threshold is reached	Updated based on the alert notification frequency. For more information, see Correlation Alerts.	Data is purged after 7 days
System resources	System configuration, such as CPU cores, system memory, and /var disk capacity	System does not meet minimum requirements	Updated whenever the user-interface server is started or restarted	Not applicable
System self-monitoring	MySQL disk usage and system disk usage	Usage meets or exceeds default thresholds (75% for high severity, 90% for critical severity)	Updated every 5 - 10 minutes. For high-severity issues, if the same event occurs multiple times in a 15-minute period, the timestamp is updated to reflect the most recent instance. If the same event occurs after a 15-minute interval, a new entry is created with the most recent timestamp. For critical issues, every instance of an event is recorded as a separate entry with its own timestamp.	High-severity issues are purged after 7 days. Critical issues are never purged.
Unit utilization	Unit utilization data such as sniffer restarts, MySQL disk usage, and CPU load.	Value exceeds unit utilization thresholds	Updated within 1 - 2 hours, based on the recommended configuration. For more information, see Configuring unit utilization data processing.	Unit utilization data is purged after 60 days. Sniffer buffer usage data is purged after 14 days.
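
The timestamp handling described for System self-monitoring events can be illustrated with a short Python sketch. This is not Guardium code: the record_event function, the entry layout, and the choice to measure the 15-minute window from the entry's most recent timestamp are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Assumption: the 15-minute window is measured from the entry's latest timestamp.
WINDOW = timedelta(minutes=15)


def record_event(entries, key, severity, when):
    """Record a self-monitoring event in `entries` (a list of dicts).

    Hypothetical helper, for illustration only.
    """
    if severity == "critical":
        # Every critical instance gets its own entry and timestamp.
        entries.append({"key": key, "severity": severity, "timestamp": when})
        return entries

    # High severity: find the most recent entry for the same event.
    for entry in reversed(entries):
        if entry["key"] == key and entry["severity"] == severity:
            if when - entry["timestamp"] <= WINDOW:
                # Same event within 15 minutes: refresh the timestamp only.
                entry["timestamp"] = when
                return entries
            break  # Older than 15 minutes: fall through and add a new entry.

    entries.append({"key": key, "severity": severity, "timestamp": when})
    return entries


t0 = datetime(2024, 1, 1, 12, 0)
log = []
record_event(log, "mysql_disk_usage", "high", t0)
record_event(log, "mysql_disk_usage", "high", t0 + timedelta(minutes=10))  # updates entry
record_event(log, "mysql_disk_usage", "high", t0 + timedelta(minutes=40))  # new entry
```

In this sketch, the first two high-severity events collapse into one entry that carries the later timestamp, and the third event creates a second entry.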

Important:

Only data from systems that are running Guardium V10.1.2 and later is included on the deployment health dashboard.
When you change the host name of a system, preexisting data that is associated with the original host name is no longer displayed on the deployment health dashboard.
When a primary central manager transfers data to a backup central manager during a failover scenario, up to 30 minutes of data is unavailable to the deployment health dashboard.

Data presentation

The deployment health dashboard formats and presents data through various tiles or small window-like containers. The following table summarizes the data that is presented on each dashboard tile.

Table 2. Summary of deployment health dashboard tiles
	Tile name
Data source	Resource requirements	Central manager limits	Unit utilization issues	Unit utilization timecharts	Alerts (by category, name, severity, or system)	Events	High severity	Critical
Analyze data		(values are a percentage of user-configurable limits)
Correlation alerts
System resources
System self-monitoring							(When usage meets or exceeds 75% threshold)	(When usage meets or exceeds 90% threshold)
Unit utilization
The following tiles are displayed by default: alerts by name, central manager limits, critical issues, events timeline, high severity issues, and unit utilization issues.

Dashboard filter

The dashboard filter allows quick filtering of the data based on Guardium systems, issue severity, and time period. Filter settings affect the data displayed on the entire dashboard unless noted otherwise.

The Guardium systems filter allows filtering the dashboard by unit type or by groups defined at Manage > Central Management > Managed Unit Groups.

By default, the dashboard displays all available issues: low, medium, high, and critical. Use the Severity menu to filter data on the dashboard by severity. Selecting high filters the entire dashboard to display only high-severity issues. Selecting critical filters the entire dashboard to display only critical issues. It is possible to select both high and critical issues to filter out all lower-severity data.

Notes:

Outstanding or unresolved critical issues are displayed on the dashboard regardless of the Severity filter setting.
For the unit utilization issues tile, the dashboard Severity filter is based on the overall unit utilization severity. For more information about how unit utilization severity is assigned, see Unit utilization issues.

The time filter determines the range of data that is displayed on the dashboard. Default settings allow time periods from 1 hour to 3 weeks, but custom time periods are also supported. The time filter does not apply to critical issues: critical issues are always displayed, regardless of the time filter setting.
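
The interaction between the Severity filter, the time filter, and critical issues can be summarized in a small Python sketch. The Issue class and the visible_issues function are hypothetical names used only for illustration, and the sketch assumes that every critical issue in the input is still outstanding.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Issue:
    system: str
    severity: str  # "low", "medium", "high", or "critical"
    timestamp: datetime


def visible_issues(issues, severities, start, end):
    """Return the issues that the dashboard would show (illustrative model only)."""
    shown = []
    for issue in issues:
        if issue.severity == "critical":
            # Critical issues bypass both the Severity filter and the time filter.
            shown.append(issue)
        elif issue.severity in severities and start <= issue.timestamp <= end:
            shown.append(issue)
    return shown


now = datetime(2024, 1, 8, 12, 0)
issues = [
    Issue("collector-1", "critical", now - timedelta(days=30)),
    Issue("collector-2", "high", now - timedelta(minutes=30)),
    Issue("collector-3", "medium", now - timedelta(minutes=30)),
    Issue("collector-4", "high", now - timedelta(days=2)),
]

# Severity filter set to high, time filter set to the last hour.
shown = visible_issues(issues, {"high"}, now - timedelta(hours=1), now)
```

With these sample values, the 30-day-old critical issue and the recent high-severity issue are shown, while the medium-severity issue and the older high-severity issue are filtered out.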

Use the Add chart menu to add tiles to the dashboard or replace default tiles that you previously removed.

Dashboard summary

The dashboard summary provides overall counts of health issues that are detected in your Guardium deployment. The Collectors with issues and Aggregators with issues counts indicate the number of systems (collectors and aggregators) that are detected with health issues. The Critical and High counts indicate the number of issues detected from all systems that are included on the dashboard.

Notes:

The Critical and High counts are not affected by adding or removing tiles from the dashboard.
The counts on the dashboard summary bar reflect the dashboard filter settings.

Alerts by category, name, severity, or system

The deployment health dashboard supports several tiles based on Guardium correlation alerts: Alerts by category, Alerts by name, Alerts by severity, and Alerts by system. Add correlation alert tiles to the dashboard by using the Add chart menu.

Correlation alerts must be explicitly configured for inclusion on the deployment health dashboard. For information about configuring alerts for the dashboard, see Configuring a central manager for the deployment health views.

Central manager limits

The central manager limits tile displays information to help assess central manager activity over time. For example, MySQL connections, HTTP GUI connections, Tomcat open handlers, and other related metrics are tracked on the tile.

All values are expressed as a percentage of a defined analyze limits threshold. For example, if a threshold is set at 80%, the tile indicates 100% when that 80% threshold is reached. The thresholds are configurable by using the modify_guard_param API command. For more information, see the analyze limits parameters section of modify_guard_param.
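
The scaling amounts to dividing the observed value by the configured limit. The following Python sketch shows the arithmetic. The percent_of_limit function is a hypothetical helper and not part of any Guardium API, and rounding to one decimal place is an assumption.

```python
def percent_of_limit(observed, limit):
    """Express an observed value as a percentage of its configured limit.

    Hypothetical helper, for illustration only.
    """
    if limit <= 0:
        raise ValueError("limit must be greater than zero")
    return round(observed / limit * 100, 1)


# With a limit of 80, an observed value of 80 is reported as 100%.
at_limit = percent_of_limit(80, 80)

# An observed value of 40 against the same limit is reported as 50%.
half_way = percent_of_limit(40, 80)
```

Under this reading, a value above 100% on the tile means that the observed metric has passed its configured limit.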

Customize the tile to include or exclude specific metrics and show or hide the legend.

Resource requirements

The resource requirements tile indicates whether systems in a Guardium deployment meet the minimum hardware requirements for CPU, memory, and /var disk capacity. Any system resource that does not meet the minimum requirement is designated as a high-severity issue and displayed on both the resource requirements tile and the high severity issues tile.

Use the Include healthy systems check box on the details view of the tile to include all available data for the systems and time frame that are indicated on the dashboard filter bar. By including all available data, the Include healthy systems check box overrides the Severity setting of the overall dashboard filter. Systems without any detected health issues are excluded by default.

A table that displays all met and unmet resource requirements in your Guardium deployment is also available at Manage > Central Management > System Resources.

Note:

System resource issues are not displayed in the Events timeline because they are not associated with a specific timestamp.

Unit utilization issues

The unit utilization issues tile displays issues based on unit utilization thresholds. The issues that are displayed on the tile represent individual metrics that exceed their respective thresholds. The overall severity is assigned based on the highest severity issue that is found in all available metrics for an individual system in a specified time period. For more information about unit utilization thresholds, see Unit utilization and inspection core performance.
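
As a rough illustration of how an overall severity follows from individual metrics, the following Python sketch picks the highest severity among the metrics that exceeded their thresholds for one system. The metric names, the severity ordering list, and the overall_severity function are assumptions made for this example.

```python
# Severity levels in ascending order, as listed for the dashboard Severity filter.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def overall_severity(metric_severities):
    """Return the highest severity among a system's flagged metrics.

    `metric_severities` maps a metric name to its severity for one system
    and one time period. Returns None when no metric is flagged.
    Hypothetical helper, for illustration only.
    """
    if not metric_severities:
        return None
    return max(metric_severities.values(), key=SEVERITY_ORDER.index)


flagged = {
    "sniffer_restarts": "medium",
    "mysql_disk_usage": "high",
    "cpu_load": "low",
}
result = overall_severity(flagged)
```

In this example, the system is assigned an overall severity of high because that is the most severe level among its flagged metrics.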

The details view of the unit utilization issues tile includes both a Period start time and a Timestamp:

The Period start time indicates the beginning of the hourly period into which the CM buffer usage monitor data is rolled up, for example periods starting at 13:00, 12:00, and 11:00.
The Timestamp indicates when the unit utilization levels data is added to the deployment health dashboard, either based on the unit utilization levels schedule or by using run once now.

For more information, see Configuring unit utilization data processing.

The first time that unit utilization data is brought into the deployment health dashboard, all the unit utilization data has the same timestamp but different period start times. Over time, the timestamps appear at intervals based on the unit utilization levels schedule. For example, if the unit utilization levels data is collected every hour at 40 minutes after the hour, you see period start time and timestamp values as follows:

Table 3. Example unit utilization period start time and timestamp values
Period start	Timestamp
13:00	14:40
12:00	13:40
11:00	12:40
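
The pattern in the table can be reproduced with a short Python sketch. It assumes that each period starts on the hour, lasts one hour, and is picked up by the first scheduled run after the period closes. That is one reading of the example values, not a documented rule, and expected_timestamp is a hypothetical helper.

```python
from datetime import datetime, timedelta


def expected_timestamp(period_start, run_minute=40):
    """Estimate when an hourly period's data reaches the dashboard.

    Assumes the period starts on the hour and that the schedule runs once
    an hour at `run_minute` minutes past the hour. Illustrative only.
    """
    return period_start + timedelta(hours=1, minutes=run_minute)


day = datetime(2024, 1, 1)
rows = [
    (start.strftime("%H:%M"), expected_timestamp(start).strftime("%H:%M"))
    for start in (day.replace(hour=13), day.replace(hour=12), day.replace(hour=11))
]
```

For the period that starts at 13:00, the period closes at 14:00 and the next scheduled run is at 14:40, which matches the first row of the table.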

Unit utilization timecharts

Unit utilization timecharts allow the observation of trends in unit utilization data over time. Unit utilization timecharts can be configured to show multiple unit utilization metrics for a single Guardium system or to show a single unit utilization metric for multiple Guardium systems.

Unit utilization timecharts are structured based on the following criteria:

The x-axis represents the period start time.
When multiple metrics are being charted and the values for the metrics are in the same range, one y-axis is drawn. For example, both MySQL disk usage and /var disk usage are expressed as percentages and are drawn with the same y-axis.
When multiple metrics are being charted and the values of the metrics are not similar, two y-axes are drawn. For example, MySQL disk usage is expressed as a percentage and flat log requests is expressed as an integer, so two y-axes are drawn: one displaying percentages and one displaying integers.
If the value of a metric falls outside the range of a y-axis, that value is displayed at the bottom of the chart. This behavior accommodates scenarios where different metrics are expressed with similar units but significantly different values: for example, integers in the range of thousands versus millions.
Tip: Create multiple time charts when values are in significantly different ranges.

Note: Systems are not included on the Timechart settings > Host name menu when unit utilization data does not exist for that system in the time frame that is specified on the dashboard filter bar.