Deployment health dashboard
Learn more about how the deployment health dashboard presents data from your entire Guardium® deployment.
Data availability
Data source | Information type | Trigger criteria | Data latency | Data purge interval |
---|---|---|---|---|
Analyze limits | Information such as MySQL connections, HTTP GUI connections, and Tomcat open handlers. | Not applicable | Updated every 5 - 10 minutes. | Data is purged after 14 days. The purge interval is configurable using the CLI command store purge object age. |
Correlation alerts | Triggered correlation alerts | An alert threshold is reached | Updated based on the alert notification frequency. For more information, see Correlation Alerts. | Data is purged after 7 days |
System resources | System configuration, such as CPU cores, system memory, /var disk capacity | System does not meet minimum requirements | Updated whenever the user-interface server is started or restarted | Not applicable |
System self-monitoring | MySQL disk usage and system disk usage | Usage meets or exceeds default thresholds (75% for high severity, 90% for critical severity) | Updated every 5 - 10 minutes. For high-severity, if the same event occurs multiple times in a 15 minute period, the timestamp is updated to reflect the most recent instance. If the same event occurs after a 15 minute interval, a new entry is created with the most recent timestamp. For critical issues, every instance of an event is created with a unique timestamp. | High-severity issues are purged after 7 days. Critical issues are never purged. |
Unit utilization | Unit utilization data such as sniffer restarts, MySQL disk usage, and CPU load. | Value exceeds unit utilization thresholds | Updated within 1 - 2 hours, based on the recommended configuration. For more information, see Configuring unit utilization data processing. | Unit utilization data is purged after 60 days. Sniffer buffer usage data is purged after 14 days. |
- Only data from systems that are running Guardium V10.1.2 or later is included on the deployment health dashboard.
- When you change the host name of a system, preexisting data that is associated with the original host name is no longer displayed on the deployment health dashboard.
- When a primary central manager transfers data to a backup central manager during a failover scenario, up to 30 minutes of data is unavailable to the deployment health dashboard.
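
The timestamp behavior described for system self-monitoring events can be sketched as follows. This is an illustrative model only, not Guardium's implementation: the `Event` class, the `classify` and `record_event` functions, and the in-memory list are hypothetical names. The 75% and 90% thresholds and the 15-minute window come from the table above. The sketch also assumes that the 15-minute window is measured from the most recently recorded timestamp, which the source does not state.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

HIGH_THRESHOLD = 75.0       # percent, default high-severity threshold
CRITICAL_THRESHOLD = 90.0   # percent, default critical threshold
COALESCE_WINDOW = timedelta(minutes=15)


@dataclass
class Event:
    name: str
    severity: str
    timestamp: datetime


def classify(usage_percent: float) -> Optional[str]:
    """Return the severity for a usage value, or None if below both thresholds."""
    if usage_percent >= CRITICAL_THRESHOLD:
        return "critical"
    if usage_percent >= HIGH_THRESHOLD:
        return "high"
    return None


def record_event(events: List[Event], name: str,
                 usage_percent: float, now: datetime) -> None:
    """Record an event, coalescing repeated high-severity events."""
    severity = classify(usage_percent)
    if severity is None:
        return
    if severity == "high":
        # Look at the most recent high-severity entry for the same event.
        for event in reversed(events):
            if event.name == name and event.severity == "high":
                if now - event.timestamp <= COALESCE_WINDOW:
                    # Same event within the window: only refresh the timestamp.
                    event.timestamp = now
                    return
                break
    # Critical events, and high events outside the window, get a new entry.
    events.append(Event(name, severity, now))
```

In this model, two high-severity readings 10 minutes apart leave a single entry with the later timestamp, while two critical readings 1 minute apart produce two entries.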
Data presentation
Tile name | ||||||||
---|---|---|---|---|---|---|---|---|
Data source | Resource requirements | Central manager limits | Unit utilization issues | Unit utilization timecharts | Alerts (by category, name, severity, or system) | Events | High severity | Critical |
Analyze data | | (values are a percentage of user-configurable limits) | | | | | | |
Correlation alerts | ||||||||
System resources | ||||||||
System self-monitoring | | | | | | | (When usage meets or exceeds 75% threshold) | (When usage meets or exceeds 90% threshold) |
Unit utilization | ||||||||
The following tiles are displayed by default: alerts by name, central manager limits, critical issues, events timeline, high severity issues, and unit utilization issues.
Dashboard filter
The dashboard filter allows quick filtering of the data based on Guardium systems, issue severity, and time period. Filter settings affect the data displayed on the entire dashboard unless noted otherwise.
The Guardium systems filter allows filtering the dashboard by unit type or by groups defined at .
- Outstanding or unresolved critical issues are displayed on the dashboard regardless of the Severity filter setting.
- For the unit utilization issues tile, the dashboard Severity filter is based on the overall unit utilization severity. For more information about how unit utilization severity is assigned, see Unit utilization issues.
The time filter determines the range of data that is displayed on the dashboard. Default settings allow time periods from 1 hour to 3 weeks, but custom time periods are also supported. The time filter does not apply to critical issues: critical issues are always displayed, regardless of the time filter setting.
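
The interaction between the Severity filter, the time filter, and critical issues can be summarized in a small predicate. This is a sketch under stated assumptions rather than the dashboard's actual logic: the `Issue` and `DashboardFilter` classes and the `is_displayed` function are hypothetical, the Guardium systems filter is omitted, and resolved critical issues are assumed to remain subject to the Severity filter.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Set


@dataclass
class Issue:
    severity: str            # for example "critical" or "high"
    timestamp: datetime
    resolved: bool = False


@dataclass
class DashboardFilter:
    severities: Set[str]     # severities selected in the Severity filter
    start: datetime          # start of the selected time period
    end: datetime            # end of the selected time period


def is_displayed(issue: Issue, flt: DashboardFilter) -> bool:
    """Decide whether an issue is shown for the given filter settings."""
    if issue.severity == "critical":
        # The time filter never applies to critical issues, and unresolved
        # critical issues also bypass the Severity filter.
        return (not issue.resolved) or "critical" in flt.severities
    in_range = flt.start <= issue.timestamp <= flt.end
    return issue.severity in flt.severities and in_range
```

With a filter that selects only high-severity issues from the last day, an unresolved critical issue from a year ago is still displayed in this model.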
Use the Add chart menu to add tiles to the dashboard or replace default tiles that you previously removed.
Dashboard summary
- The Critical and High counts are not affected by adding or removing tiles from the dashboard.
- The counts on the dashboard summary bar reflect the dashboard filter settings.
Alerts by category, name, severity, or system
The deployment health dashboard supports several tiles based on Guardium correlation alerts: Alerts by category, Alerts by name, Alerts by severity, and Alerts by system. Add correlation alert tiles to the dashboard by using the Add chart menu.
Correlation alerts must be explicitly configured for inclusion on the deployment health dashboard. For information about configuring alerts for the dashboard, see Configuring a central manager for the deployment health views.
Central manager limits
The central manager limits tile displays information to help assess central manager activity over time. For example, MySQL connections, HTTP GUI connections, Tomcat open handlers, and other related metrics are tracked on the tile.
All values are expressed as a percentage of a defined analyze limits threshold. For example, if a threshold is set at 80%, the tile indicates 100% when that 80% threshold is reached. The thresholds are configurable by using the modify_guard_param API command. For more information, see the analyze limits parameters section of modify_guard_param.
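
The scaling can be illustrated with a short calculation. The function name `percent_of_limit` and its parameters are hypothetical and are not part of any Guardium API; the sketch only restates the arithmetic in the example above.

```python
def percent_of_limit(current_value: float, limit: float) -> float:
    """Express a measured value as a percentage of its configured limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return 100.0 * current_value / limit


# A threshold set at 80 is shown as 100% on the tile when the value reaches 80.
print(percent_of_limit(80, 80))
# Half of the configured threshold is shown as 50%.
print(percent_of_limit(40, 80))
```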
Customize the tile to include or exclude specific metrics and show or hide the legend.
Resource requirements
The resource requirements tile indicates whether systems in a Guardium deployment meet the minimum hardware requirements for CPU, memory, and /var disk capacity. Any system resource that does not meet the minimum requirement is designated as a high-severity issue and displayed on both the resource requirements tile and the high severity issues tile.
Use the Include healthy systems check box on the details view of the tile to include all available data for the systems and time frame that are indicated on the dashboard filter bar. By including all available data, the Include healthy systems check box overrides the Severity setting of the overall dashboard filter. Systems without any detected health issues are excluded by default.
- System resource issues are not displayed in the Events timeline because they are not associated with a specific time stamp.
Unit utilization issues
The unit utilization issues tile displays issues based on unit utilization thresholds. The issues that are displayed on the tile represent individual metrics that exceed their respective thresholds. The overall severity is assigned based on the highest severity issue that is found in all available metrics for an individual system in a specified time period. For more information about unit utilization thresholds, see Unit utilization and inspection core performance.
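
The rule for the overall severity can be sketched as taking the maximum over per-metric severities. The severity names and their ranking in `SEVERITY_RANK`, the metric names, and the `overall_severity` function are assumptions made for illustration; the source does not list the actual severity levels.

```python
from typing import Dict

# Hypothetical severity levels, ordered from least to most severe.
SEVERITY_RANK = {"none": 0, "low": 1, "medium": 2, "high": 3}


def overall_severity(metric_severities: Dict[str, str]) -> str:
    """Return the highest severity found across a system's metrics."""
    if not metric_severities:
        return "none"
    return max(metric_severities.values(), key=SEVERITY_RANK.__getitem__)


# Per-metric severities for one system in one time period (sample data).
system_metrics = {
    "sniffer_restarts": "low",
    "mysql_disk_usage": "high",
    "cpu_load": "medium",
}
print(overall_severity(system_metrics))
```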
- The Period start time indicates the hourly period into which the CM buffer usage monitor data is rolled up, for example periods starting at 13:00, 12:00, and 11:00.
- The Timestamp indicates when the unit utilization levels data is added to the deployment health dashboard, either based on the unit utilization levels schedule or by using run once now.
Period start | Timestamp |
---|---|
13:00 | 14:40 |
12:00 | 13:40 |
11:00 | 12:40 |
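
The hourly roll-up in the table above can be pictured as truncating each sample time to the top of its hour. The sketch below only illustrates that bucketing, using hypothetical function names (`period_start`, `roll_up`) and made-up sample data; it does not reflect how Guardium stores or aggregates the data.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def period_start(sample_time: datetime) -> datetime:
    """Truncate a sample time to the start of its hourly period."""
    return sample_time.replace(minute=0, second=0, microsecond=0)


def roll_up(samples: Iterable[Tuple[datetime, float]]) -> Dict[datetime, List[float]]:
    """Group (time, value) samples by hourly period start."""
    periods: Dict[datetime, List[float]] = defaultdict(list)
    for sample_time, value in samples:
        periods[period_start(sample_time)].append(value)
    return dict(periods)
```

A sample taken at 13:27 falls into the period that starts at 13:00, regardless of when the rolled-up data is later added to the dashboard.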
Use the Include healthy systems check box on the details view of the tile to include all available data for the systems and time frame that are indicated on the dashboard filter bar. By including all available data, the Include healthy systems check box overrides the Severity setting of the overall dashboard filter. Systems without any detected health issues are excluded by default.
Unit utilization timecharts
Unit utilization timecharts allow the observation of trends in unit utilization data over time. Unit utilization timecharts can be configured to show multiple unit utilization metrics for a single Guardium system or to show a single unit utilization metric for multiple Guardium systems.
- The x-axis represents the period start time.
- When multiple metrics are being charted and the values for the metrics are in the same range, one y-axis is drawn. For example, both MySQL disk usage and /var disk usage are expressed as percentages and are drawn with the same y-axis.
- When multiple metrics are being charted and the values of the metrics are not similar, two y-axes are drawn. For example, MySQL disk usage is expressed as a percentage and flat log requests is expressed as an integer, so two y-axes are drawn: one displaying percentages and one displaying integers.
- If the value of a metric falls outside the range of a y-axis, that value is displayed at the bottom of the chart. This behavior accommodates scenarios where different metrics are expressed with similar units but significantly different values: for example, integers in the range of thousands versus millions.
Tip: Create multiple time charts when values are in significantly different ranges.
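
The one-axis versus two-axis behavior described above can be approximated by grouping metrics by unit. This is a simplified sketch, not the charting logic that the dashboard uses: the `assign_y_axes` function, the unit labels, and the metric names are hypothetical, and the sketch compares units only, not value ranges.

```python
from typing import Dict


def assign_y_axes(metric_units: Dict[str, str]) -> Dict[str, str]:
    """Map each metric name to the 'left' or 'right' y-axis based on its unit."""
    axes: Dict[str, str] = {}
    unit_to_axis: Dict[str, str] = {}
    for metric, unit in metric_units.items():
        if unit not in unit_to_axis:
            if len(unit_to_axis) >= 2:
                raise ValueError("this sketch supports at most two distinct units")
            # The first unit seen uses the left axis, the second the right axis.
            unit_to_axis[unit] = "left" if not unit_to_axis else "right"
        axes[metric] = unit_to_axis[unit]
    return axes


# Two percentages share one axis.
print(assign_y_axes({"mysql_disk_usage": "percent", "var_disk_usage": "percent"}))
# A percentage and an integer count are drawn on separate axes.
print(assign_y_axes({"mysql_disk_usage": "percent", "flat_log_requests": "integer"}))
```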