Configure cluster management

Click the Console tab and use the left navigation pane to configure cluster management.

The left navigation pane has the following sections: Create (graphs), Management, Disk Monitoring, License Management, Grid Management, Collection Methods, Templates, Presets, Import/Export, Syslog Settings, Configuration, and Utilities. Most are default Cacti utilities and features, and are documented in the Cacti documentation (http://cacti.net/documentation.php).

Perform the following actions to configure cluster management:

Monitor your cluster by using RTM.

Go to the Management section of the Console tab. For more information about management, see the Management section.
Add LSF clusters and complete certain database administration functions.

Go to the Grid Management section of the Console tab.
Monitor license servers and pollers.

Go to the License Management section of the Console tab.
Use Cacti utilities and features.

Go to the Collection Methods, Templates, or Import/Export sections of the Console tab to use Cacti utilities and features. For more information, see the Cacti documentation (http://cacti.net/documentation.php).

Management section

The Management section is in the Console tab.

Thresholds page

Go to the Thresholds page by clicking Thresholds under the Management section of the Console tab. A threshold triggers an alert if your clusters, hosts, queues, or jobs meet the conditions of the threshold. The page lists the thresholds that are configured in your cluster, with the following columns:

Name. The name of the cluster or host and the threshold. Click the name to change the threshold settings.
Type. The type of threshold (for example, High/Low, Baseline, or Time Based).
High. The high threshold boundary value. If the current value of the monitored data source item is greater than this value for a specified duration, the threshold triggers an alert.
Low. The low threshold boundary value. If the current value of the monitored data source item is lower than this value for a specified duration, the threshold triggers an alert.
Trigger. The amount of time that the data source item must be in breach of the threshold before the threshold triggers an alert.
Duration. The amount of time that has elapsed since the alert was first triggered, if the data source item is still in breach of the threshold.
Repeat. The amount of time that the threshold waits before the alert repeats if the data source item is still in breach of the threshold.
Current. The current value of the monitored data field.
Triggered. Indicates whether this threshold triggered an alert.
Enabled. Indicates whether this threshold is active.
Ack. Indicates whether the threshold alerts are acknowledged: "on" indicates that the threshold is acknowledged; "off" indicates that the threshold either is not acknowledged, or had its acknowledgement reset.
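
The interaction between the High, Low, Trigger, and Repeat columns can be sketched as follows. This is a minimal illustration, not the RTM implementation: the class name, the field names, and the choice to measure time in polling cycles rather than in minutes are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HighLowThreshold:
    """Hypothetical model of a High/Low threshold; not an RTM API."""
    high: Optional[float]   # High boundary value (None = no high check)
    low: Optional[float]    # Low boundary value (None = no low check)
    trigger_polls: int      # Trigger: consecutive breached polls (>= 1) before an alert
    repeat_polls: int       # Repeat: polls between repeated alerts (0 = never repeat)
    breached_polls: int = 0 # consecutive polls spent in breach so far

    def is_breached(self, value: float) -> bool:
        # A breach means the value is above High or below Low.
        if self.high is not None and value > self.high:
            return True
        if self.low is not None and value < self.low:
            return True
        return False

    def update(self, value: float) -> bool:
        """Record one polled value; return True if an alert fires on this poll."""
        if not self.is_breached(value):
            self.breached_polls = 0  # back to normal, so the breach timer resets
            return False
        self.breached_polls += 1
        if self.breached_polls < self.trigger_polls:
            return False             # in breach, but not yet for long enough
        polls_since_first_alert = self.breached_polls - self.trigger_polls
        if polls_since_first_alert == 0:
            return True              # first alert for this breach
        # Still in breach: repeat the alert every repeat_polls polls.
        return self.repeat_polls > 0 and polls_since_first_alert % self.repeat_polls == 0
```

With `high=90`, `low=10`, `trigger_polls=2`, and `repeat_polls=3`, a data source that stays at 95 raises its first alert on the second poll and repeats it three polls later.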

Threshold Item

Go to the Threshold Item page for a threshold by clicking the name of the threshold on the Thresholds page. Configure threshold settings and event triggering from this page.

Event triggering behavior is based on realert cycle settings. When the threshold first triggers an alert, the event trigger starts based on a high or low threshold breach. If the alert stays triggered, the event trigger is started again unless the realert cycle is set to Never. When the alert reverts to normal, the threshold triggers the norm threshold command or script.
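
The event triggering behavior described above can be sketched as a small decision function. This is a simplified illustration under stated assumptions: `select_trigger` and its parameters are hypothetical names, and each call is assumed to happen at a moment when the realert cycle (if any) has elapsed.

```python
from typing import Optional


def select_trigger(was_alerting: bool,
                   value: float,
                   high: Optional[float],
                   low: Optional[float],
                   realert_never: bool) -> Optional[str]:
    """Return which trigger command would run: 'high', 'low', 'norm', or None.

    Hypothetical helper for illustration only; not part of RTM or Cacti.
    """
    # Classify the current value against the threshold boundaries.
    if high is not None and value > high:
        breach = "high"
    elif low is not None and value < low:
        breach = "low"
    else:
        breach = None

    if breach is None:
        # The alert reverted to normal: run the norm command once.
        return "norm" if was_alerting else None
    if was_alerting and realert_never:
        # The alert stays triggered, but the realert cycle is set to Never.
        return None
    # First breach, or a repeated breach with realerting enabled.
    return breach
```

For example, `select_trigger(True, 50, 90, 10, True)` selects the norm command, because the value is back inside the boundaries after an alert.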

You can configure the following items from here:

Template propagation enabled: Enable the propagation of changes to the threshold template.
Threshold name: The name of the threshold as it is displayed in the Name column in the list of thresholds.

Note:
You can use placeholders to customize your threshold name. Placeholders for the threshold name are enclosed by pipe characters (|), for example, |cluster_name|.
Threshold enabled or disabled
Weekend exemption: Disable threshold alerts on weekends.
Disable restoration email: Disable the alert email that is sent when the threshold returns to normal.
Reset acknowledgement: Reset acknowledgements when the threshold returns to normal.
High/low threshold values
Threshold type: High/low, baseline, or time based.
Event triggering (Shell command): Specifies event trigger commands or shell scripts in the event of a breach.
- High Threshold Trigger Command/Script: If the threshold is breached because the data source exceeds the high threshold value, the threshold triggers the specified command or shell script.
- Low Threshold Trigger Command/Script: If the threshold is breached because the data source drops below the low threshold value, the threshold triggers the specified command or shell script.
- Norm Threshold Trigger Command/Script: If the threshold is breached and then returns to normal, the threshold triggers the specified command or shell script.
Event triggering (Grid administrator host level triggers): Specifies host-level actions in the event of a breach.
- Host Level Action (High Threshold): If the threshold is breached because the data source exceeds the high threshold value, the threshold triggers the specified action on the host.
- Host Level Action (Low Threshold): If the threshold is breached because the data source drops below the low threshold value, the threshold triggers the specified action on the host.
Email message body: Email alert message content. This control specifies the template that is used in alert email notifications for this threshold.

Note:
You can use placeholders to customize your alert emails and provide more information. Placeholders for the email message body are enclosed by angle brackets (<>), for example, <cluster_name>.
Syslog settings
Data type: Special formatting for the data.
Realert cycle: The amount of time that the threshold waits before it repeats the alert, if it is still in breach.
Notify accounts and extra alert emails: Email addresses to be notified when the threshold raises an alert.

Placeholder tags

Placeholders are custom tags that represent real system values. You can insert placeholders in threshold names to show customized names that are based on your system, and you can insert placeholders in alert email templates to present more information, which makes it easier for administrators to follow up on the alert.

Tags for threshold names are enclosed by pipe characters (|), while tags for alert email templates are enclosed by angle brackets (<>). Not all placeholders are available for threshold names; some placeholders are only available for alert email templates. The following is a list of the placeholders available for your thresholds:

Table 1. Names, tags, and descriptions of available placeholders
Placeholder name	Tag for threshold name	Tag for alert email template	Description
Cluster ID	`|clusterid|`	`<clusterid>`	The ID of the cluster.
Cluster name	`|cluster_name|`	`<cluster_name>`	The name of the cluster.
Cluster LSF master	`|cluster_lsfmaster|`	`<cluster_lsfmaster>`	The name of the LSF master host for the cluster.
Cluster LSF version	`|cluster_version|`	`<cluster_version>`	The version of LSF running in the cluster.
Cluster LSF LIM port	`|cluster_limport|`	`<cluster_limport>`	The port number of LIM running in LSF on the master host.
Custom data value	`|custom_custom_field_name|`	`<custom_custom_field_name>`	The custom data value from the data source that is linked in this alert. For example, `custom_percent`, `custom_status`.
Host name	`|host_hostname|`	`<host_hostname>`	The host name of the device that is linked in this alert.
Host description	`|host_description|`	`<host_description>`	The host description of the device that is linked in this alert.
Threshold description	Not available	`<DESCRIPTION>`	The threshold description.
Threshold host name	Not available	`<HOSTNAME>`	The host name of the threshold.
Threshold trigger time	Not available	`<TIME>`	The time at which the threshold triggered this alert.
Threshold graph URL	Not available	`<URL>`	The URL of the threshold graph.
Threshold current value	Not available	`<CURRENTVALUE>`	The current value of the data field that is being monitored by the threshold, at the time of the alert email.
Threshold name	Not available	`<NAME>`	The name of the threshold.
Threshold data source name	Not available	`<DSNAME>`	The name of the data source that is being monitored by the threshold.
Threshold type	Not available	`<THOLDTYPE>`	The threshold type.
Threshold high value	Not available	`<HI>`	The high threshold boundary value.
Threshold low value	Not available	`<LO>`	The low threshold boundary value.
Threshold trigger	Not available	`<TRIGGER>`	The threshold trigger value.
Threshold graph ID	Not available	`<GRAPHID>`	The ID of the threshold graph.
Threshold duration	Not available	`<DURATION>`	The duration of the threshold.
Threshold details URL	Not available	`<DETAILS_URL>`	A URL to the threshold details page, which is a list of hosts that breached this threshold.
Threshold breached items	Not available	`<BREACHED_ITEMS>`	A list of items that breached this threshold, in an HTML table format.
Threshold graph	Not available	`<GRAPH>`	The threshold graph that is embedded into the email.
Threshold date	Not available	`<DATE_RFC822>`	The threshold date in RFC 822 format. For example, `Thu, 01 Jan 2009 01:11:01 +0100`.
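
The two tag styles in the table can be illustrated with a short sketch of placeholder expansion. This is not how RTM performs the substitution internally. The function names and the sample values are hypothetical, and unknown tags are assumed to be left unchanged. The last helper shows how a date in the `<DATE_RFC822>` style can be produced with the Python standard library.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expand_name(template: str, values: dict) -> str:
    """Replace |tag| placeholders in a threshold name (hypothetical helper)."""
    # Tags that are not in `values` are left in place unchanged.
    return re.sub(r"\|(\w+)\|",
                  lambda m: str(values.get(m.group(1), m.group(0))),
                  template)


def expand_email(template: str, values: dict) -> str:
    """Replace <tag> placeholders in an alert email body (hypothetical helper)."""
    # Unknown tags, including ordinary HTML tags such as <b>, are left unchanged.
    return re.sub(r"<(\w+)>",
                  lambda m: str(values.get(m.group(1), m.group(0))),
                  template)


def rfc822_date(moment: datetime) -> str:
    """Format a timezone-aware datetime in the style shown for <DATE_RFC822>."""
    return format_datetime(moment)


# Sample values are made up for the example.
sample = {"cluster_name": "prod1", "NAME": "CPU load", "CURRENTVALUE": 97}
print(expand_name("|cluster_name| - CPU load", sample))
print(expand_email("<NAME> on <cluster_name> is at <CURRENTVALUE>", sample))
print(rfc822_date(datetime(2009, 1, 1, 1, 11, 1,
                           tzinfo=timezone(timedelta(hours=1)))))
```

Keeping the two expansions separate mirrors the table: some tags, such as `<NAME>`, exist only for alert email templates, so a threshold name that contains them would simply pass through unchanged.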