Configure cluster management

Click the Console tab and use the left navigation pane to configure cluster management.

The left navigation pane has the following sections: Create (graphs), Management, Disk Monitoring, License Management, Grid Management, Collection Methods, Templates, Presets, Import/Export, Syslog Settings, Configuration, and Utilities. Most of these sections are standard Cacti utilities and features and are documented in the Cacti documentation (http://cacti.net/documentation.php).

Perform the following actions to configure cluster management:

  • Monitor your cluster by using RTM.

    Go to the Management section of the Console tab. For more information about management, see the Management section.

  • Add LSF clusters and complete certain database administration functions.

    Go to the Grid Management section of the Console tab.

  • Monitor license servers and pollers.

    Go to the License Management section of the Console tab.

  • Use Cacti utilities and features.

    Go to the Collection Methods, Templates, or Import/Export sections of the Console tab to use Cacti utilities and features. For more information, see the Cacti documentation (http://cacti.net/documentation.php).

Management section

The Management section is in the Console tab.

Thresholds page

To open the Thresholds page, click Thresholds under the Management section of the Console tab. The page lists the thresholds that are configured in your cluster. A threshold triggers an alert if your clusters, hosts, queues, or jobs meet the conditions of the threshold. The list includes the following columns:

  • Name. The name of the cluster or host and the threshold. Click the name to change the threshold settings.

  • Type. The type of threshold (for example, High/Low, Baseline, or Time Based).

  • High. The high threshold boundary value. If the current value of the monitored data source item is greater than this value for a specified duration, the threshold triggers an alert.

  • Low. The low threshold boundary value. If the current value of the monitored data source item is lower than this value for a specified duration, the threshold triggers an alert.

  • Trigger. The amount of time that the data source item must be in breach of the threshold before the threshold triggers an alert.

  • Duration. The amount of time since the alert was first triggered, if the data source item is still in breach of the threshold.

  • Repeat. The amount of time that the threshold waits before the alert repeats if the data source item is still in breach of the threshold.

  • Current. The current value of the monitored data field.

  • Triggered. Indicates whether this threshold triggered an alert.

  • Enabled. Indicates whether this threshold is active.

  • Ack. Indicates whether the threshold alerts are acknowledged: "on" indicates that the threshold is acknowledged; "off" indicates that the threshold is either not acknowledged or has had its acknowledgement reset.
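The way the High, Low, Trigger, and Duration columns relate can be illustrated with a small model. The following Python sketch is not RTM's implementation: it assumes a High/Low threshold that receives timestamped samples (in seconds) and treats the Trigger setting as a time span, and the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HighLowThreshold:
    # Illustrative model only; these field names are not RTM's schema.
    high: Optional[float]                 # None disables the high check
    low: Optional[float]                  # None disables the low check
    trigger_seconds: float                # time in breach required before alerting
    breach_since: Optional[float] = None  # when the current breach began
    triggered_at: Optional[float] = None  # when the alert was first triggered

    def breach_kind(self, value: float) -> Optional[str]:
        """Classify one sample as a 'high' or 'low' breach, or None if normal."""
        if self.high is not None and value > self.high:
            return "high"
        if self.low is not None and value < self.low:
            return "low"
        return None

    def observe(self, now: float, value: float) -> bool:
        """Feed one sample taken at time `now`; return True while the alert is triggered."""
        if self.breach_kind(value) is None:
            # Value is back inside the boundaries: clear all breach state.
            self.breach_since = None
            self.triggered_at = None
            return False
        if self.breach_since is None:
            self.breach_since = now
        # Trigger only after the item has stayed in breach for the trigger time.
        if self.triggered_at is None and now - self.breach_since >= self.trigger_seconds:
            self.triggered_at = now
        return self.triggered_at is not None

    def duration(self, now: float) -> float:
        """Time since the alert was first triggered (0 if it is not triggered)."""
        return 0.0 if self.triggered_at is None else now - self.triggered_at


# Usage: alert when the value stays above 90 (or below 10) for 300 seconds.
t = HighLowThreshold(high=90.0, low=10.0, trigger_seconds=300)
print(t.observe(0, 95))    # False: breach started, trigger time not reached
print(t.observe(300, 97))  # True: in breach for 300 seconds
print(t.duration(600))     # 300: seconds since the alert first triggered
```

Separating the breach start time from the trigger time mirrors the distinction between the Trigger column (how long a breach must last) and the Duration column (how long the alert has been active).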

Threshold Item

Go to the Threshold Item page for a threshold by clicking the name of the threshold from the Thresholds page. Configure threshold settings and event triggering from the page.

Event triggering behavior is based on the realert cycle setting. When the threshold first triggers an alert, it runs the high or low threshold trigger command, depending on which boundary was breached. If the alert remains triggered, the trigger command runs again after each realert cycle, unless the realert cycle is set to Never. When the alert returns to normal, the threshold runs the norm threshold command or script.
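The realert behavior described above can be sketched as a small state machine. This Python example is illustrative only and is not RTM's implementation: it records the names of the commands it would run instead of executing them, models a realert cycle of Never as None, and uses hypothetical script names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EventTrigger:
    # Hypothetical model of the event-triggering settings; not RTM's schema.
    high_cmd: str                       # High Threshold Trigger Command/Script
    low_cmd: str                        # Low Threshold Trigger Command/Script
    norm_cmd: str                       # Norm Threshold Trigger Command/Script
    realert_seconds: Optional[float]    # realert cycle; None models "Never"
    last_fired: Optional[float] = None  # when a breach command last ran
    fired: List[str] = field(default_factory=list)  # commands "run", in order

    def on_sample(self, now: float, breach: Optional[str]) -> None:
        """breach is 'high' or 'low' while the alert is triggered, None when normal."""
        if breach is None:
            if self.last_fired is not None:
                # The alert reverted to normal: run the norm command once.
                self.fired.append(self.norm_cmd)
                self.last_fired = None
            return
        cmd = self.high_cmd if breach == "high" else self.low_cmd
        if self.last_fired is None:
            # First trigger of this alert.
            self.fired.append(cmd)
            self.last_fired = now
        elif (self.realert_seconds is not None
              and now - self.last_fired >= self.realert_seconds):
            # Still triggered and a full realert cycle has elapsed.
            self.fired.append(cmd)
            self.last_fired = now


# Usage: realert every 600 seconds.
e = EventTrigger("high.sh", "low.sh", "norm.sh", realert_seconds=600)
for ts, state in [(0, "high"), (300, "high"), (600, "high"), (900, None)]:
    e.on_sample(ts, state)
print(e.fired)  # ['high.sh', 'high.sh', 'norm.sh']
```

Recording commands in a list keeps the sketch testable; an actual integration would run the configured command or script at each point where this sketch appends to `fired`.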

You can configure the following items on this page:

  • Template propagation enabled: Propagate changes from the threshold template to this threshold.

  • Threshold name: The name of the threshold as it is displayed in the Name column in the list of thresholds.

    Note:

    You can use placeholders to customize your threshold name. Placeholders for the threshold name are enclosed by pipe characters (|), for example, |cluster_name|.

  • Threshold enabled or disabled

  • Weekend exemption: Disable threshold alerts on weekends.

  • Disable restoration email: Do not send an alert email when the threshold returns to normal.

  • Reset acknowledgement: Reset acknowledgements when the threshold returns to normal.

  • High/low threshold values

  • Threshold type: High/low, baseline, or time based.

  • Event triggering (Shell command): Specifies event trigger commands or shell scripts in the event of a breach.

    • High Threshold Trigger Command/Script: If the threshold is breached because the data source exceeds this value, the threshold triggers the specified command or shell script.

    • Low Threshold Trigger Command/Script: If the threshold is breached because the data source drops below this value, the threshold triggers the specified command or shell script.

    • Norm Threshold Trigger Command/Script: If the threshold is breached and then returns to normal, the threshold triggers the specified command or shell script.

  • Event triggering (Grid administrator host level triggers): Specifies host-level actions in the event of a breach.

    • Host Level Action (High Threshold): If the threshold is breached because the data source exceeds this value, the threshold triggers the specified action on the host.

    • Host Level Action (Low Threshold): If the threshold is breached because the data source drops below this value, the threshold triggers the specified action on the host.

  • Email message body: Email alert message content. This control specifies the template that is used in alert email notifications for this threshold.

    Note:

    You can use placeholders to customize your alert emails and provide more information. Placeholders for the email message body are enclosed by angle brackets (<>), for example, <cluster_name>.

  • Syslog settings

  • Data type: Special formatting for the data.

  • Realert cycle: The interval at which the threshold repeats the alert while it is still in breach.

  • Notify accounts and extra alert emails: Email addresses to be notified when the threshold raises an alert.

Placeholder tags

Placeholders are custom tags that represent real system values. You can insert placeholders in threshold names to create customized names that are based on your system, and you can insert placeholders in alert email templates to give administrators more information, which makes it easier for them to follow up on the alert.

Tags for threshold names are enclosed by pipe characters (|), and tags for alert email templates are enclosed by angle brackets (<>). Not all placeholders are available for threshold names; some are available only for alert email templates. The following table lists the placeholders that are available for your thresholds:


Table 1. Names, tags, and descriptions of available placeholders

Placeholder name            Tag for threshold name      Tag for alert email template  Description
--------------------------  --------------------------  ----------------------------  -----------
Cluster ID                  |clusterid|                 <clusterid>                   The ID of the cluster.
Cluster name                |cluster_name|              <cluster_name>                The name of the cluster.
Cluster LSF master          |cluster_lsfmaster|         <cluster_lsfmaster>           The name of the LSF master host for the cluster.
Cluster LSF version         |cluster_version|           <cluster_version>             The version of LSF running in the cluster.
Cluster LSF LIM port        |cluster_limport|           <cluster_limport>             The port number of the LIM running on the LSF master host.
Custom data value           |custom_custom_field_name|  <custom_custom_field_name>    The custom data value from the data source that is linked to this alert, for example, custom_percent or custom_status.
Host name                   |host_hostname|             <host_hostname>               The host name of the device that is linked to this alert.
Host description            |host_description|          <host_description>            The description of the device that is linked to this alert.
Threshold description       Not available               <DESCRIPTION>                 The threshold description.
Threshold host name         Not available               <HOSTNAME>                    The host name of the threshold.
Threshold trigger time      Not available               <TIME>                        The time at which the threshold triggered this alert.
Threshold graph URL         Not available               <URL>                         The URL of the threshold graph.
Threshold current value     Not available               <CURRENTVALUE>                The current value of the data field that is monitored by the threshold, at the time of the alert email.
Threshold name              Not available               <NAME>                        The name of the threshold.
Threshold data source name  Not available               <DSNAME>                      The name of the data source that is monitored by the threshold.
Threshold type              Not available               <THOLDTYPE>                   The threshold type.
Threshold high value        Not available               <HI>                          The high threshold boundary value.
Threshold low value         Not available               <LO>                          The low threshold boundary value.
Threshold trigger           Not available               <TRIGGER>                     The threshold trigger value.
Threshold graph ID          Not available               <GRAPHID>                     The ID of the threshold graph.
Threshold duration          Not available               <DURATION>                    The duration of the threshold.
Threshold details URL       Not available               <DETAILS_URL>                 A URL to the threshold details page, which lists the hosts that breached this threshold.
Threshold breached items    Not available               <BREACHED_ITEMS>              A list of the items that breached this threshold, in HTML table format.
Threshold graph             Not available               <GRAPH>                       The threshold graph, embedded in the email.
Threshold date              Not available               <DATE_RFC822>                 The threshold date in RFC 822 format, for example, Thu, 01 Jan 2009 01:11:01 +0100.
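To show how the two tag syntaxes differ, here is a minimal Python sketch of placeholder expansion. It is not RTM's implementation: the regular expressions, function names, and sample values are assumptions for illustration. It leaves unknown tags untouched so that ordinary HTML such as <br> in an email template survives, and it uses format_datetime from Python's standard email.utils module, which produces the RFC 2822 date format (the successor to RFC 822) in the same style as the <DATE_RFC822> example.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Mapping

# Assumed tag syntax: |tag| in threshold names, <tag> in email templates.
NAME_TAG = re.compile(r"\|([A-Za-z0-9_]+)\|")
EMAIL_TAG = re.compile(r"<([A-Za-z0-9_]+)>")


def expand(template: str, values: Mapping[str, str], pattern: re.Pattern) -> str:
    """Replace each tag matched by `pattern`; leave unknown tags (such as <br>) as-is."""
    return pattern.sub(lambda m: values.get(m.group(1), m.group(0)), template)


def expand_threshold_name(template: str, values: Mapping[str, str]) -> str:
    return expand(template, values, NAME_TAG)


def expand_email_body(template: str, values: Mapping[str, str]) -> str:
    return expand(template, values, EMAIL_TAG)


# Usage with hypothetical values.
alert_time = datetime(2009, 1, 1, 1, 11, 1, tzinfo=timezone(timedelta(hours=1)))
values = {
    "cluster_name": "cluster1",
    "host_hostname": "hostA",
    "NAME": "cluster1 - hostA - CPU load",
    "DATE_RFC822": format_datetime(alert_time),
}
print(expand_threshold_name("|cluster_name| - |host_hostname| - CPU load", values))
# cluster1 - hostA - CPU load
print(expand_email_body("<NAME> triggered on <DATE_RFC822><br>", values))
# cluster1 - hostA - CPU load triggered on Thu, 01 Jan 2009 01:11:01 +0100<br>
```

Because each syntax has its own pattern, a pipe-delimited tag in an email template (or an angle-bracket tag in a threshold name) is left as literal text in this sketch.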