Threshold Manager

Use the Threshold Manager to review the predefined thresholds for a monitoring agent and to create and edit thresholds. Thresholds are used to compare the sampled value of an attribute with the value set in the threshold. If the sampled value satisfies the comparison, an event is opened. The event closes automatically when the threshold comparison is no longer true.
After you click System Configuration > Threshold Manager, the page is displayed with a table of the thresholds that were defined for the selected data source type.

The data types that display when you click the Data Source Type list box are for the types of monitoring agents and data collectors that are installed in your managed environment. Select the data type for which you want to create or view thresholds.

The table lists all the thresholds that were created for the selected data type, and has tools for managing thresholds:
  • New opens the Threshold Editor for defining a threshold for the selected data type.
  • Select a threshold and click Edit to open the Threshold Editor for editing the definition.
  • Select a threshold that you no longer want and click Delete. After you confirm that you want to delete the threshold, it is removed from the list and from any resource groups that it was assigned to. Any open events for the threshold are closed.
  • For a long list, you can click inside the filter text box and type the beginning of the value to filter by. As you type, the rows that do not fit the criteria are filtered out. To clear the filter, click the x icon in the filter box or press the Backspace key.

For more information about the predefined thresholds and custom thresholds that are displayed in the table and the significance of the resource group assignment (or lack thereof), see Background information. For a quick hands-on lesson, see Tutorial: Defining a threshold.

Threshold Editor

After you click New or select a threshold and click Edit, the Threshold Editor is displayed with the following fields:
Name
Enter a unique name for the threshold. The name must begin with a letter and can be up to 31 letters, numbers, and underscores, such as Average_Processor_Speed_Warning. The threshold name is displayed in the Application Performance Dashboard Events tab and in certain dashboard tables.
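The naming rule can be checked before you type a name into the editor. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes that the 31-character limit covers the whole name (including the leading letter) and that only ASCII letters are accepted.

```python
import re

# Assumption: the limit of 31 applies to the total length of the name,
# and "letter" means an ASCII letter.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,30}")


def is_valid_threshold_name(name: str) -> bool:
    """Return True if the name starts with a letter and contains at most
    31 letters, digits, and underscores in total."""
    return _NAME_PATTERN.fullmatch(name) is not None
```

Under these assumptions, Average_Processor_Speed_Warning (exactly 31 characters) passes, while a name that starts with a digit or contains a space does not.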
Description
Optional. A description is useful for recording the purpose of the threshold. Users can see the description in the Threshold Manager.
Severity
Select the appropriate event severity from the list: Fatal, Critical, Minor, Warning, or Unknown.
The severities are consolidated for display in the Application Performance Dashboard: Fatal and Critical events show as Critical; Minor and Warning events show as Warning; and Unknown events show as Normal (see Event Status).
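The consolidation can be expressed as a simple lookup. This Python sketch only restates the mapping described above; the dictionary and function names are hypothetical.

```python
# Threshold severity -> status shown in the Application Performance Dashboard,
# as described in the Severity field text.
DASHBOARD_STATUS = {
    "Fatal": "Critical",
    "Critical": "Critical",
    "Minor": "Warning",
    "Warning": "Warning",
    "Unknown": "Normal",
}


def dashboard_status(severity: str) -> str:
    """Return the consolidated dashboard status for a threshold severity."""
    return DASHBOARD_STATUS[severity]
```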
Forward EIF Event?
If you configured event forwarding in the System Configuration > Advanced Configuration page (Event Manager), open events are forwarded by default to the event destinations that you configured, such as EIF event targets, Cloud Event Management, or Alert Notification. Change the setting to No if you do not want to forward events for this threshold to any event destination.
To customize how thresholds are mapped to forwarded events, thus overriding the default mapping between thresholds and events forwarded to the event server, click EIF Slot Customization. For more information, see Customizing an event to forward to an EIF receiver.
Interval
Enter or select the time to wait between taking data samples in HHMMSS format, such as 001500 for 15 minutes. For sampled-event thresholds, the minimum interval is 000030 (30 seconds) and the maximum is 235959 (23 hours, 59 minutes, and 59 seconds).

A value of 000000 (six zeros) indicates a pure event threshold. Pure events are unsolicited notifications. Thresholds for pure events have no sampling interval, thus they have no constant metric that can be monitored for current values. Pure events are closed after 24 hours or as set in the Advanced Configuration page Pure Event Close Time field in category Event Manager.
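To make the HHMMSS rules concrete, the following Python sketch converts an interval to seconds and applies the limits described above. The function name is hypothetical, and the check that minutes and seconds do not exceed 59 is an assumption based on the stated maximum of 235959.

```python
def interval_to_seconds(hhmmss: str) -> int:
    """Convert an HHMMSS interval to seconds.

    Returns 0 for 000000, which marks a pure event threshold.
    Raises ValueError for malformed or out-of-range values.
    """
    if len(hhmmss) != 6 or not (hhmmss.isascii() and hhmmss.isdigit()):
        raise ValueError("interval must be six digits in HHMMSS format")
    hours, minutes, seconds = int(hhmmss[:2]), int(hhmmss[2:4]), int(hhmmss[4:])
    # Assumption: each field follows clock ranges, so the maximum is 235959.
    if hours > 23 or minutes > 59 or seconds > 59:
        raise ValueError("hours, minutes, or seconds out of range")
    total = hours * 3600 + minutes * 60 + seconds
    # Sampled-event thresholds need at least 30 seconds; 0 is reserved
    # for pure event thresholds.
    if 0 < total < 30:
        raise ValueError("sampled-event thresholds need at least 000030")
    return total
```

For example, 001500 yields 900 seconds, and 000000 yields 0, which identifies a pure event threshold.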

Required consecutive samples
Specify how many consecutive threshold samples must evaluate to true before an event is generated. With a setting of 1, an event is generated as soon as one sample evaluates to true; with a setting of 2, two consecutive samples must evaluate to true before an event is opened.
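The effect of this setting can be illustrated with a short Python sketch. It is a simplified model, not the agent's implementation: each list entry stands for the result of one threshold sample, and the function name is hypothetical.

```python
def event_open_index(samples, required):
    """Return the index of the sample at which an event would open,
    or None if the required run of true samples never occurs."""
    streak = 0
    for index, is_true in enumerate(samples):
        # A false sample resets the run of consecutive true samples.
        streak = streak + 1 if is_true else 0
        if streak >= required:
            return index
    return None
```

With a setting of 2, the sequence true, false, true, true opens an event on the fourth sample, because the false sample in between resets the count.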
Data set
Select the data set (attribute group) for the type of data to be sampled. The attributes that are available for inclusion in the condition are from the chosen data set. If the threshold has multiple conditions, they must all be from the same data set.

To get a short description of a data set, hover the mouse over the name. You can get the complete description of the data set and attributes by clicking the Learn more link in the hover help. You can also click Help > Help Contents or Help > Documentation in the navigation bar, and open the help or download the reference for the monitoring agent.

Some agents are categorized as multi-node agents, which have subnodes for monitoring multiple agent resources. A multi-node agent might have data sets that can be used in a threshold but any events opened for the threshold do not display in the Application Performance Dashboard. A message notifies you of the limitation. Such events can be forwarded to the IBM® Netcool/OMNIbus event manager.

Display item
Optional. For multiple row data sets only. After a row evaluation causes an event to open, no more events can be opened for this threshold on the monitored system until the event is closed. By selecting a display item, you enable the threshold to continue evaluating the other rows in the data sampling and open more events if other rows qualify. As well, the display item is shown in the Events tab of the Application Performance Dashboard so that you can easily distinguish among the rows for which events were opened. The list contains only the attributes that you can designate as display items.
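The difference that a display item makes can be modeled in a few lines of Python. This is a simplified sketch of the behavior described above, not the product's logic; the function name, the row layout, and the attribute names in the usage note are hypothetical.

```python
def opened_events(rows, condition, display_item=None):
    """Return one entry per event that a single data sample would open.

    Without a display item, only the first qualifying row opens an event.
    With a display item, each distinct value of that attribute can open
    its own event.
    """
    events = []
    seen = set()
    for row in rows:
        if not condition(row):
            continue
        key = row[display_item] if display_item else None
        if key in seen:
            continue  # an event is already open for this key
        seen.add(key)
        events.append(key)
    return events
```

For a sample with the rows /, /var, and /tmp where only / and /var breach the condition, the sketch reports one event without a display item and two events (one for / and one for /var) when the mount point attribute is the display item.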
Logical Operator
Ignore this field if your threshold has only one condition. If you are measuring multiple conditions, select one of the following operators before you click New to add a second or third (or more) condition:
  • And (&) if the previous condition and the next condition must be met for the threshold to be breached
  • Or (|) if either of them can be met for the threshold to be breached

A mix of logical operators is not supported; use either all And operators or all Or operators. The threshold can have up to nine conditions when the Or operator is used; up to 10 conditions when the And operator is used.

If you are using the Missing function (described later in the Operator section), you can use only the And operator in the formula.
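The restrictions on logical operators can be summarized as a small check. The following Python sketch is illustrative only; the function name and the representation of a formula as a list of joining operators are assumptions.

```python
# Maximum number of conditions per operator, as stated above.
MAX_CONDITIONS = {"&": 10, "|": 9}


def formula_problems(operators, uses_missing=False):
    """Return a list of rule violations for a threshold formula.

    operators holds the operator between each pair of adjacent
    conditions, so n operators join n + 1 conditions.
    """
    problems = []
    distinct = set(operators)
    if not distinct <= set(MAX_CONDITIONS):
        return ["unknown operator"]
    if len(distinct) > 1:
        return ["a mix of And and Or operators is not supported"]
    if distinct:
        operator_used = operators[0]
        condition_count = len(operators) + 1
        if condition_count > MAX_CONDITIONS[operator_used]:
            problems.append("too many conditions for this operator")
        if uses_missing and operator_used != "&":
            problems.append("the Missing function requires the And operator")
    return problems
```

An empty list means that the formula respects the rules in this section, for example nine And operators joining 10 conditions.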

Conditions
The threshold definition can logically include multiple simultaneous thresholds or conditions.
Click New to add a condition. Select a condition and click Edit to modify the expression, or click Delete to remove the expression.
After you click New or Edit, complete the fields in the Add Condition or Edit Condition dialog box that opens:
Count check box
For data sets that return multiple rows for each data sample, you can have each row that meets the criteria of the condition counted. An event is opened after the count Value is reached and any other conditions in the formula are met. For example, if the number of zombie processes exceeds 10, issue an alert.

In the following example, the condition is true when more than 10 rows are counted: Attribute Timestamp, Operator Greater Than, Value 10.

Select the Count check box, the Attribute to be counted, the relational Operator, and count Value.

If the formula has multiple conditions, each condition must use the And Boolean operator. Count and Time Delta are mutually exclusive: If you select the check box for one function, the other function is disabled. The attribute cannot be a system identifier, such as Server Name or ORIGINNODE, be specified as the Display Item, or be from a data set for which the threshold opens pure events.
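As a rough model of the Count function, the following Python sketch counts the rows of one data sample that satisfy the other conditions and then compares the count with the Value. The function names, the row layout, and the State attribute are hypothetical, and the way the agent combines Count with other conditions is an assumption here.

```python
import operator

COMPARATORS = {
    "Equal": operator.eq,
    "Greater Than": operator.gt,
    "Greater Than or Equal": operator.ge,
}


def count_condition(rows, attribute, comparator, value, other_conditions=()):
    """Count the rows that carry the attribute and meet every other
    condition, then compare the count with value."""
    counted = sum(
        1
        for row in rows
        if row.get(attribute) is not None
        and all(condition(row) for condition in other_conditions)
    )
    return COMPARATORS[comparator](counted, value)


def is_zombie(row):
    # Hypothetical second condition, joined with And.
    return row["State"] == "Zombie"
```

With 11 zombie process rows in the sample, the condition Attribute Timestamp, Operator Greater Than, Value 10 evaluates to true in this model; with 10 rows it evaluates to false.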
Time Delta check box
Use the Time Delta function in a condition to compare the sampled time stamp (such as recording time) with the specified time difference.
After you select the Time Delta check box, the Time Delta field is displayed for you to combine + (plus) or - (minus) with the number of Days, Hours, Minutes, or Seconds. Select Sampled Time or Specific Time as the Value to use in the comparison.
In the following Event Log example, the formula compares the time that the event was logged with the time stamp from the data sampling. If the event occurred seven days earlier, the comparison is true. If the relational operator was changed to Less Than or Equal, the comparison would be true after 8 days, 9 days, and so on:
  • Attribute Timestamp
  • Time Delta -7 Days
  • Operator Equal
  • Value Entry Time
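The Event Log example can be approximated in Python as follows. This is a sketch under stated assumptions: the function name is hypothetical, and the operand order (the Value time is compared with the attribute time plus the delta) was chosen so that the results match the behavior described above.

```python
from datetime import datetime, timedelta
import operator


def time_delta_is_true(attribute_time, delta, compare, value_time):
    """Evaluate a Time Delta condition.

    Assumption: the comparison reads
    "value_time <operator> attribute_time + delta".
    """
    return compare(value_time, attribute_time + delta)


sampled = datetime(2024, 1, 8, 12, 0, 0)   # Timestamp of the data sample
minus_seven_days = timedelta(days=-7)       # Time Delta -7 Days

# Entry logged exactly seven days before the sample: Equal is true.
exact = time_delta_is_true(
    sampled, minus_seven_days, operator.eq, sampled - timedelta(days=7)
)
# Entry logged nine days before the sample: Less Than or Equal is true.
older = time_delta_is_true(
    sampled, minus_seven_days, operator.le, sampled - timedelta(days=9)
)
```

In this model, an entry that was logged six days before the sample satisfies neither Equal nor Less Than or Equal.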
Attribute
Select the attribute that you want to compare in this condition. To see a short description of the attribute, hover the mouse over the name in the list.
Operator
Select the relational operator for the type of comparison:
  • Equal
  • Not Equal
  • Greater than
  • Greater than or Equal
  • Less than
  • Less than or equal
  • Regular expression contains
  • Regular expression does not contain

Regular expression contains and Regular expression does not contain look for a pattern match to the expression. The easier it is to match a string with the expression, the more efficient the workload at the agent. The expression does not need to match the entire line, only the substring in the expression. For example, to find out whether See him run contains him, you could compose the regular expression as him, but you could also use .*him.*. If you are looking for See, you could enter See, or you could enter ^See to confirm that it is at the beginning of the line. Entering .* wildcards makes the search less efficient and raises the workload. For more information about regular expressions, see the developerWorks® technical library topic or search for regex in your browser.
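You can try the expressions from this example outside the product. The following Python sketch uses the standard re module as a stand-in; the regular expression dialect that the agent uses might differ in details, and the function name is hypothetical.

```python
import re


def regex_contains(pattern: str, text: str) -> bool:
    """Return True when the pattern matches any substring of the text."""
    return re.search(pattern, text) is not None


line = "See him run"

matches_plain = regex_contains("him", line)         # substring match
matches_wildcards = regex_contains(".*him.*", line)  # same result, more work
starts_with_see = regex_contains("^See", line)       # anchored at line start
starts_with_him = regex_contains("^him", line)       # him is not at the start
```

The first three expressions match the line and the last one does not, which mirrors the Regular expression contains and Regular expression does not contain outcomes.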

You can also select the Missing function, which compares the value of the specified metric with a list of values that you supply. The condition is true when the value does not match any in the list. This function is useful when you want notification that something is not present in your system. Requirements and restrictions:
  1. The selected metric must be a text attribute: time and numeric attributes cannot be used.
  2. Separate each value with a comma (,), for example, fred,mary,jean.
  3. You can have only one Missing condition in a threshold.
  4. Missing must be the last condition in the formula. If other conditions are required, enter them before you add the Missing function and use only the And (&) operator in the formula. Otherwise, all subsequent rows are disabled.
Value
Enter the value to compare by using the format that is allowed for the metric, such as 20 for 20% or 120 for 2 minutes.
Group assignment
Assign a resource group to distribute the threshold to the managed systems of the same type within the resource group. The resource groups that are available are the user defined groups that you have Modify permission for and the system groups (for the agent type) that you have View permission for. The available system groups are also limited to those that are suitable for the chosen data set.

A threshold with no group assigned is distributed to no monitored systems and remains stopped until it is distributed to a resource group.

A system group, such as Linux OS or HTTP Server, distributes the threshold to all managed systems where that agent is installed. By default, every predefined threshold is assigned to the system group for that agent. (You can disable all predefined thresholds in the Advanced Configuration page, as described in Thresholds Enablement.)

The exception is managed systems from the IBM Tivoli® Monitoring domain: Managed systems from the Tivoli Monitoring domain must be monitored with situations that were distributed in your Tivoli Monitoring environment.

To assign groups to the threshold, select the check box of one or more resource groups. If the list of assigned groups is long, you can select the Show only selected groups check box.

If you do not see a resource group that you want to assign the threshold to, you can save the threshold definition, and click OK when prompted to confirm that you want to save the threshold without assigning it to a group. You can then create a new group in the Resource Group Manager, and assign a threshold to the new group in the Resource Group Editor. For more information, see Resource Group Manager.

Execute command
After an event is opened for a threshold that evaluates to true, you can have a command or script of commands run automatically. For example, you might want to log information, trigger an audible beep, or stop a job that is overusing resources when an event is opened. The command or script is run on the system of the monitoring agent that opened the event.
The command uses the following syntax:
&{data_set.attribute}
where data_set is the data set name and attribute is the attribute name as shown in the Threshold Editor. If the data set or attribute name contains a space, replace the space with an underscore. The data_set must be the same data set that you selected in the Data set field.
The following example shows how you can pass the disk name parameter to your managed resource:
/scripts/clean_logs.sh &{KLZ_Disk.Disk_Name}
You can pass in one or more attributes from the data set. If specified, multiple attributes are passed into the command in order ($1, $2, and so on).
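To preview what a command looks like after substitution, you can model the &{data_set.attribute} syntax in Python. This sketch is illustrative only and does not reflect how the agent performs the substitution internally; the function names, the displayed names KLZ Disk and Disk Name, and the sample value /dev/sda1 are hypothetical.

```python
import re

# Matches &{data_set.attribute}
_PLACEHOLDER = re.compile(r"&\{([^.{}]+)\.([^{}]+)\}")


def placeholder(data_set: str, attribute: str) -> str:
    """Build a placeholder, replacing spaces in the names with underscores."""
    return "&{%s.%s}" % (data_set.replace(" ", "_"), attribute.replace(" ", "_"))


def expand_command(template: str, data_set: str, row: dict) -> str:
    """Replace each placeholder with the attribute value from one row.

    Raises ValueError if a placeholder names a different data set,
    because all attributes must come from the selected data set.
    """
    def substitute(match):
        if match.group(1) != data_set:
            raise ValueError("placeholder uses a different data set")
        return str(row[match.group(2)])

    return _PLACEHOLDER.sub(substitute, template)
```

For example, expanding /scripts/clean_logs.sh &{KLZ_Disk.Disk_Name} with a row whose Disk_Name is /dev/sda1 gives /scripts/clean_logs.sh /dev/sda1, so the script receives the disk name as $1.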

Ensure that the scripts or programs that the command runs are installed on the agent system, because Cloud APM does not provide a mechanism to distribute scripts or programs. The command runs from the command line with the same user account that the agent was started with. Ensure that the user who starts the agent has permission to execute the command. For example, if the agent is running as root, then root runs the command on the managed system.

The following options control how often the command is run:
  • Select the On first event only check box if the data set returns multiple rows and you want to run the command for only the first event occurrence in the data sample. Clear the check box to run the command for every row that causes an event.
  • Select the For every consecutive true interval check box to run the command every time the threshold evaluates to true. Clear the check box to run the command when the threshold is true, but not again until the threshold evaluates to false, followed by another true evaluation in a subsequent interval.
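The For every consecutive true interval option can be modeled with a short Python sketch. It is a simplified illustration rather than the agent's implementation: each list entry stands for the threshold result in one interval, and the function name is hypothetical.

```python
def command_run_intervals(evaluations, every_true_interval):
    """Return the indexes of the intervals in which the command runs."""
    runs = []
    previous = False
    for index, current in enumerate(evaluations):
        # With the option cleared, the command runs only on a change
        # from false to true.
        if current and (every_true_interval or not previous):
            runs.append(index)
        previous = current
    return runs
```

For the sequence true, true, false, true, the command runs in the first, second, and fourth intervals when the option is selected, and only in the first and fourth intervals when it is cleared.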
After you click Save, the threshold is applied to all monitored systems of the same data type within the assigned resource groups.
Tip: You can control event behavior and event forwarding through the Event Manager options in the Advanced Configuration page. See Advanced Configuration.
Note: To see a list of the attributes that are suitable for inclusion in the threshold definition, create a table with the data set that you plan to use.