Event caching

During probe rules processing, the OMi tokens involved in event correlation and deduplication must be present in every iteration.

An OprEvent object has all the tokens required for the rules computation. However, an OprEventChange object never contains all of them, so for each OprEventChange object the probe must request the full message (the OprEvent object) of the original alert from OMi. This HTTP operation adds substantial overhead if it occurs too frequently. To minimize that overhead, the probe stores selected fields from the OprEvent object in the Event Cache and uses them later to complement each OprEventChange object.

To have the probe manage the Event Cache with a configuration such as the following example, specify the file that holds these values in the EventCacheConfig probe property.

Table 1. Example configuration

| Field with default value | Description |
|---|---|
| MAX_NODES=10000 | The maximum number of nodes in the cache. |
| MSG_REQUEST_RERTY=3 | The number of times the probe retries the OprEvent request for an in-waiting event node. |
| MSG_WAIT_TIMEOUT=30 (seconds) | The length of time that an in-waiting event node waits for the full message before it is removed. |
| NODE_DURATION=20 (minutes) | The length of time that an OprEvent node stays in the cache. |
| RENEW_DURATION=true | Whether to extend a node’s duration each time the node receives an update. |
| STORED_FIELDS=id, state, severity, priority, application object, key, originating_server, sending_server, time_changed, node_hints, title | The fields from OprEvent that are cached. |
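As an illustration only, the following Python sketch reads settings of this kind into a dictionary. It assumes that the configuration file holds one KEY=value pair per line, as the table suggests; the actual file syntax accepted by the probe is not described here, and the function names are hypothetical.

```python
# Hypothetical reader for an Event Cache configuration file.
# Assumption: one KEY=value pair per line, as suggested by Table 1.
DEFAULTS = {
    "MAX_NODES": "10000",
    "MSG_WAIT_TIMEOUT": "30",   # seconds
    "NODE_DURATION": "20",      # minutes
    "RENEW_DURATION": "true",
}

def parse_cache_config(text):
    """Return the settings in `text`, falling back to the defaults above."""
    settings = dict(DEFAULTS)
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition("=")
        if sep:  # ignore blank lines and lines without '='
            settings[key.strip()] = value.strip()
    return settings

def stored_fields(settings):
    """Split the comma-separated STORED_FIELDS value into a list of names."""
    raw = settings.get("STORED_FIELDS", "")
    return [name.strip() for name in raw.split(",") if name.strip()]

sample = """
MAX_NODES=5000
NODE_DURATION=10
STORED_FIELDS=id, state, severity, title
"""
cfg = parse_cache_config(sample)
print(cfg["MAX_NODES"], cfg["MSG_WAIT_TIMEOUT"], stored_fields(cfg))
```

Values that the sample omits, such as MSG_WAIT_TIMEOUT, keep the defaults listed in the table.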

An OprEventChange object that has no cached OprEvent node for reference enters the Event Cache as an in-waiting event node, pending the full message that the probe requests from OMi.

There are two possible outcomes to a full message request:
  • If the full message is not obtained within the MSG_WAIT_TIMEOUT interval, the node is removed and the OprEventChange is sent to the probe rules as a discarded event with a $Discarded token.
  • If the full message is obtained within the MSG_WAIT_TIMEOUT interval, the in-waiting node becomes an OprEvent node with a full NODE_DURATION.
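The two outcomes can be modelled with a small state check. The sketch below is not the probe’s implementation: the Node class, the resolve function, and the treatment of the exact timeout boundary are assumptions made for illustration, and times are plain seconds.

```python
from dataclasses import dataclass, field

MSG_WAIT_TIMEOUT = 30        # seconds, default from Table 1
NODE_DURATION = 20 * 60      # 20 minutes, expressed in seconds

@dataclass
class Node:                  # hypothetical cache entry
    event_id: str
    created: float           # time the in-waiting node was created
    state: str = "in-waiting"
    expires: float = 0.0
    fields: dict = field(default_factory=dict)

def resolve(node, now, full_message=None):
    """Decide the fate of an in-waiting node at time `now`."""
    waited = now - node.created
    if full_message is not None and waited <= MSG_WAIT_TIMEOUT:
        # Full message arrived in time: promote to an OprEvent node.
        node.state = "cached"
        node.fields = dict(full_message)
        node.expires = now + NODE_DURATION
        return "cached"
    if waited > MSG_WAIT_TIMEOUT:
        # Timed out: the caller removes the node and passes the
        # OprEventChange to the rules with a $Discarded token.
        return "discarded"
    return "in-waiting"

node = Node("e1", created=0.0)
print(resolve(node, 10.0))                  # still waiting
print(resolve(node, 12.0, {"id": "e1"}))    # promoted
print(resolve(Node("e2", created=0.0), 31.0))
```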

To keep the OprEvent nodes in the cache consistent, whenever the probe encounters an OprEventChange or OprEvent that corresponds to a cached node, it updates the cached fields that the object also carries. In return, the OprEventChange is complemented with the cached data and assigned an additional $Complemented token for probe rules processing.
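A minimal sketch of this two-way exchange, using plain dictionaries, might look as follows. The function name, the changed_by field, and the way the $Complemented token is represented are all hypothetical; the sketch only shows the direction in which data flows.

```python
def update_and_complement(cached, change):
    """cached: stored fields of the OprEvent node.
    change: fields carried by the incoming OprEventChange."""
    # 1. Refresh only the fields that the cache already stores.
    for name in cached.keys() & change.keys():
        cached[name] = change[name]
    # 2. Fill the gaps in the change with the (now updated) cached data.
    complemented = {**cached, **change}
    # Hypothetical marker standing in for the $Complemented token.
    complemented["$Complemented"] = True
    return complemented

cached = {"id": "e1", "severity": "minor", "title": "Disk full"}
change = {"id": "e1", "severity": "critical", "changed_by": "admin"}
event = update_and_complement(cached, change)
print(cached["severity"], event["title"])
```

In this example the cache picks up the new severity, while the change gains the title that it did not carry itself.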

An OprEvent node is removed when its stay in the cache reaches the NODE_DURATION interval. When RENEW_DURATION is true, each update extends the node’s stay by a full NODE_DURATION from the moment of the update.
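The effect of RENEW_DURATION on a node’s expiry time can be worked through with a short sketch. The function names are hypothetical and times are counted in seconds from the moment the node enters the cache.

```python
NODE_DURATION = 20 * 60   # 20 minutes, expressed in seconds

def expiry_after_update(expires, now, renew_duration=True):
    """Return the node's expiry time after an update received at `now`."""
    return now + NODE_DURATION if renew_duration else expires

def is_expired(expires, now):
    return now >= expires

inserted_at = 0
expires = inserted_at + NODE_DURATION                     # 1200
renewed = expiry_after_update(expires, now=600)           # update after 10 minutes
fixed = expiry_after_update(expires, now=600, renew_duration=False)
print(renewed, fixed)
```

With renewal on, the update at the 10-minute mark moves the expiry to 30 minutes after insertion; with renewal off, the node still expires at 20 minutes.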

When the Event Cache is full, the probe resorts to requesting the full message for every OprEventChange object that has no corresponding cached OprEvent. Avoid setting NODE_DURATION too large or MAX_NODES too small: the longer nodes stay in the cache, the faster the remaining free space shrinks as new nodes arrive.
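As a rough sizing aid, the steady-state number of cached nodes is approximately the arrival rate of new nodes multiplied by the time each node stays (Little’s law). The sketch below applies that estimate. It is an approximation that ignores in-waiting nodes and renewals, both of which only increase occupancy, and the event rates are made-up figures.

```python
def estimated_nodes(new_nodes_per_minute, node_duration_minutes):
    """Approximate steady-state cache occupancy (rate x time in cache)."""
    return new_nodes_per_minute * node_duration_minutes

def fits_cache(max_nodes, new_nodes_per_minute, node_duration_minutes):
    return estimated_nodes(new_nodes_per_minute, node_duration_minutes) <= max_nodes

# Defaults from Table 1: MAX_NODES=10000, NODE_DURATION=20 minutes.
print(estimated_nodes(400, 20), fits_cache(10000, 400, 20))
print(estimated_nodes(600, 20), fits_cache(10000, 600, 20))
```

Under these assumptions, 400 new nodes per minute fit within the default limits, while 600 per minute would fill the cache unless NODE_DURATION is shortened or MAX_NODES is raised.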

Event Cache data is not written to or read from a persistent file. The data is lost when the probe shuts down, because there is no practical purpose in keeping a backup.

Because event correlation and deduplication are determined by @Identifier and @Type in the ObjectServer alerts.status table, the OMi fields that are cached must be consistent with the fields that the probe rules use to calculate @Identifier and @Type.
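One way to check this consistency is to compare the fields that the rules rely on against STORED_FIELDS. The sketch below assumes a hand-maintained list of the OMi fields used in the rules file; it does not parse the rules, and the field names in the example are illustrative.

```python
def missing_from_cache(rule_fields, stored_fields):
    """Return the fields used by the rules that the cache does not store."""
    stored = set(stored_fields)
    return sorted(name for name in rule_fields if name not in stored)

# Hypothetical inputs: fields feeding @Identifier and @Type in the rules,
# and a shortened STORED_FIELDS list.
rule_fields = ["id", "title", "category"]
stored = ["id", "state", "severity", "title"]
print(missing_from_cache(rule_fields, stored))
```

Any field that the check reports would need to be added to STORED_FIELDS, or removed from the @Identifier and @Type calculation.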