Configuration considerations for EventMonitor service

The IBM® Sterling Control Center Monitor EventMonitor service runs only in the Controller Event Processor (CEP). It is responsible for passing all events created by all Event Processors (EPs) to the SLC and FileAgent services.

TIP

When running Control Center Monitor with a single EP, set the engine property BYPASS_EVENT_MONITOR_FOR_EVENTS to TRUE. When BYPASS_EVENT_MONITOR_FOR_EVENTS is TRUE, events go directly to the SLC and FileAgent services instead of passing through the EventMonitor service, which reduces contention on the IBM Sterling Control Center Monitor EVENTS database table.

Gathering diagnostic statistics

To see how the EventMonitor service is performing, look in the Engine log for the metrics it outputs once every hour.

Example prior to Control Center Monitor v6.1.2.1

24 Sep 2018 01:00:03,605 4647011 [EventMonitor] INFO
EventMonitor - None: ****** EventMonitorMetrics (EVENT_MONITOR_EVENT_LIMIT_PER_RUN = 125,
EVENT_MONITOR_EVENT_COUNT_FOR_CATCHUP = 100) ****** 
Current hour (Australia/Sydney): 0 
Notifications broadcast this hour: 3386
Times processEvents() invoked this hour: 719
Times processEvents() invoked this hour without waiting: 0
Times processEvents() invoked this hour without waiting percent: 0
Average time to run processEvents(): 9 ms
Average time to retrieve event data: 2 ms
Average time to populate CachedResultSet: 0 ms
Average time to populate EventCacheManager: 0 ms
Average time to construct notifications: 5 ms
Average time to broadcast notifications: 0 ms
Average time to update LAST_SERIAL_NUM_PROCESSED: 0 ms
Average time to close result set, etc.: 0 ms
processEvents() retry count: 0

Example from ICC 6.1.2.1 iFix 01

07 Jan 2019 11:00:04,290 1637440 [EventMonitor] INFO  EventMonitor -    
None: ****** EventMonitor Metrics (EVENT_MONITOR_EVENT_LIMIT_PER_RUN = 125, 
EVENT_MONITOR_EVENT_COUNT_FOR_CATCHUP = 100)  Hour (America/Chicago): 10 ******
1  EventMonitor processEvents() invoked = 2875
1b EventMonitor processEvents() invoked without waiting (in catchup) = 2599
1c EventMonitor convertXMLStringsToNotificationsAndBroadcastThem() invoked = 5314
2  Events processed by processEvents() = 330655, Max value = 125, Average = 115
3  Notifications constructed and broadcast by convertXMLStringsToNotificationsAndBroadcastThem()
= 330655, Max value = 100, Average = 62
4  Milliseconds performing EventMonitor processEvents() = 231718, Max value = 1552, Average = 80
4a Milliseconds getting DB connection = 145, Max value = 31, Average = 0
4b Milliseconds running SQL to retrieve events = 533, Max value = 141, Average = 0
4c Milliseconds populating CachedRowSet = 3608, Max value = 719, Average = 1
4d Milliseconds populating EventCacheManager = 82, Max value = 16, Average = 0
4e Milliseconds updating CC_CONTROLLER.LAST_EVENT_SERIAL_NUM_PROC = 3703, Max value = 453, Average = 1
4f Milliseconds closing cached result set and database connection = 159, Max value = 64, Average = 0
4g Milliseconds closing query result set = 571, Max value = 156, Average = 0
4h Milliseconds retrieving data from CRS used to constructing XMLStrings = 318, Max value = 47, Average = 0
4i Milliseconds constructing XMLStrings = 54214, Max value = 1219, Average = 0
4j Milliseconds creating SCCNotifications and broadcasting them = 168351, Max value = 597, Average = 31
5  Milliseconds in convertXMLStringsToNotificationsAndBroadcastThem() = 168351, Max value = 597, Average = 31
5a Milliseconds broadcasting SCCNotifications = 943, Max value = 125, Average = 0
5b constructing SCCNotifications = 167408, Max value = 596, Average = 31
6  Milliseconds constructing XMLStrings, converting them to notifications and broadcasting them = 222998, Max value= 1525, Average = 77
7  Milliseconds delta between ACTION_COMPLETED and reconstituting events = 85501, Max value = 4953, Average = 2590

One key metric to look for in the logged data is the number of times processEvents was invoked compared with the number of times it was invoked without waiting. An invocation without waiting indicates that the EventMonitor service is unable to keep up with the current load of events. This may or may not be related to poor database performance.

Note: When the number of processEvents invocations equals the number of invocations without waiting, the EventMonitor service did not keep up with the events generated by the system at any point during the hour before the metrics were logged.
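
The comparison described above can be computed from the hourly metrics block. The following is a minimal sketch in Python, assuming the log layout shown in the ICC 6.1.2.1 iFix 01 example (lines 1 and 1b); the function name catchup_ratio is hypothetical and is not part of Control Center.

```python
import re

# Patterns for lines "1" and "1b" of the 6.1.2.1 iFix 01 metrics block.
INVOKED = re.compile(
    r"^1\s+EventMonitor processEvents\(\) invoked = (\d+)", re.M
)
NO_WAIT = re.compile(
    r"^1b\s+EventMonitor processEvents\(\) invoked without waiting.*= (\d+)", re.M
)

def catchup_ratio(metrics_text):
    """Return the fraction of processEvents() calls made without waiting.

    Returns None when either counter is missing or no calls were logged.
    (Hypothetical helper for illustration only.)
    """
    invoked = INVOKED.search(metrics_text)
    no_wait = NO_WAIT.search(metrics_text)
    if not invoked or not no_wait or int(invoked.group(1)) == 0:
        return None
    return int(no_wait.group(1)) / int(invoked.group(1))

# Two lines copied from the example metrics block above.
sample = """\
1  EventMonitor processEvents() invoked = 2875
1b EventMonitor processEvents() invoked without waiting (in catchup) = 2599
"""
print(catchup_ratio(sample))
```

For the example block, roughly 90 percent of the invocations ran without waiting, so the service spent most of that hour in catch-up. A ratio of 1.0 corresponds to the situation described in the note.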

TIP

If SLCs are incorrectly generating events, use the following two queries to see if the EventMonitor is currently behind in getting events to the SLC service, and by how far:
  • SELECT LAST_EVENT_SERIAL_NUM_PROC FROM CC_CONTROLLER

    The query result returns the serial number of the event the EventMonitor last processed. The serial number is a unique value assigned by the database as data is inserted into the EVENTS table.

  • SELECT MAX(SERIAL_NUM) FROM EVENTS

    The query result returns the serial number of the last event in EVENTS that needs to be processed.

The difference between those two values is approximately how far behind the EventMonitor logic is. The values should be close, within a thousand or two of each other.
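
The two queries and the subtraction can be combined in a small script. This is a sketch only, assuming a Python DB-API connection to the Control Center database; the function name event_monitor_lag and the threshold constant are hypothetical, and the in-memory SQLite tables with made-up values stand in for the real database purely so the example runs.

```python
import sqlite3

# Hypothetical threshold taken from the "thousand or two" guidance above.
LAG_WARNING_THRESHOLD = 2000

def event_monitor_lag(conn):
    """Return MAX(SERIAL_NUM) minus LAST_EVENT_SERIAL_NUM_PROC.

    conn is any DB-API connection; returns 0 when EVENTS is empty.
    """
    cur = conn.cursor()
    cur.execute("SELECT LAST_EVENT_SERIAL_NUM_PROC FROM CC_CONTROLLER")
    last_processed = cur.fetchone()[0]
    cur.execute("SELECT MAX(SERIAL_NUM) FROM EVENTS")
    newest = cur.fetchone()[0]
    if newest is None:
        return 0
    return newest - last_processed

# Demonstration only: stand-in tables holding just the queried columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CC_CONTROLLER (LAST_EVENT_SERIAL_NUM_PROC INTEGER)")
conn.execute("CREATE TABLE EVENTS (SERIAL_NUM INTEGER)")
conn.execute("INSERT INTO CC_CONTROLLER VALUES (1000500)")
conn.executemany(
    "INSERT INTO EVENTS VALUES (?)", [(1000500,), (1001200,), (1002000,)]
)

lag = event_monitor_lag(conn)
is_behind = lag > LAG_WARNING_THRESHOLD
print(lag, is_behind)
```

Because events keep arriving between the two queries, the result is an approximation, as noted above.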

In v6.1.2, a metric was added that shows the average delta between ACTION_COMPLETED and the time an event was reconstituted. For example:
7 Milliseconds delta between ACTION_COMPLETED and reconstituting events = 85501,
Max value = 4953, Average = 2590

Because the ACTION_COMPLETED value represents the time, in milliseconds, at which an EP completed processing of an event, this metric approximates the delay between when an event was processed by one EP and when the CEP delivered it to other services, including the SLCService. When this delay is long and SLC schedule tolerances are small, erroneous SLC events are likely to occur. Both the maximum and the average should be as small as possible.
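
One way to watch this metric is to extract the maximum and average from the log and compare them with the smallest SLC tolerance in use. The sketch below assumes the metric 7 wording shown in the example; the helper names and the tolerance values are hypothetical and chosen only for illustration.

```python
import re

# Matches metric 7, whether it is logged on one line or wrapped onto two.
DELTA_LINE = re.compile(
    r"Milliseconds delta between ACTION_COMPLETED and reconstituting events"
    r"\s*=\s*(\d+),\s*Max value\s*=\s*(\d+),\s*Average\s*=\s*(\d+)"
)

def delivery_delay(metrics_text):
    """Return total, max and average delay in ms, or None if not found."""
    match = DELTA_LINE.search(metrics_text)
    if match is None:
        return None
    total_ms, max_ms, average_ms = map(int, match.groups())
    return {"total_ms": total_ms, "max_ms": max_ms, "average_ms": average_ms}

def exceeds_tolerance(delay, tolerance_ms):
    """True when the worst observed delay is longer than the tolerance."""
    return delay["max_ms"] > tolerance_ms

# Line copied from the example metrics block.
sample = (
    "7  Milliseconds delta between ACTION_COMPLETED and reconstituting "
    "events = 85501, Max value = 4953, Average = 2590"
)
delay = delivery_delay(sample)
print(delay)
# The tolerance values below are illustrative, not recommendations.
print(exceeds_tolerance(delay, 3000))
print(exceeds_tolerance(delay, 60000))
```

Comparing against the maximum rather than the average is a conservative choice: a single late event can be enough to trigger an erroneous SLC event.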

Other important metric values are:
  • Average time to retrieve event data
  • Average time to construct notifications
  • Average time to broadcast notifications
  • Milliseconds running SQL to retrieve events
  • Milliseconds broadcasting SCCNotifications
  • constructing SCCNotifications

When the values for Average time to retrieve event data or Milliseconds running SQL to retrieve events are large (queries should take just a few milliseconds), poor database performance is degrading the performance of the EventMonitor service.

When the values for Average time to construct notifications or constructing SCCNotifications are large, CPU limitations might have a negative impact on the performance of the EventMonitor service. Note that constructing notifications also requires database I/O. Increasing the value for the engine property EVENT_MONITOR_THREADS, described below, may improve this performance.

When the values for Average time to broadcast notifications or Milliseconds broadcasting SCCNotifications are large (more than a few milliseconds), it could be an indication that Alert 0 actions are in use. Erroneous use of Alert 0 actions can be a major source of performance problems in IBM Sterling Control Center Monitor. See Rule, Actions and performance issues for more details.