Forcing checkpointing after a specified number of minutes

When the network is down or communication between servers is unreliable, Impact may lose a block of events that was sent to the secondary server for processing. If this happens, Impact continues to hold both the pending checkpoint and the events themselves in memory, which may cause an OutOfMemory error.

Checkpointing means persisting the Serial or Statechange field of events to the etc/eventreader.state file, so that an event reader knows whether or not it has handled a block of events.

For example, when processing of a block is slow, you may see the following messages in the logs:

INFO [EventBroker] AbstractEventReader: checkPoint: The Block ID = 248262 is not the one I was expecting: 248260

INFO [EventBroker] Hold the events with identifier :248262 until earlier block of events are processed

In this case, event block 248260 has not reported back to the primary cluster member that its processing is complete. The block may still be in progress, or the confirmation may have been lost, possibly because of network issues. The primary cluster member holds all events after this event block in memory, which may cause an OutOfMemory error.
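To check how often this condition occurs, the checkPoint message quoted above can be matched in a log file with a short script. The following is a minimal sketch in Python; the function name find_stalled_blocks is hypothetical, and the pattern assumes the message wording is exactly as in the sample lines above.

```python
import re

# Pattern built from the sample checkPoint message above (wording assumed stable).
MISMATCH = re.compile(
    r"checkPoint: The Block ID = (\d+) is not the one I was expecting: (\d+)"
)


def find_stalled_blocks(lines):
    """Map each expected (unconfirmed) block ID to the block IDs held behind it."""
    stalled = {}
    for line in lines:
        match = MISMATCH.search(line)
        if match:
            received = int(match.group(1))
            expected = int(match.group(2))
            stalled.setdefault(expected, []).append(received)
    return stalled


# The two sample log lines from this section.
sample = [
    "INFO [EventBroker] AbstractEventReader: checkPoint: The Block ID = 248262 "
    "is not the one I was expecting: 248260",
    "INFO [EventBroker] Hold the events with identifier :248262 "
    "until earlier block of events are processed",
]
stalled_blocks = find_stalled_blocks(sample)
print(stalled_blocks)
```

An expected block ID that keeps collecting held blocks over a long period suggests that its confirmation was lost rather than merely delayed.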

To avoid this problem, you can set the maxminutestoforcecheckpoint property in the OMNIbus event reader properties file: $IMPACT_HOME/etc/<servername>_<omnibuseventreadername>.props.

For example, add the following property:

impact.<omnibuseventreadername>.maxminutestoforcecheckpoint=5

This forces checkpointing to occur after the specified number of minutes. The Impact server can then continue processing and checkpointing events.
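If the property has to be applied to several event readers, the edit can be scripted. The sketch below, in Python, adds or updates the property in the text of a properties file. The function name set_force_checkpoint and the reader name myreader are hypothetical, and the code assumes simple key=value lines; it only transforms text, so reading and writing the actual file is left to the caller.

```python
def set_force_checkpoint(props_text, reader_name, minutes):
    """Return props_text with maxminutestoforcecheckpoint set for reader_name.

    An existing entry for the key is replaced; otherwise the entry is appended.
    """
    key = f"impact.{reader_name}.maxminutestoforcecheckpoint"
    new_line = f"{key}={minutes}"
    out = []
    replaced = False
    for line in props_text.splitlines():
        # Compare only the part before the first "=" against the property key.
        if line.split("=", 1)[0].strip() == key:
            out.append(new_line)
            replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


# "myreader" is a placeholder for <omnibuseventreadername>.
updated = set_force_checkpoint("", "myreader", 5)
print(updated, end="")
```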

There are two possible reasons for missing checkpoints:

  1. Missing checkpoint when events are processed successfully: An exception such as a NullPointerException may be thrown when the event reader checkpoints the block, so the block was processed but never checkpointed.
  2. Missing checkpoint when event processing fails or times out: Look for the exception in the event processor logs and impactserver.log for these events.
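
In both cases a gap appears in the sequence of checkpointed blocks, and later blocks are held behind it. The toy model below, in Python, illustrates how a forced checkpoint lets processing move past such a gap. It is not Impact's implementation: the class CheckpointTracker, its methods, and the assumption that block IDs increase by 1 are all hypothetical and serve only to show the idea.

```python
class CheckpointTracker:
    """Toy model of in-order checkpointing with a forced-checkpoint timeout."""

    def __init__(self, first_block, max_minutes_to_force):
        self.expected = first_block           # next block ID that must confirm
        self.held = set()                     # confirmed blocks waiting behind a gap
        self.max_wait = max_minutes_to_force  # stands in for maxminutestoforcecheckpoint
        self.waiting_since = None             # minute at which the current gap appeared
        self.checkpointed = first_block - 1   # last block ID checkpointed

    def confirm(self, block_id, now):
        """Record that a block finished processing at minute `now`."""
        before = self.expected
        self.held.add(block_id)
        self._advance()
        # Start (or restart) the timer when a new gap is blocking held blocks.
        if self.held and (self.waiting_since is None or self.expected != before):
            self.waiting_since = now

    def tick(self, now):
        """Skip the missing block(s) once the wait reaches the limit."""
        if self.waiting_since is not None and now - self.waiting_since >= self.max_wait:
            self.expected = min(self.held)    # give up on the unconfirmed block(s)
            self._advance()
            if self.held:
                self.waiting_since = now      # a further gap remains

    def _advance(self):
        # Checkpoint every held block that is next in sequence.
        while self.expected in self.held:
            self.held.remove(self.expected)
            self.checkpointed = self.expected
            self.expected += 1
        if not self.held:
            self.waiting_since = None


tracker = CheckpointTracker(first_block=1, max_minutes_to_force=5)
tracker.confirm(1, now=0)   # block 1 is checkpointed immediately
tracker.confirm(3, now=1)   # block 2 never confirmed, so block 3 is held
tracker.confirm(4, now=2)   # block 4 is held as well
tracker.tick(now=5)         # only 4 minutes of waiting, nothing happens
held_before_force = set(tracker.held)
tracker.tick(now=6)         # 5 minutes reached, checkpoint is forced
print(tracker.checkpointed, tracker.held)
```

Without the timeout, the held set in this model would keep growing for as long as block 2 stays unconfirmed, which mirrors the memory growth described earlier.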