Event search workflow for operators

This typical workflow shows operators how the event search tools can assist with triage and diagnostics from the event list.

Assume the following situation: An event storm has been triggered but the cause of the storm is unclear. For the past hour, large numbers of critical events have been generated. Run the event search tools against the critical events.

  1. To gain an overview of what has happened since the event storm started, select the critical events. Then, right-click and click Event search > Show event dashboard by node > 1 hour before event. The charts that are displayed show how the critical events break down, by node, alert group, severity, and so on.
  2. Check whether any nodes stand out on the charts. If so, close the Operations Analytics - Log Analysis GUI, return to the event list, and find an event that originates on that node. For example, type a filter in the text box on the Event Viewer toolbar, such as the following filter, which selects critical events from the mynode node.
    SELECT * from alerts.status where Node = 'mynode' and Severity = 5;
    After the event list refreshes to show only matching events, select an event, right-click, and click Event search > Search for events by node > 1 hour before event.
  3. In the search results, check whether an event from that node stands out. If so, close the Operations Analytics - Log Analysis GUI, return to the event list, and locate the event, for example, by filtering on the summary or serial number:
    SELECT * from alerts.status where Node = 'mynode' and Summary like 'Link Down ( FastEthernet0/13 )';
    SELECT * from alerts.status where Node = 'mynode' and Serial = 4586967;
    Action the event.
  4. If nothing stands out that identifies the cause of the event storm, close the Operations Analytics - Log Analysis GUI and return to the event list. Select all the critical events again and click Event search > Show keywords and event count > 1 hour before event.
  5. From the results, look in the Common Patterns area on the navigation pane. Look for keywords that are not generic but occur frequently, for instance host names or IP addresses.
  6. Refine the search results by clicking relevant keywords to copy them to the Search field and running the search. All events in which the keyword occurs are displayed, and the Common Patterns area is updated.
  7. If an event stands out as the cause of the event storm, close the Operations Analytics - Log Analysis GUI, return to the event list, and action the event. If not, continue to refine the search results by searching on keywords until a likely root cause event stands out.
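
The reasoning behind steps 1 and 5 to 7 (break events down by node, then look for frequent but non-generic keywords and narrow the results) can be illustrated outside the product. The following is a minimal Python sketch, not how Operations Analytics - Log Analysis computes its charts or the Common Patterns area. The sample events, the GENERIC word list, and all function names are hypothetical and were chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical sample of critical events; the keys mirror alerts.status columns.
events = [
    {"Node": "mynode", "Severity": 5, "Summary": "Link Down ( FastEthernet0/13 )"},
    {"Node": "mynode", "Severity": 5, "Summary": "Link Down ( FastEthernet0/14 )"},
    {"Node": "router2", "Severity": 5, "Summary": "Ping fail for 10.0.0.7"},
    {"Node": "mynode", "Severity": 5, "Summary": "Ping fail for 10.0.0.7"},
]

# Assumed list of generic words that say little about the cause of a storm.
GENERIC = {"link", "down", "ping", "fail", "for"}


def count_by_node(events):
    """Count events per node, similar in spirit to a dashboard breakdown."""
    return Counter(event["Node"] for event in events)


def keyword_counts(events, generic=GENERIC):
    """Count in how many event summaries each non-generic keyword occurs."""
    counts = Counter()
    for event in events:
        # Keep tokens such as interface names and IP addresses in one piece.
        tokens = set(re.findall(r"[A-Za-z0-9_./-]+", event["Summary"]))
        for token in tokens:
            if token.lower() not in generic:
                counts[token] += 1
    return counts


def filter_by_keyword(events, keyword):
    """Narrow the event set to summaries that contain the keyword."""
    return [event for event in events if keyword in event["Summary"]]


if __name__ == "__main__":
    print(count_by_node(events).most_common())
    top_keyword, _ = keyword_counts(events).most_common(1)[0]
    print(top_keyword)
    print(filter_by_keyword(events, top_keyword))
```

In this sample, the node breakdown points at mynode, while the keyword ranking surfaces an IP address that appears in events from two different nodes. That second view is the kind of cross-node pattern that the keyword search in steps 4 to 7 is meant to expose.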

For possible actions from the Event Viewer, see https://www.ibm.com/docs/en/SSSHTQ_8.1.0/webtop/wip/task/web_use_jsel_manageevents.html. For possible actions from the Active Event List, see https://www.ibm.com/docs/en/SSSHTQ_8.1.0/webtop/wip/task/web_use_ael_managingevents.html. Other actions are possible, depending on the tools that are implemented in your environment.