Today's diverse, interconnected e-business components typically generate a large volume of event information at their touchpoints, through log files or event emitters. Correlating event information to derive symptoms, or higher-level business conclusions, is fundamental to identifying critical situations that need to be corrected. This article describes the IBM Active Correlation Technology (ACT), which provides built-in patterns that support event correlation and complex event processing.
ACT is a technology that is in the works at IBM, and you will see it showing up in our products in the future. At this point, ACT is not available to be embedded into your own applications. However, if you understand the benefits that this new technology provides, you'll be better able to understand the direction in which autonomic computing technology is headed. Read this article for a sneak peek at the types of functions you'll be seeing in the future. As always, we like hearing what you think; chime in with your thoughts on the autonomic computing discussion forum listed in the Resources section of the article.
The article provides a brief overview of ACT, which is a set of modular event correlation components that deliver complex event processing functions, such as:
- Aggregating and filtering events
- Correlating and associating events for problem determination and detection of business situations
- Triggering automatic actions in response to events that cause situations
- Associating events with business information
ACT includes support for events that conform to the Common Base Event specification and other messaging formats. ACT is a technology that is being embedded in different IBM products and offerings.
Any customer with a data center, trying to manage a complex IT infrastructure, can benefit from a solution or product that embeds ACT. By using ACT to detect symptoms, customers can:
- Reduce the number of events their operation staff needs to handle by filtering out spurious events, removing duplicates, and summarizing a collection of events
- Correlate lower-level events into a meaningful diagnosed symptom that provides higher level or better information for problem determination
- Gain the ability to take autonomic actions and solve the original problem using corrective actions
ACT is a software development kit that includes code libraries, APIs, plug-ins, and documentation to help you embed correlation technology into your application solutions. ACT is not a product. It provides cross domain correlation through a run-time environment, and tools, to help develop and execute rules for correlating and filtering events across many different environments.
ACT can provide complex event processing to derive high-level, or complex, events from the analysis, correlation, and summarization of low-level events in event-driven systems. The complex events are suitable for notifying people of business opportunities, or problems, in easy-to-understand terms, or for triggering automated processes.
Figure 1 shows the overall architecture of ACT.
Figure 1. ACT architecture
You can create correlation rules, specified in terms of the supported patterns, by using the ACT rule builder tool. The rules are then loaded into the ACT engine through the ACT run-time environment. As incoming events arrive at different times, each event is matched against the patterns, and one or more rules may be triggered. When a rule is triggered (for example, by a timeout or by the receipt of an event), a response occurs that includes the execution of actions.
The ACT subcomponents are described below.
- Rule builder
- A graphical user interface (GUI) that lets you write correlation rules in the ACT rule language. Figure 2 shows the ACT rule builder, which lets you easily define a rule set consisting of rule blocks and rules.
The input to the ACT rule builder includes event definition information that's used to select events to be processed by the ACT rules. The ACT rule builder also lets you incorporate snippets of code and specific actions into rules.
The output of the ACT rule builder is a rule set, in an XML document, which defines the rules that are the basis for event correlation. Within the rule definitions in the XML document, actions are defined to indicate what is to be done as a result of correlation activity.
Figure 2. Rule builder overview
- Rule language
- Is XML-based, and lets users specify rules based on common correlation patterns. Rules created using the ACT rule language can be deployed to ACT run-time environments.
- Run-time environment
- The subcomponent or application that embeds the ACT engine and, typically, the ACT compiler. It enables the ACT component to work properly in an application solution. Different applications use ACT in different environments, and listen for and receive events in different circumstances. For example, a solution may drive the ACT engine with events from queues, event logs, or a Java™ Message Service (JMS) subscription.
- Engine
- Provides the core service of receiving events and processing them against loaded rules or correlation patterns. The engine is embedded as part of an application (run-time environment). More than one instance of an ACT engine can be embedded and controlled independently. The ACT engine depends, indirectly, on the ACT compiler, and supports all rule patterns defined in the ACT rule language.
As shown in Figure 3, within the ACT engine the correlation rules determine the specific event patterns to be detected, and the time during which to look for the event patterns. As events are processed, they are selected to participate in rules according to selection criteria. This is represented in the events selector block of a rule. The events fill in the patterns specified in the pattern condition. When a pattern is completed the rule is triggered, and actions are taken in response.
ACT provides basic actions, and you can also include customized actions.
Figure 3. ACT engine
- Compiler
- Compiles ACT language files into a data structure that can be understood by the ACT engine. The compiler usually resides in the same run-time environment as the ACT engine, but it can also reside in a different environment or system, from which its serialized output can be transferred to the system on which the engine runs.
ACT rule patterns
ACT provides built-in patterns that support event correlation and complex event processing. ACT supports:
- The correlation of events in both stateless and stateful (or temporal) modes
- The specification of rules that permit input events that sequentially arrive at different points in time to be correlated according to well-defined patterns
- A conclusion with one or more responses to be generated based on the outcome of the correlation
The simplest type of correlation is a stateless rule, also called a filter or match, where a single event that passes a filter condition immediately generates a response. On the other hand, the stateful rules provide correlation across multiple events occurring within a given time interval.
The set of base correlation patterns supported by ACT has been proven, by experience over the years, to cover most of the event correlation problems that IBM customers need to address. These are the built-in capabilities that must be available in the event correlation environment, and that provide the basic building blocks on top of which more complex correlation can be created (for instance, by using composite rules). The base patterns are designed for simplicity and abstraction to the user, but also for high-performance processing capability at run time.
The filter pattern checks each event to determine if it matches an event selector. If a match is found (the expression is true), an action may be taken as specified in the rule. Actions taken might include filtering events in and out of the event input stream.
Figure 4 shows a flow of events that occur during a time period. The event selector box indicates the event type that triggers the filter rule. When the rule is triggered by the event, the onDetection response is executed.
Figure 4. Filter pattern
For example, you could use the filter pattern for a rule that pages an administrator if a ServerStatus event indicates a serverLoad greater than 95%.
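The filter example above can be sketched in a few lines of Python. This is only an illustration of the stateless pattern, not the ACT API or rule language: the `Event` and `FilterRule` classes, the `host` attribute, and the `paged` list that stands in for a paging action are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    # Hypothetical event shape for illustration; not the Common Base Event schema.
    type: str
    attributes: dict = field(default_factory=dict)


@dataclass
class FilterRule:
    """Stateless rule: a single matching event immediately triggers the response."""
    selector: Callable[[Event], bool]      # plays the role of the event selector
    on_detection: Callable[[Event], None]  # plays the role of the onDetection response

    def process(self, event: Event) -> bool:
        if self.selector(event):
            self.on_detection(event)
            return True
        return False


paged = []  # stands in for a real "page the administrator" action

rule = FilterRule(
    selector=lambda e: e.type == "ServerStatus"
    and e.attributes.get("serverLoad", 0) > 95,
    on_detection=lambda e: paged.append(e.attributes["host"]),
)

rule.process(Event("ServerStatus", {"host": "web01", "serverLoad": 97}))
rule.process(Event("ServerStatus", {"host": "web02", "serverLoad": 40}))
```

Only the first event passes the selector, so only `web01` ends up in `paged`. No state is kept between events, which is what distinguishes this pattern from the stateful ones that follow.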
The collection pattern is an example of stateful correlation. In this pattern, events are collected over a time period. At the end of the time period, the events are available for use within the onTimeWindowComplete response action.
Figure 5. Collection pattern
A common use for this pattern is to collect events matching a specific event selector and to summarize them into a single event containing the total count of events, including characteristic information about the events summarized.
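As a rough sketch of that summarization step, the following Python function takes the events that arrived during one time window and folds the matching ones into a single summary event. The dictionary event shape, the `source` field, and the `summarize` function are assumptions made for illustration; in ACT this logic would live in the response that runs when the time window completes.

```python
from collections import Counter


def summarize(events, selector):
    """Collect the events that match the selector and fold them into one summary event."""
    collected = [e for e in events if selector(e)]
    return {
        "type": "Summary",
        "count": len(collected),
        # Characteristic information: how many events each source contributed.
        "sources": dict(Counter(e["source"] for e in collected)),
    }


# Events assumed to have arrived during a single time window.
window = [
    {"type": "DiskWarning", "source": "db01"},
    {"type": "DiskWarning", "source": "db02"},
    {"type": "Login", "source": "web01"},
    {"type": "DiskWarning", "source": "db01"},
]

summary = summarize(window, lambda e: e["type"] == "DiskWarning")
```

The operations staff would then see one `Summary` event with a count of three instead of three separate warnings.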
The duplicate pattern is a special form of the collection pattern that's used to detect duplicate events. The first event received is processed by the engine in the normal fashion with the detection response; it is saved and passed to the other rules. All subsequent matching events that occur during a specific time window are processed by the onNextEvent response. They are not saved in the rule; they only increment the duplicate count. Then an implicit exit rule set is performed, so the events are not processed further. Further actions may be taken at the end of the time window.
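A minimal Python sketch of duplicate suppression follows. It is not ACT code: the `DuplicateRule` class, its `key` function, and the explicit `now` timestamps are hypothetical, and the sketch expires a window lazily when the next event arrives rather than firing a response at the end of the window.

```python
class DuplicateRule:
    def __init__(self, key, time_window):
        self.key = key                  # function that identifies "the same" event
        self.time_window = time_window  # window length in seconds
        self.windows = {}               # key -> [window_start, duplicate_count]

    def process(self, event, now):
        """Return True if the event should continue on to other rules."""
        k = self.key(event)
        state = self.windows.get(k)
        if state is None or now - state[0] >= self.time_window:
            self.windows[k] = [now, 0]  # first event: opens a new window
            return True
        state[1] += 1                   # duplicate: count it and stop processing
        return False

    def duplicate_count(self, event):
        return self.windows[self.key(event)][1]


rule = DuplicateRule(key=lambda e: (e["host"], e["msg"]), time_window=60)
event = {"host": "db01", "msg": "disk full"}

forwarded = [rule.process(event, now=t) for t in (0, 10, 20)]
duplicates = rule.duplicate_count(event)
after_window = rule.process(event, now=75)
```

The first event is forwarded, the next two are swallowed and counted, and the event at 75 seconds opens a fresh window because the 60-second window has passed.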
The computation pattern is another specialized form of the collection pattern. In this pattern, a computation function is executed every time an event that matches the event selector is received. For example, if ACT is processing customer order events, each time an event is received the total value of the order is added to the total value of all orders that occurred during the specified time window. Actions may be taken at the end of the time window.
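The customer order example can be sketched as a running computation that is updated per event and read out when the window ends. The `ComputationRule` class, the `CustomerOrder` event type, and the `orderValue` field are illustrative assumptions, not names taken from ACT.

```python
class ComputationRule:
    def __init__(self, selector, compute, initial):
        self.selector = selector
        self.compute = compute   # called as compute(current_value, event)
        self.initial = initial
        self.value = initial

    def process(self, event):
        # Run the computation for every event that matches the selector.
        if self.selector(event):
            self.value = self.compute(self.value, event)

    def complete_window(self):
        """Return the computed value and reset it for the next time window."""
        result, self.value = self.value, self.initial
        return result


orders = ComputationRule(
    selector=lambda e: e["type"] == "CustomerOrder",
    compute=lambda total, e: total + e["orderValue"],
    initial=0,
)

for e in [
    {"type": "CustomerOrder", "orderValue": 120},
    {"type": "Heartbeat"},
    {"type": "CustomerOrder", "orderValue": 80},
]:
    orders.process(e)

total = orders.complete_window()
```

Unlike the plain collection pattern, the rule does not need to keep the individual events; it only keeps the accumulated value.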
The threshold pattern is a stateful pattern. As events are received, a threshold is evaluated based either on event count or on a computation across all collected events. The threshold is evaluated in a time window, which can be fixed or sliding.
Figure 6. Threshold pattern
For example, a threshold rule could execute an action that checks the status of a router if more than four "server unreachable" events from a subnetwork occur within a sliding window of 30 seconds.
The sequence pattern is used to detect the presence or absence of an ordered or unordered sequence of events within a period of time.
Figure 7. Sequence pattern
Sequence detection is useful when you want to act on the root cause of a set of events. For example, in an IT environment an administrator would want to reset the DB2® heap size if both the WebSphere Application Server Resource Allocation Exception and the DB2 ERROR SQL0954C "Not enough heap to process statement" were encountered. A rule is written to watch for those two events. When they are encountered within the specified time interval, an action is executed that increases the DB2 heap size and restarts the database manager.
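The two-event rule described above can be sketched as follows. This is a simplified illustration, not ACT's implementation: the `SequenceRule` class and the event type labels are hypothetical, the expected event types are assumed to be distinct, and only the presence of a sequence (not its absence) is detected.

```python
class SequenceRule:
    def __init__(self, expected, time_window, action, ordered=False):
        self.expected = list(expected)  # distinct event types to watch for
        self.time_window = time_window  # seconds allowed from first to last event
        self.action = action
        self.ordered = ordered
        self.seen = []                  # event types matched so far
        self.started_at = None

    def process(self, event_type, now):
        # If the window has expired, forget partial matches and start over.
        if self.started_at is not None and now - self.started_at > self.time_window:
            self.seen, self.started_at = [], None
        if event_type not in self.expected or event_type in self.seen:
            return
        if self.ordered and event_type != self.expected[len(self.seen)]:
            return
        if self.started_at is None:
            self.started_at = now
        self.seen.append(event_type)
        if len(self.seen) == len(self.expected):
            self.action()
            self.seen, self.started_at = [], None


actions = []  # stands in for "increase heap size and restart the database manager"

rule = SequenceRule(
    expected=["WAS_RESOURCE_ALLOCATION_EXCEPTION", "DB2_SQL0954C"],
    time_window=120,
    action=lambda: actions.append("increase DB2 heap size"),
)

rule.process("DB2_SQL0954C", now=0)
rule.process("WAS_RESOURCE_ALLOCATION_EXCEPTION", now=300)  # too late, opens a new window
rule.process("DB2_SQL0954C", now=330)
```

Because the rule is unordered here, either event can arrive first; the action runs once, when both have been seen within the same 120-second window.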
To understand the process for identifying a treatable symptom, or cause of a problem, and its flow in the Monitor, Analyze, Plan, Execute (MAPE) loop, let's use another example. The following events are emitted by the WebSphere Application Server (Application Server) and a Cisco router. An application running inside an Application Server instance uses a table stored in DB2. The communication between the application on Application Server and the DB2 server is through a Cisco router. At some point during operation, the link between the router and the DB2 server goes down. After that, the application on Application Server tries to communicate with DB2 and gets a failure. The symptoms identified are:
| Symptom | Description |
|---|---|
| WAS_CONNECT_CAPTURE | Identifies that the Application Server application cannot connect to the DB2 server |
| CISCO_AVAIL_CAPTURE | Identifies that a link on the router is unavailable |
| WAS_CISCO_CAPTURE | Identifies that both of the previous symptoms have occurred (our sequence rule) |
| FIX_SUCCESS_CAPTURE | Identifies that resolution has occurred successfully |
The timer pattern provides a simple timer that goes off at the end of the timeWindow. The timer always repeats unless the repeat attribute is set to false using the onTimeWindowComplete response. The timer pattern allows for a response when the timeWindow completes (onTimeWindowComplete response). The timer pattern can be used to implement cleanup rules. For example, every 30 minutes, execute an action that cleans up harmless and informational events that have been open longer than 48 hours.
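The cleanup rule can be sketched with a simulated timer so that the example runs instantly instead of waiting on a real clock. The `cleanup` and `run_timer` functions, the event fields, and the use of plain seconds as timestamps are all assumptions for illustration; they are not part of ACT.

```python
HOUR = 3600


def cleanup(open_events, now, max_age_hours=48):
    """Drop harmless and informational events that have been open too long."""
    cutoff = now - max_age_hours * HOUR
    return [
        e for e in open_events
        if not (e["severity"] in ("harmless", "informational")
                and e["opened_at"] < cutoff)
    ]


def run_timer(time_window, until, on_time_window_complete, repeat=True):
    """Simulated timer: fire at the end of each time window up to `until` seconds."""
    now = time_window
    while now <= until:
        on_time_window_complete(now)
        if not repeat:
            break
        now += time_window


events = [
    {"id": 1, "severity": "informational", "opened_at": 0},
    {"id": 2, "severity": "critical", "opened_at": 0},
    {"id": 3, "severity": "harmless", "opened_at": 47 * HOUR},
]
fired = []


def on_complete(now):
    fired.append(now)
    events[:] = cleanup(events, now)  # replace the list contents in place


run_timer(time_window=30 * 60, until=49 * HOUR, on_time_window_complete=on_complete)
```

After 49 simulated hours, the old informational event has been removed, while the critical event and the recently opened harmless event remain. Passing `repeat=False` makes the timer fire only once.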
This article provided an architectural overview of the IBM Active Correlation Technology and each of its components. We described the built-in correlation patterns, which cover most of the event correlation problems our customers need to address, and gave examples of how customers could typically use ACT for symptom detection.
Expect this component to become a key integrating technology for applications and solutions that need to correlate Common Base Events and other event formats.
- Chime in with your thoughts by participating in the discussion forum, Autonomic Computing: an insider's perspective.
- "Symptoms deep dive, Part 1: The autonomic computing symptoms format" (developerWorks, 2005): This article provides an overview of the autonomic computing symptoms format and how it fits in the autonomic computing architecture.
- Common Base Event Documentation: Read the documentation for more information on Common Base Events.