The Autonomic computing content area on developerWorks provides a variety of components you can use for problem determination. Two of these components are IBM ABLE (Agent Building and Learning Environment), a Java framework and component library for building intelligent agents that use machine learning and reasoning, and the Autonomic Management Engine (AME), which is an example implementation of an autonomic manager. Our sample solution uses the AME and ABLE to demonstrate one way you can perform root-cause analysis. The ABLE Rule Language, combined with the wide range of inference engines (part of the ABLE toolkit), is a powerful tool for performing complex analysis. The AME provides excellent event monitoring, filtering, and aggregation capabilities. It can also perform analysis and execute scripts to make changes to the system state. This article describes how you can combine the capabilities of ABLE components with the AME to perform event correlation and problem determination in an IT system.
This article assumes you have installed the Resource Model Builder described in the Autonomic Computing toolkit, the AME for Autonomic Computing, and the ABLE 2.1 toolkit. It also assumes you know how to build and deploy resource models. You must also install the PD Scenario from the Autonomic Computing toolkit to obtain the CanonicalSituationMonitor.zip file contained in the scenario. While installing the PD scenario, select the Do not check sites option. More specific instructions are given in the sections Set up the development environment and Install the ABLE plug-in. You can also refer to the developerWorks articles titled, "Create a simple resource model for processing common base events from a file" and "Call Java classes from AME," which show you how to do these tasks. Links to these articles are available in the Resources section.
The ABLE framework provides a library of AbleBeans that uses the JavaBeans design. These beans give a wide variety of capabilities, ranging from data import and data transformation to inferencing and machine learning. You can use them for classification, prediction, categorization, configuration, performance tuning, problem determination, event correlation, resource allocation, and planning. Their design allows them to be used synchronously or asynchronously (on a separate thread), as well as distributed across different Java virtual machines on the same or different physical computers. AbleBeans can also be extended to add custom functionality, allowing users to create intelligent components with arbitrary complexity.
For reasoning, ABLE provides the AbleRuleSet bean and the ABLE Rule Language (ARL), which is patterned after the Java language. We will express business logic in rules written in ARL. Rules are grouped into rule blocks, which are analogous to Java methods. Rule blocks, variable definitions, and import statements written in ARL are parsed from a text rule set file into Java objects by the AbleRuleSet bean. Some rule blocks, such as init and process, have special purposes in the same way that the constructor and main method have a unique purpose in the Java language. Each rule block can use any of the provided inferencing engines -- including forward chaining, backward chaining, fuzzy logic, pattern matching, script, policy, and planning. Custom inference engines can also be plugged in to provide custom logic. Arbitrary Java classes can be imported into rule sets just as classes can be imported into Java classes, completing the infrastructure for complex analysis. For more information on ABLE, see the Resources section at the end of this article.
In the example described in this article, we will use the PatternMatch and Script inference engines to perform analysis.
A Web application deployed on IBM WebSphere® Application Server Version 5.0 is connected to a remote IBM DB2® 8.1 database using a switch. Suddenly, the Web application becomes unavailable. The problem can have several causes: the application or WebSphere Application Server is brought down, the switch link connecting WebSphere Application Server to DB2 is brought down, or the database server itself goes down. For our problem-determination scenario, we selected the following two root causes: the link connecting WebSphere Application Server to DB2 goes down and the DB2 server goes down. See Figure 1.
Figure 1. A simple Web application

In the real world, the sudden unavailability of a Web application can mean catastrophic loss of revenue. In the absence of autonomic technologies, determining the root cause of a sudden application failure can be very time consuming. Typically, it involves sending the log files of all related components (WebSphere Application Server, DB2, and the switch in this case) to tech support. The support staff must manually correlate the error messages across the various log files to identify the root cause, and then run a sequence of steps to correct the problem.
Create a solution using autonomic computing technologies
Figure 2. An autonomic system using the AME and ABLE

In this article, you'll see how an autonomic system reacts in a similar situation and how it can detect the root cause in a few seconds. All the functions can be performed in the AME component, but to demonstrate the flexibility of the architecture, we set up this scenario so that most of the function is performed inside the AME, and we use ABLE to demonstrate more advanced functions, such as decision-tree support. The AME (with the help of the Generic Log Adapter) will monitor WebSphere Application Server, DB2, and switch logs. We will also develop an ABLE rule set that uses decision trees to support additional correlation of these three logs to determine which of the two previously mentioned root causes has made the application inaccessible. After the root cause is identified, we'll correct the problem by executing a predefined script.
To simplify the AME and ABLE integration process, we assume the file CBEout.log, monitored by the resource model, contains all the CommonBaseEvents required for analysis. We also assume that the Generic Log Adapters monitoring the WebSphere Application Server activity.log, the DB2 server’s db2diag.log, and the switch log convert the native messages into CommonBaseEvents and write them to the CBEout.log file. Note: You can download the files described in this section by selecting the "Code" icon located at the top of this article or selecting the text link in the Resources area.
Set up the development environment
We will use IBM WebSphere Studio Application Developer as our development environment. ABLE requires WebSphere Studio Application Developer Version 5.1 or Eclipse 2.1.
Download the file com.ibm.able.bin_2.1.0_bin.zip from IBM alphaWorks. To install the plug-in from the file you downloaded, unzip it into your \eclipse directory (located in the WebSphere Studio Application Developer installation directory) and start WebSphere Studio Application Developer.
- Next, create a new Java project and name it ameable.
- Add hlcore.jar and hlevents.jar to the Java build path.
- Import ABLEBridge.java, AbleWrapperService.java, MyProperties.java, and ablerules.arl to the project.
Double-click the ablerules.arl file to open it in the ABLE editor.
Figure 3. Viewing ABLE rule set in Eclipse

For more information regarding the structure and syntax of the rule set, please refer to the ABLE documentation. (See the Resources section at the end of this article.)
We use a combination of the Pattern Match and Script inference engines to perform our analysis.
Algorithm
1. The input to the rule set is an array of CommonBaseEvent objects of arbitrary length.
2. Use the Script engine to iterate through all the input CommonBaseEvent objects and assert them into working memory in the preprocess rule block. Consider the working memory to be a Java collection of arbitrary classes.
3. Use the PatternMatch engine in the process rule block to search working memory to obtain references to objects of specific classes with specific attribute values. When a pattern matches, the pattern match rule is fired, which means the "do" portion of the rule is processed with the references bound to local variables.
4. Depending on which CommonBaseEvent objects are asserted (WebSphere Application Server and DB2, or WebSphere Application Server and switch), either the db2downrule or the linkdownrule rule set is invoked, respectively.
   - If the linkdownrule rule set is fired, copy the CommonBaseEvent objects to global variables and remove them from the working memory. Next, invoke the SwitchWasProblem rule set for further investigation. Go to step 5.
   - If the db2downrule rule set is fired, copy the CommonBaseEvent objects to global variables and remove them from the working memory. Next, invoke the DB2WasProblem rule set for further investigation. Go to step 6.
   - If neither rule set is fired, perform postprocess and exit.
5. In the SwitchWasProblem rule set, if the Situation category of the application server-related CommonBaseEvent is a ConnectSituation and the switch-related CommonBaseEvent is an AvailableSituation, the root cause of the problem is that the switch link is down. This makes the database server inaccessible to the Web application and causes it to fail. Set the output variable to a unique value representing this root cause. The process rule block terminates, and ABLE invokes the postprocess rule block before ending rule set processing.
6. In the DB2WasProblem rule set, if the Situation category of the application server-related CommonBaseEvent is a ConnectSituation and the database-related CommonBaseEvent is a StopSituation, the root cause of the problem is that the database server is down. This causes the Web application to fail. Set the output variable to a unique value representing this root cause, perform postprocess, and exit.
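The steps above can be sketched in plain Java to make the control flow easier to follow. This is only an illustration of the decision logic, not the ARL rule set shipped with the article and not a use of the ABLE API: the Event class, the component labels ("WAS", "DB2", "SWITCH"), and the result strings are hypothetical stand-ins chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the correlation steps; it does not use the ABLE API.
// Event, the component labels, and the result strings are hypothetical.
class RootCauseSketch {

    // Minimal stand-in for a CommonBaseEvent: reporting component plus situation category.
    static final class Event {
        final String component;  // "WAS", "DB2", or "SWITCH" (hypothetical labels)
        final String situation;  // for example "ConnectSituation"

        Event(String component, String situation) {
            this.component = component;
            this.situation = situation;
        }
    }

    // Returns the first event in working memory reported by the given component, or null.
    static Event find(List<Event> workingMemory, String component) {
        for (Event e : workingMemory) {
            if (e.component.equals(component)) {
                return e;
            }
        }
        return null;
    }

    static String analyze(Event[] input) {
        // preprocess: assert every input event into working memory
        List<Event> workingMemory = new ArrayList<>();
        for (Event e : input) {
            workingMemory.add(e);
        }

        // process: look for the two event pairings of interest
        Event was = find(workingMemory, "WAS");
        Event sw = find(workingMemory, "SWITCH");
        Event db2 = find(workingMemory, "DB2");

        if (was != null && sw != null) {
            // plays the role of linkdownrule firing, followed by the SwitchWasProblem check
            workingMemory.remove(was);
            workingMemory.remove(sw);
            if (was.situation.equals("ConnectSituation")
                    && sw.situation.equals("AvailableSituation")) {
                return "SWITCH_LINK_DOWN";
            }
        } else if (was != null && db2 != null) {
            // plays the role of db2downrule firing, followed by the DB2WasProblem check
            workingMemory.remove(was);
            workingMemory.remove(db2);
            if (was.situation.equals("ConnectSituation")
                    && db2.situation.equals("StopSituation")) {
                return "DB2_SERVER_DOWN";
            }
        }
        // no pairing matched, or the follow-up check failed
        return "UNKNOWN";
    }

    public static void main(String[] args) {
        Event[] events = {
            new Event("WAS", "ConnectSituation"),
            new Event("DB2", "StopSituation")
        };
        System.out.println(analyze(events));
    }
}
```

In the real rule set, the PatternMatch engine does the searching that find() does here, and the unique output value is whatever the rule set author chose; the strings above are placeholders.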
Write a Java wrapper to the ABLE rule set
The AbleWrapperService.java class shows how you can load an ABLE rule set, pass it input data, and execute it to obtain the results of analysis.
The following classes need to be imported:
import com.ibm.able.AbleException;
import com.ibm.able.rules.ARL;
import com.ibm.able.rules.AbleRuleSet;
import com.ibm.able.rules.AbleRule;
Loading the rule set
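try
{
    // Create the rule set bean, parse the ARL source named by rules, and initialize it
    ruleSet = new AbleRuleSet();
    ruleSet.parseFromARL(rules);
    ruleSet.init();
}
catch (AbleException exp)
{
    // Report the failure instead of silently ignoring it
    exp.printStackTrace();
}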
Performing analysis using the rule set
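try
{
    // Wrap the CommonBaseEvent array in an array of Objects
    Object[] wrapperArray = new Object[1];
    wrapperArray[0] = cbeArray;
    // Pass the array of objects to the process method for analysis
    Object[] output = (Object[]) ruleSet.process(wrapperArray);
    // The first element of the returned array holds the result of analysis
    Object result = output[0];
}
catch (AbleException exp)
{
    // Report the failure instead of silently ignoring it
    exp.printStackTrace();
}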
The events occurring in the system are transformed into CommonBaseEvents using the Generic Log Adapter. These events are then passed as an array of CommonBaseEvent objects to the analysis engine by the AME to determine the root cause. The contents of the input{ } section of the AbleRuleSet map to the contents of the wrapperArray, which is passed to the process method. As a result, we add the given array of CommonBaseEvent objects to a wrapper array of type Object. The output is an object array in which the first element contains the results of analysis. The ABLEBridge class acts as a bridge between the AME and the AbleWrapperService. It hides details of the actual analysis and provides a simple interface that can be invoked from the AME.
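The wrapping step is easy to get wrong, so here is a small self-contained sketch of the convention. StubRuleSet is a hypothetical stand-in written for this example, not the ABLE AbleRuleSet class. It assumes only what the paragraph above states: each element of the array passed to process corresponds to one declared input, and the result comes back in the first element of the returned array.

```java
// StubRuleSet is a hypothetical stand-in, NOT the ABLE AbleRuleSet class.
// It mimics the convention described in the text: element i of the array
// passed to process() is bound to the i-th declared input, and the result
// is returned in element 0 of the output array.
class WrapperArraySketch {

    static final class StubRuleSet {
        private final int declaredInputs;

        StubRuleSet(int declaredInputs) {
            this.declaredInputs = declaredInputs;
        }

        Object[] process(Object[] inputs) {
            if (inputs.length != declaredInputs) {
                throw new IllegalArgumentException(
                    "expected " + declaredInputs + " input(s), got " + inputs.length);
            }
            // The single declared input is the whole event array.
            Object[] events = (Object[]) inputs[0];
            return new Object[] { "events received: " + events.length };
        }
    }

    // Wraps the event array so that it arrives as ONE input, then unwraps the result.
    static Object analyze(StubRuleSet ruleSet, Object[] cbeArray) {
        Object[] wrapperArray = new Object[1];
        wrapperArray[0] = cbeArray;
        Object[] output = ruleSet.process(wrapperArray);
        return output[0];
    }

    public static void main(String[] args) {
        StubRuleSet ruleSet = new StubRuleSet(1);
        Object[] cbeArray = { "event-1", "event-2", "event-3" };
        System.out.println(analyze(ruleSet, cbeArray));
    }
}
```

Passing cbeArray directly to process in this sketch would present three inputs to a rule set that declares one, which is the mismatch the wrapper array avoids.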
To create a basic resource model, follow the steps described in the article, "Create a simple resource model for processing common base events from a file" (see Resources).
We need to add the ABLE-related JAR files along with the mohawkable.jar (containing the Java wrapper classes) to the AME classpath. Go to the \sara directory (located in the AME install directory), edit the sara.bat file, and add the following line (entered as a single line, with no spaces between entries):
SET CLASSPATH=%CLASSPATH%;c:\able2.1.0\able_2.1.0\lib\hlcore.jar;c:\able2.1.0\able_2.1.0\lib\hlevents.jar;c:\developerworks\mohawkable.jar;c:\able2.1.0\able_2.1.0\lib\able.jar;c:\able2.1.0\able_2.1.0\lib\ablebeans.jar;c:\able2.1.0\able_2.1.0\lib\ablerules.jar;c:\able2.1.0\able_2.1.0\lib\antlr.jar
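Before starting the AME, it can help to confirm that the JAR files are actually visible on the classpath. The following is a minimal sketch of such a check; ClasspathCheck is a hypothetical helper written for this example and is not part of the AME or ABLE. The default class name comes from the import list shown earlier in this article.

```java
// Hypothetical helper, not part of the AME or ABLE: reports whether the
// named classes can be loaded from the current classpath.
class ClasspathCheck {

    // Returns true if the named class can be found without initializing it.
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // With no arguments, check the rule set class imported earlier in the article.
        String[] names = args.length > 0
            ? args
            : new String[] { "com.ibm.able.rules.AbleRuleSet" };
        for (String name : names) {
            System.out.println(name + " -> " + (isAvailable(name) ? "found" : "MISSING"));
        }
    }
}
```

Run it with the same CLASSPATH value that sara.bat sets, passing any additional class names you want to verify as arguments.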
Next, open the Resource Model Builder and click the source icon.
Figure 4. Editing the source of our resource model

To import our Java classes and instantiate the AbleBridge class, add the code shown here to the // GLOBAL CONSTANTS section:
importPackage(Packages.com.ibm.developerworks);
var sB = new AbleBridge();
Scroll down to the VisitTree function and below this line:
Svc.Trace(TRACE_FINEST, TRACE_SOURCE + "===> CBE = " + curCommonSituationsEventsituation);
Insert the following:
var soln = sB.queryABLERS(curCommonSituationsEventsituation);
Svc.Trace(TRACE_FINEST, TRACE_SOURCE + "===> soln = " + soln);
This step invokes the ABLEBridge class and passes it the newly read CommonBaseEvent XML string. The bridge, in turn, invokes the AbleWrapperService, which then queries the ABLE rule set. The result of analysis is returned and assigned to the soln variable. Save your changes by clicking the floppy disk icon.
To export the resource model and deploy it in the AME, follow the steps described in the developerWorks article, "Call Java classes from AME" (see Resources). Rename the downloaded CBEout.log as CBEout_original.log. Create another file, c:\CBEout.log. Copy the first two events in CBEout_original.log into CBEout.log. Go to the \sara directory (located in the AME install directory) and run sara.bat. Next, type startrme. The CBEout.log file now contains WebSphere Application Server and DB2 system-related CommonBaseEvents (CBEs). The output should look like this:
Figure 5. The root cause identified as the DB2 server being down

The ABLE components correlate the two CBEs: one related to WebSphere Application Server failing to connect to the database, and the other indicating that the DB2 server is down. As a result of the correlation, the root cause of the problem is determined to be that the database server is down.
Next, copy the first and third event from CBEout_original.log into CBEout.log. The CBEout.log now contains WebSphere Application Server and switch-related CBEs. The output should look like:
Figure 6. The root cause identified as the switch link connecting the Web server to the database being down
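Copying selected events between the two log files by hand is error-prone, so the step could be scripted. The sketch below is a hypothetical helper, not part of the article's download. It assumes that each event in the log is written as a complete <CommonBaseEvent ...>...</CommonBaseEvent> element and that the file is UTF-8 text; neither assumption is confirmed by this article, so check your own CBEout_original.log first. Self-closing event elements are not handled.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: copies selected CommonBaseEvent elements from one log to another.
// Assumes each event is a complete <CommonBaseEvent ...>...</CommonBaseEvent> element.
class CbeLogSlicer {

    // Matches one event element, including events that span several lines.
    private static final Pattern EVENT =
        Pattern.compile("<CommonBaseEvent\\b.*?</CommonBaseEvent>", Pattern.DOTALL);

    // Returns the event elements in the order they appear in the log text.
    static List<String> splitEvents(String logText) {
        List<String> events = new ArrayList<>();
        Matcher m = EVENT.matcher(logText);
        while (m.find()) {
            events.add(m.group());
        }
        return events;
    }

    // Returns the events at the given 1-based positions, one per line.
    static String select(String logText, int... positions) {
        List<String> events = splitEvents(logText);
        StringBuilder out = new StringBuilder();
        for (int p : positions) {
            out.append(events.get(p - 1)).append(System.lineSeparator());
        }
        return out.toString();
    }

    // Usage: java CbeLogSlicer <source log> <target log> <position> [<position> ...]
    public static void main(String[] args) throws IOException {
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        int[] positions = new int[args.length - 2];
        for (int i = 2; i < args.length; i++) {
            positions[i - 2] = Integer.parseInt(args[i]);
        }
        Files.write(Paths.get(args[1]), select(text, positions).getBytes(StandardCharsets.UTF_8));
    }
}
```

Under those assumptions, passing positions 1 and 2 reproduces the DB2 scenario, and positions 1 and 3 reproduce the switch scenario.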

In this article, you learned how to use AME and ABLE to create a self-diagnosing autonomic system. A Web application deployed on WebSphere Application Server is connected to a remote database (DB2) through a switch. The AME, with the help of the Generic Log Adapters, is monitoring the logs of these three components. When an error occurs, the ABLE-based analysis engine -- through the use of decision trees -- correlates the events generated by the monitored components and determines the root cause of the problem. The diagnosis occurs almost instantaneously, which in a real-life scenario is invaluable.
| Name | Size | Download method |
|---|---|---|
| ac-ablesource.zip | | HTTP |
- Download the source code used in this article.
- Download ABLE from alphaWorks. You were introduced to a part of ABLE in this article, but you might want to download the entire technology to further explore its features, which were updated recently.
- To create a basic resource model, follow the steps described in the article, "Create a simple resource model for processing common base events from a file" (developerWorks, June 2004).
- The developerWorks article "Call Java classes from AME" shows how to export the resource model and deploy it in the AME (developerWorks, June 2004).
- "The problem determination log/trace scenario guide" contains more information on the problem-determination scenario and how it functions.

Neeraj Joshi works as a staff software engineer in the Autonomic Computing Division of IBM. He has a master's degree in computer science from North Carolina State University. He can be reached at jneeraj@us.ibm.com.

Jeff Pilgrim is an advisory software engineer assigned to the ABLE research team at Rochester IBM eServer Software Services in Minnesota. His previous development experience includes work on Intelligent Miner for Data, Neural Network Utility, wide area wireless computing, and systems management. He was a developer and architect for several system configurators, as well as for numerous internal industrial engineering applications. Jeff joined IBM in 1979 at IBM Owego, where he was responsible for forecasting workload for defense contracts. He was awarded a Master of Science degree in Industrial Engineering and Operations Research in 1980 from Pennsylvania State University. He can be reached at pilgrim@us.ibm.com.

Balan Subramanian works as a Staff Software Engineer in the Autonomic Computing group at IBM in Research Triangle Park, North Carolina, focusing on data collection, problem determination, and provisioning. His other interests include Web services, grid services, and pervasive computing. A Sun Certified Java Programmer, Balan received his Master's Degree in Computer Science from George Mason University in 2000 with a thesis on Web services performance. He was also a core developer on the IBM Generic Log Adapter for Autonomic Computing and worked as a development co-op on the AUIML toolkit. He has previously worked at IBM India. He can be reached at bsubram@us.ibm.com.

Brad Topol is a Senior Software Engineer with the Software Group Advanced Design and Technology team at IBM in Research Triangle Park, North Carolina. He received a Ph.D. degree in Computer Science from the Georgia Institute of Technology in 1998. Currently, he is the development lead for the Automated Problem Determination Serviceability Tool. Over the years, Brad has been actively involved in advanced technology projects in the areas of autonomic computing, Web services, grid computing, Web content transformation, and aspect-oriented programming.