Level: Introductory
Balan Subramanian (bsubram@us.ibm.com), Staff software engineer, IBM
13 Oct 2004 Learn how the Generic Log Adapter lets you embed class callouts. Using these callouts you can customize the parsing component of the Generic Log Adapter. The article discusses how class callouts work and develops some examples with the correct rules to invoke them.
Introduction
The Generic Log Adapter (GLA) uses a rules-based approach to parse log messages and events from their native format into the Common Base Event format. These rules are typically written using regular expressions, which provide a rich language to specify eye catchers and other patterns. These patterns help to extract pieces of data from unstructured, raw log messages and events. The GLA uses the extracted data to fill in the different fields of the Common Base Event format. However, in some cases, the necessary extraction cannot be performed using regular expressions alone. For example, language and measurements are very difficult to convert using only regular expressions. To handle such situations, the GLA lets you embed class callouts within the rules so that during the evaluation of a rule, a specified Java class can be invoked to determine the result of that rule. This article explains this powerful feature, which allows deep customization of the parsing component of the Generic Log Adapter.
The GLA aids the real-time conversion of log messages and events from their product-specific, native formats into the Common Base Event format. Getting messages into the Common Base Event format helps with the adoption of and alignment to IBM® autonomic computing technologies. The GLA uses rules that are built using regular expressions supported by the Java language. With a regular expression, you can specify the pattern to match against the input native log message. When the match is successful, a portion of the matched string is filled in as a field of the equivalent Common Base Event. Although the regular expression syntax is very powerful, some messages cannot be interpreted using regular expressions alone. The following are examples of tasks that cannot be completed using only regular expressions:
- Performing conversions - for example, hostname to IP address
- Database/catalog lookups - for example, determining the name of a component from a registry using a signature
- Pretty formatting - for example, removing/trimming spaces
- Performing actions based on some input - for example, saving and later looking up a property that occurs only at the beginning of the input stream
- Generating random data - for example, a Universally Unique Identifier
- Very complicated extraction - for some extractions, it might be more efficient to use compiled code rather than regular expression rules
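To make the distinction concrete, here is a minimal sketch of the catalog lookup case. A regular expression can isolate a component signature in a Cisco-style syslog line, but turning that signature into a readable name needs a lookup, which is ordinary code. The class name, the registry contents, and the sample log line are invented for illustration and are not part of the Generic Log Adapter.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ExtractThenLookup {
    // Hypothetical registry that maps component signatures to readable names.
    private static final Map<String, String> REGISTRY = new HashMap<String, String>();
    static {
        REGISTRY.put("LINK", "Link layer monitor");
        REGISTRY.put("SYS", "System services");
    }

    // The regular expression isolates the signature between '%' and the severity digit.
    private static final Pattern SIGNATURE = Pattern.compile("%([A-Z]+)-\\d+-");

    // This step is what a regular expression rule can do by itself.
    static String extractSignature(String line) {
        Matcher m = SIGNATURE.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    // This step needs code: the name is looked up, not matched.
    static String componentName(String line) {
        String signature = extractSignature(line);
        return signature == null ? null : REGISTRY.get(signature);
    }
}
```

Calling `ExtractThenLookup.componentName(...)` on a line that contains `%LINK-3-UPDOWN:` returns the registry entry for `LINK`; a line without a signature yields null.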
Depending on the situation, there can be many more kinds of messages that cannot be converted using only regular expressions. To work around these shortcomings of the regular expression language, the Generic Log Adapter for Autonomic Computing provides the following options:
- Using a static parser: This involves replacing the entire parser component with a different parser that contains custom code. This is a tedious approach because it replaces the use of regular expression rules with Java code. Using a static parser prevents using regular expressions at all and reduces the value of the Generic Log Adapter infrastructure to just orchestrating code fragments. Note that when you use a static parser, the static parser must handle the extractions and conversions to fill up all fields of the Common Base Event. It is not possible to use rules for some properties and static Java code for others. This approach is not covered further in this article.
- Using built-in functions: For certain Common Base Event fields, the Generic Log Adapter Runtime has built-in functions that can be invoked to evaluate a single rule. This approach is flexible because it allows for some rules to be written using regular expressions, while allowing the use of built-in functions for certain others. It also allows multiple rules to be written for a single Common Base Event property. The only two drawbacks with this approach are: the actual code that is being invoked is predetermined and cannot be changed, and built-in functions are available only for very few properties of the Common Base Event specification.
- Invoking custom Java classes and class callouts: This method is similar to using built-in functions in that it allows the use of regular expressions together with Java code for those situations where regular expressions do not suffice. It also allows a combination of multiple rules to be specified for a single Common Base Event property, some of which might invoke Java methods on the specified classes. This method allows the most customization of the Generic Log Adapter's behavior without requiring replacement of the entire parser. Using this method, you can exploit the features of the Generic Log Adapter (and regular expression rules) to the fullest with room for improvement.
Prerequisites
This article assumes you have installed the Generic Log Adapter and the Log and Trace Analyzer from the IBM Autonomic Computing toolkit (see Resources for a link). It is also assumed that you understand Java code fairly well and have a basic knowledge of plug-in architectures. Some experience writing regular expression rules for the Generic Log Adapter and prior experience trying to perform more complicated tasks with regular expressions is also very helpful. As a starting point, you can read the Parser section in the gla_getting_started.pdf document that is part of the Generic Log Adapter installation.
Using built-in functions
The Generic Log Adapter comes with built-in functions that can be invoked individually to determine the values for certain Common Base Event fields. These built-in functions use predetermined compiled code that is embedded within the Generic Log Adapter Runtime. You cannot customize the behavior of these built-in functions. However, in certain situations, using built-in functions provides an effective way to circumvent regular expression rules. This article looks at some of these built-in functions using an example adapter configuration file you will develop for parsing syslog messages from a Cisco switch. A basic configuration file, starter_cisco_config.adapter, that you can load into the Generic Log Adapter's Rule Builder is available as part of the downloadable code (see Resources). You can build on this configuration file as you work through this article, ending up with the cisco_config.adapter file, which is available as part of the download. The sample log file that you will use to build this configuration file is also part of the downloadable package.
The built-in functions can be used for four different Common Base Event properties and are described in the documentation that accompanies the Generic Log Adapter. This article demonstrates the use and result of the built-in functions for each of the four Common Base Event properties. When you open the starter_cisco_config.adapter file in the Rule Builder, you will find that the settings for the Extractor and the rule for creationTime have already been set up, as shown in Figures 1 and 2. You can change the properties under Configuration > Sensor to point to the location where you downloaded the sample log file.
Figure 1. Rule Builder perspective in the Log and Trace Analyzer showing Extractor configuration
Figure 2. Rule Builder perspective in the Log and Trace Analyzer showing rule for creationTime
Now, you'll add some rules for the various Common Base Event properties that are supported by the Generic Log Adapter Runtime for built-in function evaluation. The globalInstanceId field is one that often requires the use of a built-in function because it is very difficult to extract a globally unique ID from the log message or event, and even more difficult to generate one on the fly, using regular expressions. To specify that a built-in function must be used during evaluation of this rule, simply check the Use built-in function checkbox. Based on which Common Base Event field is being filled by this evaluation, the Generic Log Adapter Runtime invokes the appropriate function and uses the result as the value for that field. Because regular expression evaluation is not involved in the evaluation of this rule, all the fields that are typically filled for a rule (Match, Positions, Substitute, and so on) can be left empty. As you can see in Figure 3, the output Common Base Event shown in the Outputter Result view contains an internally generated, globally unique ID.
Similarly, the localInstanceId field of the Common Base Event can be filled using a built-in function. The generated value is a combination of the IP address of the machine where the Generic Log Adapter is running, the current time in milliseconds, and a hash code of that particular log message's internal object representation. This provides a sufficiently unique identifier in the local context.
Figure 3. Use of built-in function for globalInstanceId and localInstanceId fields
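The built-in code itself is not shown in this article, but the ingredients listed above can be combined along the following lines. This is only a sketch: the separator, the ordering of the parts, the use of a random UUID for the global ID, and the class and method names are all assumptions, not the actual Generic Log Adapter implementation.

```java
import java.util.UUID;

final class InstanceIdSketch {
    // Assumed layout: IP address, timestamp in milliseconds, and the message's
    // hash code, joined with '-'. The real built-in function may differ.
    static String localInstanceId(String ipAddress, long millis, Object message) {
        return ipAddress + "-" + millis + "-" + message.hashCode();
    }

    // A random UUID stands in here for the internally generated global ID.
    static String globalInstanceId() {
        return UUID.randomUUID().toString();
    }
}
```

In a running adapter, the address would come from the local machine and the timestamp from `System.currentTimeMillis()`; they are parameters here so the composition is easy to follow.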
You can also use built-in functions to fill some fields within the SourceComponentId and ReporterComponentId fields. Specifically, the location and locationType properties can be filled in automatically by the Generic Log Adapter Runtime. Typically, you use built-in functions for these fields when you're trying to write rules for the fields within the ReporterComponentId because in many cases the Generic Log Adapter is the reporter. You can also apply the same rules for the SourceComponentId, but only if the Generic Log Adapter is running on the same machine as the source of the native log data. Note that the values filled in for these fields are based on the actual machine on which the Generic Log Adapter is running. This machine might not necessarily be the same physical machine as that of the data source. If the machine has a valid IP address, that is used for the location field. Based on what was filled in for the location field, the locationType field is filled in appropriately. Also note that when the Use built-in function checkbox is checked, any values filled in the other fields (Match, Positions, and so on) are ignored.
Figure 4. Use of built-in function for other fields
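The way locationType follows from location can be pictured with a small classifier. The sketch below is an assumption about the kind of decision involved, not the runtime's code: the class name is invented, the type strings are the Common Base Event locationType names as I understand them, and the IPv4 check looks only at the shape of the text.

```java
import java.util.regex.Pattern;

final class LocationTypeSketch {
    // Shape check only: four dot-separated groups of one to three digits.
    // It does not verify that each group is at most 255.
    private static final Pattern IPV4_SHAPE =
        Pattern.compile("\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}");

    static String locationTypeFor(String location) {
        if (location == null || location.length() == 0) {
            return "Unknown";
        }
        if (IPV4_SHAPE.matcher(location).matches()) {
            return "IPV4";
        }
        if (location.indexOf(':') >= 0) {
            return "IPV6"; // colons do not appear in host names
        }
        return "Hostname";
    }
}
```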
As mentioned earlier, this technique is very helpful and does not require any effort while creating the rule set. However, the behavior is predetermined and the technique cannot be used for many Common Base Event properties.
Creating class callouts
Class callouts allow for the most customization of the parsing component without having to replace the default parser and without losing the flexibility and controlled extraction that come with using regular expressions. Class callouts are specified in the configuration (in the .adapter file) and, therefore, can be plugged in and pulled out without any change in the Generic Log Adapter's code. The most important thing to note is that class callouts can be used in a coordinated fashion together with regular expression rules. Using them in combination lets you take advantage of the speed and strength of compiled code.
Class callouts can be written and used in different ways. In the simplest case, a new instance of the callout class is created each time a rule that specifies the class is evaluated, for each message consumed by the Generic Log Adapter. Alternatively, a callout can be written so that a single instance is used for every evaluation of every rule that specifies that class. You choose between these behaviors by implementing the appropriate interface, as shown in Table 1. The Generic Log Adapter looks at which interface the specified callout class implements to determine how to create and reuse instances at run time.
Table 1. Callout interfaces
| Interface name | Behavior | When to use it |
|---|---|---|
| org.eclipse.hyades.logging.adapter.parsers.ISubstitutionExtension | Simplest usage; a new object instance is created for each invocation | When the functionality is repeatable without any history. It can be reused with the same behavior |
| org.eclipse.hyades.logging.adapter.parsers.IStatefulSubstitutionExtension | More advanced usage; the same object is reused for every invocation | When memory footprint is an issue, to maintain state between invocations |
In general, to create a class callout you need to create a Java class that implements one of the interfaces listed in Table 1. You can find these interfaces in the hgla.jar file installed with the Generic Log Adapter. The remainder of this article uses WebSphere® Studio Application Developer (Application Developer) to develop these code fragments. When using Application Developer, create a new Java project and add the hgla.jar file to the build path. Call the class MyCode.
Listing 1. MyCode class
    package com.ibm.dw.callouts;

    import org.eclipse.hyades.logging.adapter.AdapterException;
    import org.eclipse.hyades.logging.adapter.parsers.ISubstitutionExtension;

    public class MyCode implements ISubstitutionExtension {

        // Receives the entire log record; returning null passes control to the next rule
        public String processRecord(String arg0) throws AdapterException {
            System.out.println("***This is what I got in processRecord: " + arg0);
            return null;
        }

        // Receives the result of the rule's regular expression evaluation
        public String processMatchResult(String arg0) throws AdapterException {
            System.out.println("***This is what I got in processMatchResult: " + arg0);
            return null;
        }
    }
The ISubstitutionExtension and IStatefulSubstitutionExtension interfaces declare the same methods. The different interface names let the Generic Log Adapter Runtime determine whether a class callout that implements one of these interfaces must be instantiated once for all invocations or individually for each invocation.
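The instance handling described above can be pictured with a small factory that checks which interface a callout class implements. This is a sketch of the idea only: `Callout`, `StatelessCallout`, `StatefulCallout`, and `CalloutFactory` are hypothetical stand-ins defined here so the example compiles by itself, and the actual runtime may manage instances differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the two interfaces in Table 1.
interface Callout {
    String processRecord(String record);
    String processMatchResult(String matchResult);
}
interface StatelessCallout extends Callout { }
interface StatefulCallout extends Callout { }

final class CalloutFactory {
    // One shared instance per stateful callout class.
    private final Map<Class<?>, Callout> shared = new HashMap<Class<?>, Callout>();

    Callout instanceFor(Class<? extends Callout> type) {
        if (!StatefulCallout.class.isAssignableFrom(type)) {
            return create(type); // stateless: a fresh object for every invocation
        }
        Callout existing = shared.get(type);
        if (existing == null) {
            existing = create(type);
            shared.put(type, existing);
        }
        return existing; // stateful: the same object every time
    }

    private static Callout create(Class<? extends Callout> type) {
        try {
            return type.getDeclaredConstructor().newInstance();
        } catch (Exception e) {
            throw new IllegalStateException("Cannot instantiate " + type.getName(), e);
        }
    }
}

// Two tiny callouts used to exercise the factory.
final class EchoCallout implements StatelessCallout {
    public String processRecord(String record) { return record; }
    public String processMatchResult(String matchResult) { return matchResult; }
}

final class CountingCallout implements StatefulCallout {
    private int calls = 0;
    public String processRecord(String record) { return Integer.toString(++calls); }
    public String processMatchResult(String matchResult) { return Integer.toString(++calls); }
}
```

Because `CountingCallout` is shared, its counter keeps growing across invocations, which is exactly the kind of state a per-invocation instance would lose.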
Methods to be implemented in your callout class
Both interfaces declare two methods that your callout must implement. Both methods must be implemented, but an implementation can be empty. The first method is the processRecord method. It takes a string argument and returns a string result. The method is present so that the Generic Log Adapter Runtime can pass to the callout class the entire log message that the parser component receives each time. In this case, the run time hands off the entire parsing function to this method in the specified callout class. The callout result is used to fill in the value for the particular Common Base Event property for which the rule contains a callout to the specified class. In this case, the regular expression syntax is avoided in favor of compiled code. However, you can still use regular expressions for other rules for the same Common Base Event property or for other Common Base Event properties. So, you can use a combination of regular expressions and code execution to parse the native log messages into the Common Base Event format.
The second method, processMatchResult, lets the Generic Log Adapter Runtime first evaluate a regular expression against the incoming log message and then pass the result of that evaluation to this method in the specified class callout. This method couples the strengths of regular expression evaluation and code within a single rule for a particular Common Base Event property. For example, you can use the regular expression to extract a particular keyword or portion of each of the input strings that are sent to the parser and then manipulate the extracted portion with the specified code. As before, you can have multiple rules for the same Common Base Event property, where some rules use a combination of regular expressions and compiled code, some use compiled code, and some use only regular expressions. This flexibility avoids having to replace the entire parser with a static parser.
If the callout method called within a rule for a particular Common Base Event property returns null, the Generic Log Adapter Runtime moves on to the next rule for that Common Base Event property. If there are no more rules, the value is treated as null for that Common Base Event property. At run time, the Generic Log Adapter determines which of the two methods to call based on whether the other fields for that particular rule (Match, Positions, Substitute, and so on) are filled, which is why your class callout must implement both. If they are filled, the processMatchResult method is invoked after the regular expression has been evaluated. If they are left empty for that rule, the processRecord method is called immediately, with the entire message passed in as an argument. Further on in this article are examples of these evaluations.
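The dispatch and fall-through behavior can be summarized in a few lines of Java. The sketch below is a simplified model, not the runtime's code: `RuleCallout`, `Rule`, and `RuleChain` are hypothetical names, and the model considers only the Match field, ignoring Positions and Substitute.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the two callout methods.
interface RuleCallout {
    String processRecord(String record);
    String processMatchResult(String matchResult);
}

// One rule: an optional regular expression plus an optional callout.
final class Rule {
    private final Pattern match;        // null when the Match field is empty
    private final RuleCallout callout;  // null when no callout class is named

    Rule(String regex, RuleCallout callout) {
        this.match = regex == null ? null : Pattern.compile(regex);
        this.callout = callout;
    }

    String evaluate(String message) {
        if (match == null) {
            // Only the callout is specified: hand over the whole record.
            return callout == null ? null : callout.processRecord(message);
        }
        Matcher m = match.matcher(message);
        if (!m.find()) {
            return null; // no match, so the rule is skipped and no callout runs
        }
        String extracted = m.groupCount() > 0 ? m.group(1) : m.group();
        return callout == null ? extracted : callout.processMatchResult(extracted);
    }
}

final class RuleChain {
    // Try the rules for one property in order; the first non-null result wins.
    static String firstResult(List<Rule> rules, String message) {
        for (Rule rule : rules) {
            String value = rule.evaluate(message);
            if (value != null) {
                return value;
            }
        }
        return null; // no rule produced a value for this property
    }
}
```

A chain whose first rule names a callout that returns null and whose second rule is a plain regular expression ends up with the second rule's value.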
Plugging in and testing class callouts from the Rule Builder
After you have created your callout class, you must create a Java Archive (JAR) file out of it. In Application Developer, you can just export your class file into a JAR file, for example, mycode.jar. To test your class as a callout from your configuration or rule set, you must perform a couple of steps. First, you must get your JAR file into the Rule Builder Eclipse plug-in that forms the core of the Generic Log Adapter Rule Builder. Let's assume that your Generic Log Adapter is installed under a folder represented by [installFolder]. Under [installFolder]\GLA\dev\eclipse\plugins you will find a folder named org.eclipse.hyades.logging.adapter_1.3.0 that contains the core files used by the Rule Builder. The actual number at the end might be different depending on which version of the Generic Log Adapter you have installed. Open this folder and place your JAR file (mycode.jar) there. Next, tell Eclipse that this plug-in will use the new JAR file that was just added to this folder. Edit the plugin.xml file using a simple text editor. As shown in Figure 5, under the runtime element you find a library element that points to the hgla.jar file. Make a copy of that library element and change hgla.jar to the name of your JAR file (mycode.jar). Each time you make changes to plugin.xml, you have to restart the Generic Log Adapter Rule Builder.
To use your class callouts at run time (when running the adapter in a stand-alone mode), you must edit the gla.bat file in your Generic Log Adapter's installation folder under the bin subfolder and include the JAR files that hold your callout classes in the classpath.
Figure 5. Edit plugin.xml to include the custom JAR file
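When a callout does not seem to be picked up in stand-alone mode, a quick way to rule out a classpath problem is to ask the JVM whether the class can be loaded at all. The helper below is a hypothetical convenience, not part of the Generic Log Adapter, and it checks only the plain Java classpath, not the class loader of the Rule Builder plug-in.

```java
final class CalloutClasspathCheck {
    // Returns true if the named class can be found on the current classpath.
    // The class is located but not initialized.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, CalloutClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            return false; // the class was found but one of its dependencies is missing
        }
    }
}
```

Run with the same classpath that gla.bat sets up, `CalloutClasspathCheck.isLoadable("com.ibm.dw.callouts.MyCode")` should return true once mycode.jar has been added.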
Now that you have your code loaded into the Rule Builder plug-in, you can specify the class to be called out from your rules in the Rule Builder. To do this, open your .adapter file in the Rule Builder. Using the cisco_config.adapter that you have been building along the way, you'll add some rules for the component field within the SourceComponentId field. You'll add two rules; the first one performs a class callout and the second provides a result extracted from the input string using a regular expression. In the first rule, you must specify only the Substitution Extension Class field and provide the fully qualified name (with package name) of the callout class. This is shown in Figure 6 below. For the second rule, you can provide a regular expression in the Match field and subsequently a value in the Substitute field as shown in Figure 7.
Figure 6. Add a rule that uses a class callout
Figure 7. Add a default rule and results of execution
If you look at the result, you see that only the second rule (the one with the regular expression) produced a value. This is because the MyCode class returns null from both of its methods. Because the Generic Log Adapter received a null result during execution of the first rule (the one that involved a class callout), it went on to the next rule and used that result as the value for that Common Base Event property. However, if this rule set were run in the Generic Log Adapter Runtime, it would print messages on the screen showing what arguments were received and which method was invoked. Change the MyCode class file so that it returns different values from the different methods and call the new callout class Capitalizer. For the sake of example, the two methods will return the string they are provided, converted to uppercase, but they will attach different prefixes to the returned string.
Listing 2. Capitalizer callout class
    package com.ibm.dw.callouts;

    import org.eclipse.hyades.logging.adapter.AdapterException;
    import org.eclipse.hyades.logging.adapter.parsers.IStatefulSubstitutionExtension;

    public class Capitalizer implements IStatefulSubstitutionExtension {

        // Whole record passed in: return it in uppercase with a "*** " prefix
        public String processRecord(String arg0) throws AdapterException {
            System.out.println("***This is what I got in processRecord: " + arg0);
            if (arg0 == null) return null;
            return "*** " + arg0.toUpperCase();
        }

        // Rule result passed in: return it in uppercase with a "$$$ " prefix
        public String processMatchResult(String arg0) throws AdapterException {
            System.out.println("***This is what I got in processMatchResult: " + arg0);
            if (arg0 == null) return null;
            return "$$$ " + arg0.toUpperCase();
        }
    }
After you make these changes, you can either create a new JAR file or update the JAR file within the Rule Builder plug-in folder that you have already added to plugin.xml. If you want to create a new JAR file with a new name, note that you must go through the whole process of placing it in the right Rule Builder plug-in folder and editing the plugin.xml to include the new JAR's file name. In either case you must restart the Rule Builder to use the new classes in the JAR file.
Figure 8 and Figure 9 show how callouts are made under different circumstances.
Figure 8. A rule with callout specified but other fields left blank
Figure 9. Output of GLA Runtime when callout alone is specified
In Figure 8, because the rest of the fields of the rule were left empty (no regular expression was specified), the GLA Runtime just calls the specified class's processRecord method and passes the entire log message to the method. The result is shown in the output in Figure 9. The entire message was capitalized, which is what both methods in the Capitalizer class do to their input.
Figure 10. A rule with callout specified along with a regular expression that won't match with any input record
In this case, the GLA Runtime first tries to evaluate the regular expression. Because a regular expression is specified, the processRecord method cannot be called. Also, because the regular expression did not match anything, there is no result to call the processMatchResult method with. In effect, the entire rule is skipped and the GLA Runtime moves on to the next rule, which is based purely on regular expressions. Even if you had a static value in the Substitute field, if the regular expression were to fail, the whole rule would be ignored. However, if the Substitute field has a static value and no regular expression is specified, the processMatchResult method is called with the final value, which is the same as the contents of the Substitute field. This is shown in Figure 11. As you can see from the $$$ in front of the value inserted for the component field, the processMatchResult method was called.
Figure 11. A rule with callout specified along with a static substitute value
In a typical case, where there is a valid regular expression or position that leads to the creation of a valid value, the processMatchResult method is called with the result of that evaluation, as shown in Figure 12. First, the text in the specified position is extracted (based on the parser settings). Next, the regular expression is evaluated against the extracted text and, using the substitute value, the result is generated. Finally, the result is passed to the callout class's processMatchResult method. The value returned from the method is used as the value for that Common Base Event property.
Figure 12. A rule with callout specified along with other valid fields
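The three steps that lead up to the callout (position, regular expression, substitute) can be modeled roughly as follows. Several things here are assumptions made for the sake of a runnable example: the message is split on single spaces, positions are 1-based, and the substitute uses Java-style `$1` group references. The parser settings of your adapter determine the real behavior.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatchPipelineSketch {
    // Returns the value that would be handed to processMatchResult,
    // or null when the rule would be skipped.
    static String valueForCallout(String message, int position,
                                  String regex, String substitute) {
        // Step 1: extract the text at the given position (space separator assumed).
        String[] tokens = message.split(" ");
        if (position < 1 || position > tokens.length) {
            return null;
        }
        // Step 2: evaluate the regular expression against the extracted text.
        Matcher m = Pattern.compile(regex).matcher(tokens[position - 1]);
        if (!m.find()) {
            return null;
        }
        // Step 3: expand group references in the substitute value.
        StringBuffer out = new StringBuffer();
        m.appendReplacement(out, substitute);
        // appendReplacement first copies the text before the match; drop it.
        return out.substring(m.start());
    }
}
```

With the Capitalizer callout from Listing 2, a value such as `LINK/3` produced by this pipeline would come back from processMatchResult as `$$$ LINK/3`.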
Stateful and stateless callouts
To demonstrate the use of all types of callouts, you'll modify the cisco_config.adapter configuration file you have been building through this article. The file already has rules that call built-in functions and a rule for the component field that demonstrates the usage of a callout modeled after ISubstitutionExtension. However, to bring things closer to reality, you'll create a new callout class that translates an IP address to a hostname based on a DNS lookup. Because you don't have to keep any state here and creating a new instance of the callout class for each invocation is not a problem, you will model this class after the ISubstitutionExtension interface. Call this class HostnameLookup.
In the modified rule set, you no longer use the built-in function for the location field within sourceComponentId; typically, this is derived from the actual message because the adapter might be residing on a remote machine. Using the built-in function for the location field is more appropriate within the reporterComponentId. Remember that the built-in function for location returns the IP address of the local machine on which the Generic Log Adapter is running, which might not be the same as the source of the log messages. The intent in using the code in Listing 3 and the rule set shown in Figure 13 is to have the output Common Base Event contain hostnames instead of IP addresses. If the input message has an IP address, use the class callout to convert it to a hostname based on a DNS lookup; if it already has the hostname, just extract it. Using a similar principle, you will also determine the locationType within the sourceComponentId.
Listing 3. Determine locationType
    package com.ibm.dw.callouts;

    import java.net.InetAddress;

    import org.eclipse.hyades.logging.adapter.AdapterException;
    import org.eclipse.hyades.logging.adapter.parsers.ISubstitutionExtension;

    public class HostnameLookup implements ISubstitutionExtension {

        public String processRecord(String arg0) throws AdapterException {
            // this class does not provide for processing entire records
            return null;
        }

        public String processMatchResult(String arg0) throws AdapterException {
            try {
                // resolve the extracted address and return the hostname found for it
                InetAddress address = InetAddress.getByName(arg0);
                return address.getHostName();
            } catch (Exception e) {
                // lookup failed; return null below so that the next rule is tried
            }
            return null;
        }
    }
Figure 13. Rule for location with callout specified for converting IP address to hostname
However, certain methods in your callout might not be executed properly from within the Rule Builder even though they are executed by the Generic Log Adapter Runtime when run in stand-alone mode. The reason is that certain methods involve security checks that fail when executed within the context of an Eclipse plug-in. For example, if instead of address.getHostName you had used address.getCanonicalHostName in the callout, the callout would not execute properly when the rule set is executed from the Rule Builder; you would not see the proper output of the callout in the Outputter Result view and the Rule Builder would appear to hang. However, note that all Java classes and methods can be used within your callout, and they will execute properly when run in stand-alone mode using the Generic Log Adapter Runtime.
Finally, to demonstrate the use of IStatefulSubstitutionExtension, create a callout class that keeps an incremental counter that is used to assign the sequence numbers for the Common Base Events generated for log messages. Call this class SequenceGenerator. As you can see from Figures 14 and 15, the same instance of the SequenceGenerator callout class is used for resolving callouts for subsequent records and, therefore, the sequence numbers change accordingly. See the comments in Listing 4 for more on how the methods can be used to accommodate different scenarios.
Listing 4. Use methods to accommodate different scenarios
    package com.ibm.dw.callouts;

    import org.eclipse.hyades.logging.adapter.AdapterException;
    import org.eclipse.hyades.logging.adapter.parsers.IStatefulSubstitutionExtension;

    public class SequenceGenerator implements IStatefulSubstitutionExtension {

        private int counter = 100;
        private String lastSeenArg = null;

        public String processRecord(String arg0) throws AdapterException {
            // produce a new sequence number if a record is passed in as it is
            counter++;
            return Integer.toString(counter);
        }

        public String processMatchResult(String arg0) throws AdapterException {
            // if an argument is passed in make sure it is not the same as what was
            // passed in the previous call
            // if the argument is different, update sequence number and return it
            // if not, return current sequence number because this means we are
            // looking at the same record we saw last
            // it is up to the rule designer to make sure they extract a unique portion
            // out of the message and pass it to this method
            // through the callout and make sure null would never be passed in
            if (lastSeenArg == null || !lastSeenArg.equals(arg0)) {
                // this is the first time we have been called or last argument was
                // different from this one
                counter++;
                lastSeenArg = arg0;
            }
            return Integer.toString(counter);
        }
    }
First, consider the use of the processRecord method of the above callout. As you can see below, the sequence number increases by one for every message, no matter what message is being processed. So, duplicate messages are assigned different sequence numbers.
Figure 14. Rule for sequence number with callout specified leaving other fields empty
Figure 15. Rule for sequence number with callout specified leaving other fields empty and sequence number being incremented
Now, assume that you want sequence numbers to increment only when the timestamps differ by at least a second. In other words, you want to ignore the milliseconds when assigning sequence numbers. This can be accomplished by extracting only the portion of the timestamp without the milliseconds in each message and passing that to the callout. As described in the comments in Listing 4, the processMatchResult method assigns a new sequence number only if a new argument has been passed in. This won't be the case when subsequent messages have timestamps that differ only in milliseconds and, therefore, the resultant Common Base Events have the same sequence numbers, as shown in Figures 16 and 17. Figure 18, however, shows how the sequence number changes as soon as the timestamps differ by more than just milliseconds.
Figure 16. Rule for sequence number with callout specified together with timestamp extraction
Figure 17. Rule for sequence number with callout specified leaving other fields empty and sequence number staying same
Figure 18. Rule for sequence number with callout specified leaving other fields empty and sequence number changing
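The effect of passing only the second-level portion of the timestamp can be tried outside the adapter with a small stand-alone class. This sketch mirrors the comparison logic of processMatchResult in Listing 4 but does not implement the Hyades interface; the class name, the regular expression, and the HH:mm:ss.millis timestamp shape are assumptions and might not match the sample log file.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SecondPrecisionSequence {
    // Assumed timestamp shape: HH:mm:ss optionally followed by .millis
    private static final Pattern TO_THE_SECOND =
        Pattern.compile("(\\d{2}:\\d{2}:\\d{2})(?:\\.\\d+)?");

    private int counter = 100;
    private String lastSeen = null;

    // Plays the part of the rule: keep the time down to the second only.
    static String secondPart(String message) {
        Matcher m = TO_THE_SECOND.matcher(message);
        return m.find() ? m.group(1) : null;
    }

    // Plays the part of the callout: advance only when the key changes.
    String next(String key) {
        if (lastSeen == null || !lastSeen.equals(key)) {
            counter++;
            lastSeen = key;
        }
        return Integer.toString(counter);
    }
}
```

Feeding it three messages stamped 10:15:30.120, 10:15:30.870, and 10:15:31.004 yields 101, 101, and 102: the first two share a sequence number because they differ only in milliseconds.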
Conclusion
In this article, you were introduced to two powerful features that the Generic Log Adapter provides to ease the creation of rules that can overcome some limitations of regular expressions: built-in functions and class callouts. Class callouts in particular allow extensive customization of the behavior of the parsing component without having to replace it entirely. You can save records to disk, look up DNS servers or databases, translate mnemonics into messages or messages into different languages, keep counters, and so on. Class callouts allow the Generic Log Adapter to interact with other applications and components, thereby extending the scope of what the Generic Log Adapter can do. With this knowledge in hand, you should now be able to use class callouts to handle cases that regular expressions alone cannot and use the Generic Log Adapter in a truly universal fashion.
Download
| Description | Name | Size | Download method |
|---|---|---|---|
| code files | ac-calloutsource.zip | 6 KB | HTTP |
Resources
- Download the source code file that includes the source code for all the callout classes, configuration files, and the sample log file used in this article.
- The developerWorks Autonomic computing zone provides more information on the AC Toolkit components (Generic Log Adapter, Log and Trace Analyzer) used in this article.
- Browse for books on these and other technical topics.
About the author
Balan Subramanian enjoys working as a Staff Software Engineer in the Autonomic Computing group at IBM in Research Triangle Park, North Carolina, focusing on data collection, problem determination, and provisioning. His other interests include Web services, grid services, and pervasive computing. A Sun Certified Java Programmer, Balan received his master's degree in computer science from George Mason University in 2000 with a thesis on Web services performance. He was also a core developer on the IBM Generic Log Adapter for Autonomic Computing and worked as a development co-op on the AUIML toolkit. He has previously worked at IBM India. He can be reached at bsubram@us.ibm.com.