Eclipse Test & Performance Tools Platform, Part 2: Monitor applications

Collect and analyze a variety of log files

In this "Eclipse Test & Performance Tools Platform" tutorial series, learn how to use the capabilities of the Eclipse Test & Performance Tools Platform (TPTP) to convert application log files into a structured format. Then, using TPTP and other specialized tools designed to process and analyze log files, you can quickly discern usage patterns, performance profiles, and errors.

Martin Streicher (martin.streicher@linux-mag.com), Editor in Chief, Linux Magazine

Martin Streicher is the Editor-in-Chief of Linux Magazine. Martin earned a Master of Science in Computer Science from Purdue University and has been programming UNIX-like systems since 1986 in the Pascal, C, Perl, Java, and (most recently) Ruby programming languages.



25 April 2006


Before you start

About this series

Writing code for an application is the first stage in the long process required to deliver robust production-quality programs. Code must be tested to vet its operation and accuracy. Code must often be profiled to remove bottlenecks that impede performance and to remove wasteful or inadvertent use of resources, especially memory. Code must also be monitored -- to pinpoint failures, of course, but also to identify usage patterns, opportunities for further enhancement and optimization, and attempted and actual intrusions.

The Eclipse Test & Performance Tools Platform (TPTP) is a software architecture, together with the several components realized so far, that extends the Eclipse platform to include test, performance, and monitoring tools. This "Eclipse Test & Performance Tools Platform" series explores the capabilities of TPTP. Part 1 demonstrates how to profile a Java™ technology application. Part 2 demonstrates how to capture and transform arbitrary log files to the widely supported Common Base Events (CBE) format. Part 3 explains how to manage application testing.

About this tutorial

This tutorial shows how to use the capabilities of the Eclipse TPTP to convert a typical application log file into CBE occurrences. With a modicum of specifications and some light coding to create a series of rules, you can transform virtually any log file into a unified, structured format. Then, using the Eclipse TPTP and other specialized tools, you can combine, process, and quickly discern usage patterns, performance profiles, and errors.

Objectives

In this tutorial, you learn how to write an adapter to transform a typical Linux® software service log file into CBE data. You create and debug the transform incrementally with the Eclipse TPTP Adapter Configuration Editor, then use the Generic Log Adapter (GLA) to read, transform, and emit the data.

Prerequisites

You should have experience with software development and the entire software development life cycle, including testing and profiling. You should also have experience installing software from the command line, and setting and managing shell and system environment variables, such as the shell's PATH variable and the Java CLASSPATH. Additionally, it's vital that you have some experience reading and writing regular expressions. Acquaintance with Eclipse and the Eclipse user interface (UI) paradigms is also beneficial.

System requirements

You can run Eclipse on any system that has a JVM, such as Solaris, Linux, Mac OS X, or Windows. If you don't have a JVM and Eclipse installed on your system, make sure you have at least 300 MB of disk space free for all the software. You also need enough free physical memory to run the JVM. In general, 64 MB or more of free physical memory is recommended.

You must install several software packages on your UNIX®, Linux, Mac OS X, or Microsoft® Windows® system. You need a functioning Java Virtual Machine (JVM), a copy of the Eclipse SDK, a copy of the Eclipse TPTP runtime, and several prerequisites and co-requisites on which the Eclipse TPTP depends. You also need a copy of the Eclipse TPTP GLA, which allows you to transform log files in a stand-alone application or in your own application. Here's everything you need:


Transforming and analyzing log files

To allow ongoing monitoring, a complex application -- and certainly an application expected to run continuously -- is typically instrumented during development to emit a log file, which is a record of application activity. Some activity can be detailed internal diagnostics, which is information crucial for isolating a bug or untangling interactions with other system and software components. Some activity logged might be initiated by the application itself -- say, to read a configuration file or to open a port for listening. Other activity might be generated by requests for service.

The problem: Ongoing monitoring for legacy applications

Depending on the application's purpose, a systems administrator might review the program's corresponding log file from time to time -- when an error occurs or even in real time to react to emergent events. Logs are often full of valuable historical information, too. Think of the traffic and usage patterns found only in Apache HTTP Server logs, for instance.

It would be ideal if all log files captured at least a minimum of information. It would be even better -- certainly from a systems administrator's point of view -- if the format of all log files were uniform. Consistency would make reading logs far easier, and homogeneity would facilitate (and reduce the cost of) developing automated tools that separate vital events from the merely informational ones.

But invariability is not reality. Applications differ greatly (as do underlying operating system facilities and programming language libraries). Some applications are entrenched and cannot be revised ("legacy applications") to be brought to uniformity. And it's an ugly truth that expensive and scarce developer cycles are usually spent on new features, not retrofits.

The solution: Transform log file data

Short of the ideal and realizing that one solution cannot ever fit all, it is far more practical to transform log file data to meet evolving standards, de-facto or otherwise, and to apply state-of-the-art analysis tools. For example, the CBE format is part of an effort to define a broad standard for recording, tracking, and analyzing events, which are occurrences and situations that take place in computing systems. Many tools exist to process and analyze CBE data, which is based on XML.

But while transformation from arbitrary log file to CBE may be practical, the process may not be easy or inexpensive. Given the variety of applications and the sheer number of log file formats, writing so many transforms can be a Herculean task in itself.

The Eclipse TPTP GLA and Adapter Configuration Editor simplify the creation of transforms, thereby easing the migration to CBE. The GLA applies an adapter created by the Adapter Configuration Editor to a log file and yields CBE data. The Adapter Configuration Editor can run a handmade Java class if need be -- a static adapter -- or it can run a series of rules to divide the log file into records, fields, and values and reassemble them as CBE data. The latter form of adapter is a rules-based adapter and requires no coding. Better yet, the Adapter Configuration Editor runs in Eclipse and provides a rich adapter development environment in which you can incrementally define and test your adapter. Finally, you can choose to integrate the GLA with your own code or use third-party tools, such as the IBM Log and Trace Analyzer, to probe and investigate the resulting CBE event files.

This tutorial shows how to use the capabilities of the Eclipse TPTP GLA and Adapter Configuration Editor to convert a typical Linux application log file to CBE events. With a log file in hand and a little regular expression know-how, you can transform the log into a unified, structured CBE format.


Installing the prerequisite software and components

Before you can begin, you must install and set up the required software and components (see Prerequisites).

Install J2RE V1.4

Download and install J2RE Version 1.4 or 1.5 (Version 1.5 is also called Version 5.0). (If your system already has J2RE V1.4 or later, you can safely skip this step.)

Typically, the JRE is distributed as a self-extracting binary. Assuming that you downloaded the J2RE packages to your home directory, installation (on Linux) is typically as easy as Listing 1.

Listing 1. J2RE V1.4 installation
% cd ~
% mkdir ~/java
% cd ~/java
% mv ~/jre-1_5_0_06-linux-i586.bin .
% chmod +x jre-1_5_0_06-linux-i586.bin
% ./jre-1_5_0_06-linux-i586.bin
  ...
% rm ./jre-1_5_0_06-linux-i586.bin
% ls -F
jre1.5.0_06/

The commands in Listing 1 install J2RE V1.5, but the steps to install J2RE V1.4 are identical (except for the file name).

Install the Eclipse V3.1 SDK

Download the Eclipse V3.1 SDK that's appropriate for your platform. You can find the SDK at the Eclipse Downloads. Typically, installation is as easy as unpacking the Eclipse tarball (.tar.gz) file into the directory of your choice.

For example, if you're using Linux, download the Eclipse V3.1 SDK tarball and unpack it in a directory such as ~/java/ using the commands in Listing 2.

Listing 2. Eclipse V3.1 SDK installation
% cd ~/java
% mv ~/eclipse-SDK-3.1.1-linux-gtk.tar.gz .
% tar zxvf eclipse-SDK-3.1.1-linux-gtk.tar.gz
 ...
% rm eclipse-SDK-3.1.1-linux-gtk.tar.gz

To verify that you successfully installed Eclipse, remain in the directory where you unpacked Eclipse, make sure the java executable is in your PATH, and run java -jar eclipse/startup.jar. For example:

Listing 3. Verify installation
% export JAVA_DIR=$HOME/java
% export JAVA_HOME=$JAVA_DIR/jre1.5.0_06
% export PATH=$JAVA_HOME/bin:$PATH
% export CLASSPATH=$JAVA_HOME
% cd $JAVA_DIR
% java -jar eclipse/startup.jar

If Eclipse prompts you to choose a directory for your workspace, use $HOME/java/workspace. This directory retains all the projects you create in Eclipse. (Of course, if you have many projects, you can create other workspaces later, perhaps to contain one project per workspace.) Now, quit Eclipse to install the Eclipse TPTP, its prerequisites and co-requisites, and the GLA.

Install the TPTP and GLA runtime

The Eclipse TPTP runtime contains the software required to create, debug, and run adapters. To install the Eclipse TPTP software, download the Eclipse TPTP and GLA runtimes. Both are typically distributed in zip format. Move both files into the directory that contains the J2RE and Eclipse and extract them (see Listing 4). If you're prompted to overwrite any files, simply choose All.

Listing 4. Eclipse TPTP and GLA installation
% cd ~/java
% mv ~/tptp.runtime-TPTP-4.1.0.zip .
% mv ~/tptp.gla.runtime-TPTP-4.1.0.1.zip .
% unzip tptp.runtime-TPTP-4.1.0.zip
  ...
% unzip tptp.gla.runtime-TPTP-4.1.0.1.zip
  ...
% rm tptp.runtime-TPTP-4.1.0.zip
% rm tptp.gla.runtime-TPTP-4.1.0.1.zip
% ls -F
GenericLogAdapter/  eclipse/  jre1.5.0_06/

Install the EMF SDK V2.1

You must install EMF SDK V2.1 for TPTP to work properly.

Quit Eclipse if it's running and download the EMF SDK V2.1. Then change to the directory that contains the Eclipse folder and run unzip emf-sdo-SDK-2.1.0.zip (see Listing 5).

Listing 5. EMF SDK V2.1 installation
% cd $JAVA_DIR
% ls
eclipse jre1.5.0_06
% mv ~/emf-sdo-SDK-2.1.0.zip . 
% unzip emf-sdo-SDK-2.1.0.zip
creating: eclipse/features/
creating: eclipse/features/org.eclipse.emf.ecore.sdo_2.1.0/
creating: eclipse/features/org.eclipse.emf_2.1.0/
inflating: ...
  ...
% rm emf-sdo-SDK-2.1.0.zip

Install the XSD SDK V2.1

As with the previous file, change to the directory that contains the Eclipse directory and run unzip xsd-SDK-2.1.0.zip (see Listing 6).

Listing 6. XSD SDK V2.1 installation
% cd $JAVA_DIR
% mv ~/xsd-SDK-2.1.0.zip .
% unzip xsd-SDK-2.1.0.zip
% rm xsd-SDK-2.1.0.zip

If prompted to confirm the overwrite of any files, simply press y (lowercase) to answer "yes" to each question.

Install the UML V2.0 Metamodel Implementation

To use the UML features of the Eclipse TPTP, you must install the UML V2.0 Metamodel Implementation. If using Eclipse V3.1.1, download Version 1.1.1 of UML 2 and unpack its archive file in the same directory that contains Eclipse (see Listing 7).

Listing 7. UML V2.0 Metamodel Implementation installation
% cd $JAVA_DIR
% mv ~/uml2-1.1.1.zip .
% unzip uml2-1.1.1.zip
  ...
% rm uml2-1.1.1.zip

Install the Agent Controller

The Agent Controller is a vital component of the Eclipse TPTP that allows Eclipse to launch applications and interact with those applications to extract profiling data. Download the Agent Controller runtime appropriate for your operating system. Next, create a directory named tptpd in the same directory that contains Eclipse and unpack the Agent Controller archive into that directory (see Listing 8).

Listing 8. Agent Controller installation
% mkdir $JAVA_DIR/tptpd
% cd $JAVA_DIR/tptpd
% mv ~/tptpdc.linux_ia32-TPTP-4.1.0.zip .
% unzip tptpdc.linux_ia32-TPTP-4.1.0.zip

If you see two errors like these:

Listing 9. Symbolic link warnings
linking: lib/libxerces-c.so      
warning:  symbolic link (lib/libxerces-c.so) failed

linking: lib/libxerces-c.so.24   
warning:  symbolic link (lib/libxerces-c.so.24) failed

recreate the two links manually by typing the following:

Listing 10. Re-creating the symbolic links
% cd $JAVA_DIR/tptpd/lib
% rm libxerces-c.so libxerces-c.so.24
% ln -s libxerces-c.so.24.0 libxerces-c.so
% ln -s libxerces-c.so.24.0 libxerces-c.so.24

Add the Agent Controller directory

To use the Agent Controller, you must add its lib directory to your LD_LIBRARY_PATH. For example, if you're running Linux and have adopted the same directory structure shown in the steps above, you'd add $JAVA_DIR/tptpd/lib as follows:

% export LD_LIBRARY_PATH=$JAVA_DIR/tptpd/lib:$LD_LIBRARY_PATH

You must also ensure that the contents of the Controller's lib and bin directories are executable. To do that, run:

% chmod +x $JAVA_DIR/tptpd/{bin,lib}/*

Now, add the scripts that configure, and start and stop the Agent Controller to your PATH:

% export PATH=$JAVA_DIR/tptpd/bin:$PATH

Configure the Agent Controller for your environment

Finally, you configure the Agent Controller to match your environment. Change to the Agent Controller's bin directory, then run SetConfig.sh.

% cd $JAVA_DIR/tptpd/bin
% ./SetConfig.sh

When the configure script prompts you, accept the defaults. Running the configure script creates the file config/serviceconfig.xml in the Agent Controller's hierarchy of files.

Test the Agent Controller

To test the Agent Controller, run RAStart.sh. To stop the Controller, run RAStop.sh:

Listing 11. Starting and stopping the Agent Controller
db% RAStart.sh 
Starting Agent Controller
RAServer started successfully
% RAStop.sh 
RAServer stopped, pid = 5891
RAServer stopped, pid = 5892
RAServer stopped, pid = 5893
RAServer stopped, pid = 5894
RAServer stopped, pid = 5895
RAServer stopped, pid = 5896
RAServer stopped, pid = 5897
RAServer stopped, pid = 5898
RAServer stopped, pid = 5899
RAServer stopped, pid = 5900
RAServer stopped, pid = 5901
RAServer stopped, pid = 5902
RAServer stopped, pid = 5904
RAServer stopped, pid = 5905
RAServer stopped, pid = 5906

Finished! Restart Eclipse and you should see a new button on the Eclipse toolbar that looks like Figure 1. That's the TPTP Profile button -- the indication that your installation of TPTP has been successful. You're ready to continue with the tutorial.

Figure 1. The TPTP Profile button

Creating an adapter

The GLA uses an XML configuration file to control how it parses and transforms log files, and how it emits data. A configuration file contains one or more contexts in which each context defines how to transform one log file. In some cases, contexts within a configuration file can run simultaneously.

The adapter configuration file

Begin by creating an adapter configuration file to process the Linux log file named daemon.log. On the test system, which runs Debian Linux, daemon.log captures messages from the POP3 (e-mail), THTTPD (the "trivial" HTTP server -- a small, fast Web server that serves only static files), and MyDNS (a small, easy-to-configure Domain Name System (DNS) server) daemons. Daemon.log also records when the MySQL daemon starts and stops.

Listing 12 shows a snippet of the file with log entries created by the POP3 and THTTPD servers.

Listing 12. A snippet of the Linux daemon.log file
Mar  2 07:24:54 db popa3d[8861]: Session from 66.27.187.89
Mar  2 07:24:55 db popa3d[8861]: Authentication passed for joan
Mar  2 07:24:55 db popa3d[8861]: 1422 messages (11773432 bytes) loaded
Mar  2 07:24:57 db popa3d[8861]: 0 (0) deleted, 1422 (11773432) left
Mar  2 07:26:28 db thttpd[7784]: up 3600 seconds, stats for 3600 seconds:
Mar  2 07:26:28 db thttpd[7784]:   thttpd - 0 connections (0/sec), 0 max simultaneous
Mar  2 07:26:28 db thttpd[7784]:   map cache - 0 allocated, 0 active (0 bytes)...
Mar  2 07:26:28 db thttpd[7784]:   fdwatch - 1589 selects (0.441389/sec)
Mar  2 07:26:28 db thttpd[7784]:   timers - 3 allocated, 3 active, 0 free
Mar  2 07:27:35 db popa3d[8911]: Session from 71.65.224.25
Mar  2 07:27:35 db popa3d[8911]: Authentication passed for martin
Mar  2 07:27:35 db popa3d[8911]: 1350 messages (10880072 bytes) loaded
Mar  2 07:27:36 db popa3d[8911]: 4 (11356) deleted, 1346 (10868716) left
Mar  2 07:29:54 db popa3d[8963]: Session from 66.27.187.89

Each context in an adapter configuration file defines six components: the context instance, the sensor, the extractor, the parser, the formatter, and the outputter. The context instance sets parameters for the general operation of the transformation, including whether the log is appended to continuously and how frequently the log is amended. The remaining five components (conceptually) act in sequence, reading input, performing a task, and passing results on for further processing (except for certain outputters, which simply write results to a file or to the console):

  • The sensor reads the log file in pieces until it reaches the end of the file and pauses. Then, when the sensor detects that the log file has grown, it reads the additional data. The sensor passes its data to the next stage, the extractor.
  • The extractor reads data and divides it into individual records. One regular expression defines what the start of a record looks like, and another regular expression defines the end of a record. Individual records, when identified, are passed on to the parser for additional processing.
  • The parser reads one record at a time from the extractor and decomposes each into fields and values. Furthermore, the parser can make decisions based on the content of a record and apply one or more sets of rules to yield fields and values. For instance, if a log file indicates the start, interim, and end of an event, the parser can decompose each record into a set of fields and values unique to that event. Ultimately, the parser's objective is to map fields and values in each log file entry to the proper elements, attributes, and values in a CBE XML record. The formatter reads the output of the parser.
  • The formatter's job is simple: It reads the elements, attributes, and values the parser creates, and it creates an object suitable for consumption by the last stage in the context, the outputter.
  • The outputter consumes objects from the formatter and emits them. Outputters can emit XML to a file or to the console. They can also create a new log file or pass the data to a daemon.

The sections that follow describe how to define each of the six components of a context.
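Before defining the components, it may help to see the five processing stages as code. The sketch below is purely illustrative -- the function names and the simplified field handling are invented for this example and do not come from the GLA API:

```python
import re

# Illustrative only -- a toy pipeline mirroring the five GLA stages.
def sensor(text):
    # Sensor: deliver raw log data (a real sensor tails the file).
    return text

def extractor(text):
    # Extractor: divide the stream into records (one per line here).
    return text.splitlines()

def parser(record):
    # Parser: decompose a record into named fields and values.
    m = re.match(r"(\w+ +\d+ [\d:]+) (\S+) (\w+)\[(\d+)\]: (.*)", record)
    return dict(zip(("timestamp", "host", "daemon", "pid", "msg"), m.groups()))

def formatter(fields):
    # Formatter: shape the fields into an event object (a dict here).
    return {"msg": fields["msg"], "component": fields["daemon"]}

def outputter(event):
    # Outputter: emit the event (print in place of writing CBE XML).
    print(event)

line = "Mar  2 07:24:54 db popa3d[8861]: Session from 66.27.187.89"
for record in extractor(sensor(line)):
    outputter(formatter(parser(record)))
```

In the real GLA, each stage is declared in the adapter configuration file rather than coded by hand; the point here is only the hand-off from one stage to the next.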

Create an adapter configuration file

To begin, create a simple Eclipse project to contain the adapter configuration file:

  1. Click File > New and expand Simple. Choose Project and click Next.
  2. Name the project My Adapter and click Finish.
  3. Click File > New > Other and expand Generic Log Adapter. Choose Generic Log Adapter File and click Next.
  4. Choose My Adapter and name the adapter file my.adapter. Click Next.
  5. Choose a template for the log file you want to process with this adapter (see Figure 2).

    You can use a snippet of the actual log file you want to process or an accurate representation of the log file -- say, from a detailed specification. Click Browse, navigate to the file system, and open the template. After making your selection, click Finish. Click Yes when prompted to switch perspectives.
Figure 2. Choose a template that represents the log file to transform

Figure 3 shows the Generic Log Adapter perspective. As you can see, the UI displays a context instance in which the sensor's properties point to the template log file you just chose. The context instance also includes an extractor, a parser, a formatter, and an outputter, which you must define further.

Figure 3. The Generic Log Adapter perspective

Configure the context

Each context instance describes how to process one log file. You can set several options in a context instance. To see the options, click Context Instance below Configuration. You should see a panel that resembles Figure 4.

Figure 4. Context instance options

You can edit the Description to capture the intent of this particular context. In addition:

  • If your log file is continuously updated, as is the case with daemon.log, select the Continuous operation check box.
  • Maximum idle time is the number of milliseconds a context should wait for the log file to change before the context instance is shut down.
  • Pause interval controls how long the context should wait after it reaches the end of a log file.
  • Because log files aren't only ASCII text, you can set the ISO language code (using two lowercase letters), the ISO country code (using two uppercase letters), and the file's Encoding (using a value from the Internet Assigned Numbers Authority (IANA) character set registry). By default, these parameters are set to en, US, and the default encoding of the JVM.
  • Finally, because some log files do not denote time zone, year, month, and day and because CBEs require all four values, you can provide substitute values in the Timezone GMT offset and Log file creation date fields.

Because daemon.log grows continuously, select the Continuous operation check box. Because mail is typically polled often, set Maximum idle time and Pause interval to 120. The test machine is located in Colorado, so the GMT offset is -7. Daemon.log doesn't specify the year, so a default of 2006 is provided as a substitute. After making these changes, save the file.


Specifying the sensor

A sensor reads a log file and forwards the data collected to the extractor. The next step is to specify how your sensor should work.

Specify how the sensor works

Click the sensor. Its properties are shown in Figure 5, which also shows the values set for the daemon.log sensor.

Figure 5. Setting the sensor for daemon.log

Because daemon.log is a single file, you don't need to change the Sensor type option. The Description field provides for clarity of purpose. Maximum blocking defines the number of lines to read before passing the input along to the extractor. Because entries in daemon.log tend to span many lines, 10 is a reasonable setting. The value for Confidence buffer size dictates the size of a buffer to contain the last n bytes of the log file. If the log file changes -- that is, the last n bytes differ from what's retained in the Confidence buffer -- the sensor reads more input. The default is 1,024 bytes, which is sufficient for this example.
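The confidence-buffer behavior described above can be pictured with a small sketch. This is a toy illustration of the idea, not the GLA implementation:

```python
# Toy illustration of the confidence-buffer idea -- not GLA code.
# Remember the last n bytes seen; if they still match, only the newly
# appended data needs to be read.
CONFIDENCE = 1024

def read_new_data(log_bytes, offset, confidence_buffer):
    start = max(0, offset - CONFIDENCE)
    if log_bytes[start:offset] == confidence_buffer:
        return log_bytes[offset:]   # the file only grew: read just the new bytes
    return log_bytes                # the tail changed: reread the whole file

log = b"Mar  2 07:24:54 db popa3d[8861]: Session from 66.27.187.89\n"
buffer = log[-CONFIDENCE:]
grown = log + b"Mar  2 07:24:55 db popa3d[8861]: Authentication passed for joan\n"
print(read_new_data(grown, len(log), buffer))
```

If the remembered tail no longer matches (say, the log was rotated), the sketch falls back to rereading everything, which mirrors why a 1,024-byte buffer is usually enough to detect a change.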

Some logs append a footer to the end of the log file (each time new data is written). Usually, this data is best ignored, so to skip the footer, specify the number of bytes to skip in File footer size. Daemon.log doesn't have a footer, so the value is set to 0.

If you expand the Sensor type (by clicking on the arrow), you'll see two additional properties: directory and fileName. These properties are initially set to the location and name of your template log file, but you'll soon switch them to process live data.

Don't forget to save the configuration file after setting the sensor properties. And, in general, always save the configuration file before you attempt to run the adapter.


Editing the extractor

The role of the sensor is to collect input. The role of the extractor is to divide the incoming input stream into individual records. (The next component in the chain -- the parser -- divides each record into fields.)

Configure the extractor properties

To edit the extractor, click Extractor. Its properties are shown in Figure 6. The properties of the extractor specify the delimiters of each record and control whether those delimiters should be included in the record passed on to the parser.

Figure 6. The extractor properties

In the example log file, daemon.log, each line of the log is a separate event. This makes the extractor particularly easy to configure. (Figure 6 is the appropriate configuration for daemon.log.)

  • The Contains line breaks check box is cleared, because each line in daemon.log is a record. However, if an entry were to span many lines, as is the case with MySQL or IBM DB2® database logs, you'd select this check box.
  • The Replace line breaks check box is also cleared in this example. If the log file contained line breaks, though, you could select this check box to either delete each line break or replace each one with a special marker -- useful for parsing. To delete line breaks, simply select the check box; to replace each line break with a token, select the check box and provide the delimiter in the Line break symbol field. It's best to choose a symbol that doesn't appear in the log file.
  • The Start pattern and End pattern are regular expressions that describe the start and end of each record. Here, where each line is a record, the beginning of the line, or ^ (caret), marks the start of the record. The end of the line, or $ (dollar sign), marks the end of each record. Because ^ and $ do not capture any content, neither need be included in the record itself.

Save your work before continuing.
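If you want to sanity-check the start and end patterns outside the editor, a quick sketch (Python here, purely for illustration; the GLA applies these regular expressions itself) shows that ^ and $ in multiline mode carve out one record per line:

```python
import re

# With start pattern ^ and end pattern $, every line of daemon.log
# becomes its own record.
log = ("Mar  2 07:24:54 db popa3d[8861]: Session from 66.27.187.89\n"
       "Mar  2 07:24:55 db popa3d[8861]: Authentication passed for joan")

records = [m.group(0) for m in re.finditer(r"^.*$", log, re.MULTILINE)]
print(len(records))   # 2
print(records[1])
```

Note that neither ^ nor $ consumes any characters, which is why the delimiters never appear inside the extracted records.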

A MySQL example

For comparison, create another example extractor for MySQL's slow query log, a special log used to capture suboptimal queries. Each entry in the slow query log spans at least three lines (see Listing 13).

Listing 13. A snippet of MySQL's slow query log
# Time: 030207 15:03:33
# Query_time: 13  Lock_time: 0  Rows_sent: 0  Rows_examined: 0
SELECT l FROM un WHERE ip='209.xx.xxx.xx';
# Time: 030207 15:03:42
# Query_time: 17  Lock_time: 1  Rows_sent: 0  Rows_examined: 0
SELECT l FROM un WHERE ip='214.xx.xxx.xx';
# Time: 030207 15:03:43
# Query_time: 57  Lock_time: 0  Rows_sent: 2117  Rows_examined: 4234
SELECT c,cn,ct FROM cr,l,un WHERE ci=lt AND lf='MP' AND ui=cu;

An extractor for the slow query log might look something like Figure 7.

Figure 7. A sample extractor for the MySQL slow query log

Figure 8 shows the second of the three records, each successfully processed by the extractor.

Figure 8. An extracted record from the slow query log
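To confirm that the record boundaries work out, you can sketch the split. The start pattern ^# Time: used here is an assumption based on the log format in Listing 13, not a value read from Figure 7:

```python
import re

# Assumed start pattern for the slow query log: each record begins with a
# "# Time:" line. Splitting at those boundaries yields three records.
slow_log = """# Time: 030207 15:03:33
# Query_time: 13  Lock_time: 0  Rows_sent: 0  Rows_examined: 0
SELECT l FROM un WHERE ip='209.xx.xxx.xx';
# Time: 030207 15:03:42
# Query_time: 17  Lock_time: 1  Rows_sent: 0  Rows_examined: 0
SELECT l FROM un WHERE ip='214.xx.xxx.xx';
# Time: 030207 15:03:43
# Query_time: 57  Lock_time: 0  Rows_sent: 2117  Rows_examined: 4234
SELECT c,cn,ct FROM cr,l,un WHERE ci=lt AND lf='MP' AND ui=cu;
"""

# Split wherever a new "# Time:" line starts; drop the leading empty chunk.
records = [r for r in re.split(r"(?m)^(?=# Time:)", slow_log) if r]
print(len(records))   # 3
```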

Testing your work so far

Returning to the daemon.log adapter, you can now test the sensor and extractor components to verify that data is being acquired and divided into records.

Rerun the adapter

Glance at the two panes at the bottom of the Generic Log Adapter perspective. You should see something resembling Figure 9. At left is the Extractor Result pane; at right, layered, are the Formatter Result pane, the Sensor Result pane, and the Problems pane. A series of buttons that control the adapter appears within the Extractor Result pane. Figure 10 labels the buttons (or you can hover the mouse pointer over each button to see a tool tip).

Figure 9. Context components display panes
Figure 10. The adapter control buttons

Click Rerun adapter to restart processing from the beginning of the log file template. Then click Next event to process the first event.

  • The Sensor Result pane should show the first 10-20 lines of the log file.
  • The Extractor Result pane should show the first line of the log file, Mar 2 06:27:35 db popa3d[7964]: Session from 71.65.224.25.
  • The Problems pane should be empty. However, pay close attention to this pane whenever you run your adapter. If you've omitted required CBE properties, specified an illegal regular expression, or used an unsupported value, this pane should point those out.
  • The Formatter Result pane is irrelevant because a parser has yet to be defined. However, it does show an initial XML CBE for the current record:
    Listing 14. Initial XML CBE for current record
    <CommonBaseEvent 
        creationTime="replace with our message text" 
        globalInstanceId="A1DAABE6C7876D20E8E9E8C475042F1B" 
        version="1.0.1">
    </CommonBaseEvent>

As you define your parser, additional elements and attributes are automatically added to the XML.

To have the extractor produce the next record, click Next event again. To fast-forward to the last record (in the input the sensor has collected so far), click Show last event.


Producing the parser

The sensor reads data. The extractor subdivides the data into records. The role of the parser is to extract specific fields from each record and use those values to construct a complete CBE XML record.

The role of the parser

The parser may extract some fields from the log file directly, such as a time stamp, host name, daemon name, and a text message. The parser may also infer data from a record. For example, the parser may detect that the record originated with a software service and set the CBE componentIdType attribute to ServiceName. In other instances, the parser may add data to a record. In particular, if a log entry doesn't record the day, month, year, time, and time zone of the event, the parser must add that data to create a valid CBE.

To put the parser for the daemon.log example in perspective, Listing 15 shows a valid CBE XML record for the log entry Mar 2 06:27:35 db popa3d[7964]: Session from 71.65.224.25. Some of the attributes are plainly derived from the original log entry; others will be manufactured from implied data. (Many of the values of the attributes come from the Common Base Events Specification. It's helpful to use that document while creating your parsers.)

Listing 15. The CBE equivalent of the first record of daemon.log
<CommonBaseEvent 
    creationTime="2006-03-02T13:27:35.000Z" 
    globalInstanceId="A1DAABECA2ACB4F0E8E9E8C475042F1B" 
    msg="Session from 71.65.224.25" 
    version="1.0.1">
  <sourceComponentId 
      component="popa3d" 
      componentIdType="ServiceName" 
      location="db.linux-mag.com" 
      locationType="Hostname" 
      subComponent="7964" 
      componentType="daemon"
   />
  <situation 
      categoryName="StartSituation">
      <situationType 
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
        xsi:type="StartSituation" 
        reasoningScope="EXTERNAL" 
        successDisposition="SUCCESSFUL" 
        situationQualifier="START INITIATED"/>
  </situation>
</CommonBaseEvent>

Also keep in mind that (at a minimum) every CBE must define the creationTime attribute, the msg attribute, and the sourceComponentId element, which in turn must have the six attributes shown in Listing 15. The situation element (among others) is optional, but is part of the example to elaborate upon the event.
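The creationTime value in Listing 15 shows the date math at work: the log stamp Mar 2 06:27:35, combined with the substitute year (2006) and the GMT -7 offset, becomes 13:27:35 UTC. A sketch of that conversion (not GLA code; the helper name is invented for illustration):

```python
from datetime import datetime, timedelta

# Sketch only: combine a syslog-style timestamp with the substitute year
# and GMT offset from the context instance to build a CBE creationTime.
def to_creation_time(stamp, year=2006, gmt_offset=-7):
    local = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    utc = local - timedelta(hours=gmt_offset)  # local GMT-7 -> UTC adds 7 hours
    return utc.strftime("%Y-%m-%dT%H:%M:%S.000Z")

print(to_creation_time("Mar  2 06:27:35"))   # 2006-03-02T13:27:35.000Z
```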

How the parser works

Click Parser in the Generic Log Adapter perspective to begin the process of defining it. Figure 11 shows what the completed parser looks like. There is one parser task for every attribute and element in the CBE shown in Listing 15.

Figure 11. The complete parser for daemon.log
The complete parser for daemon.log

The parser works in two phases. First, it divides the incoming record (from the extractor) into positions, or numbered parts, in which each part is separated from the other by the separator token. If no separator token is specified, this step is skipped. Then the parser divides the record into designations, or (name, value) pairs, in which each (name, value) pair is two strings joined by the designation token. If no designation token is specified, the latter step is skipped.

Consider this example: If the separator token is the regular expression [ ]+, the designation token is = (equal sign), and the parser is handed the record:

03/05/06 12:51:06EST Mail name=joe action=login authentication=password

the parser would define six positions and three designations, as shown in Table 1.

Table 1. Positions and designations from the parser
Position/Designation       Value
1                          03/05/06
2                          12:51:06EST
3                          Mail
4                          name=joe
5                          action=login
6                          authentication=password
h{'name'}                  joe
h{'action'}                login
h{'authentication'}        password

Note: If your incoming record begins with the separator token, position 1 is created, but left empty.

You can use all the defined positions and designations to simplify each parser task. For instance, to create the creationTime attribute, you need only parse position 2. Of course, the entire original record is always available. However, positions and designations make each parsing task faster and easier to manage because the source string is smaller. In many cases, you can use a position or designation directly for a CBE value.
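The two-phase split can be sketched in a few lines of Java. This is only an illustration of the behavior described above, not the adapter's actual implementation; the record, separator token, and designation token come from the example in Table 1. Note that a Java array is 0-indexed, whereas the adapter numbers positions from 1.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SplitDemo {
    public static void main(String[] args) {
        String record = "03/05/06 12:51:06EST Mail name=joe action=login authentication=password";

        // Phase 1: split the record on the separator token, the regular expression [ ]+.
        String[] positions = record.split("[ ]+");

        // Phase 2: split each part on the designation token (=) into (name, value) pairs.
        Map<String, String> designations = new LinkedHashMap<>();
        for (String part : positions) {
            int eq = part.indexOf('=');
            if (eq > 0) {
                designations.put(part.substring(0, eq), part.substring(eq + 1));
            }
        }

        System.out.println(positions.length);                   // number of positions
        System.out.println(positions[1]);                       // second position (index 1)
        System.out.println(designations.get("authentication")); // a designation value
    }
}
```

Running this prints 6, 12:51:06EST, and password, matching Table 1.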

Parse the sample log entries

Click Parser again. For convenience, break each daemon.log entry into two positions using the separator token :[ ]+ (a colon followed by one or more spaces). The daemon.log log entries don't have (name, value) pairs, so the designator token is omitted. These settings are shown in Figure 12. Now, save your work.

Figure 12. Dividing a record into positions
Dividing a record into positions
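To see what that separator token produces, here is a minimal Java sketch; String.split stands in for the adapter's internal tokenizer, and the record is the first daemon.log entry from the text:

```java
public class SeparatorDemo {
    public static void main(String[] args) {
        String record = "Mar  2 06:27:35 db popa3d[7964]: Session from 71.65.224.25";

        // The separator token :[ ]+ is a colon followed by one or more spaces.
        // The colons inside the time stamp are not followed by a space, so the
        // record splits into exactly two positions.
        String[] positions = record.split(":[ ]+");

        System.out.println(positions[0]);
        System.out.println(positions[1]);
    }
}
```

Position 1 holds the time stamp, host, and daemon name; position 2 holds the message text.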

Set the creationTime

Set the first required field in the CBE: creationTime. The goal is to transform the time stamp provided with the daemon.log record into a time format compatible with the XML schema dateTime data type. As a convenience, the adapter can automatically convert any time format understood by class java.text.SimpleDateFormat into the XML schema data type.

To set the creationTime field, complete these steps:

  1. Expand the parser and select creationTime. This is a required CBE attribute, so select the Required by parent check box.
  2. Click the substitution rule associated with creationTime.
  3. For Positions, type 1 because position 1 contains the time stamp to extract.
  4. For Match, provide the regular expression ^(\w{3})\s+(\d{1,2})\s+([\d:]+)\s+.*$. This expression captures the month name as $1, the day of the month as $2, and the time of day as $3.
  5. For Substitute, supply $1 $2 @YEAR $3 @TIMEZONE.
    Substitute is used instead of the entire incoming record in the rest of this specific parsing task. $1, $2, and $3 came from the previous step. However, because the time stamp doesn't include a year or a time zone, the year and time zone associated with the current context instance, represented by the shorthand @YEAR and @TIMEZONE, respectively, are used instead. Therefore, for the first daemon.log record, the settings in Substitute yield the string Mar 02 2006 06:27:35 -0700.
  6. Ignoring the Substitute extension class field, which allows you to provide a Java class to do additional substitutions, transform the result of the substitution to the right type. You can use a java.text.SimpleDateFormat format string to do the heavy lifting. Set Time format to MMM dd yyyy hh:mm:ss Z, indicating a three-letter name of the month; a two-digit day of the month; a four-digit year; hours, minutes, and seconds separated by colons; and an RFC 822 time zone.

Figure 13 shows the final settings for creationTime. If you save the configuration file and rerun the adapter, the Formatter Result pane should show a new XML record with attribute creationTime="2006-03-02T13:27:35.000Z".

Figure 13. Parsing the incoming time stamp into the creationTime attribute
Parsing the incoming time stamp into the creationTime attribute
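You can reproduce the match, substitute, and time-format steps outside the adapter with java.text.SimpleDateFormat. In this sketch, the context instance's @YEAR and @TIMEZONE values are hard-coded as 2006 and -0700, matching the example configuration:

```java
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CreationTimeDemo {
    public static void main(String[] args) throws Exception {
        // Position 1 of the first daemon.log record.
        String position1 = "Mar  2 06:27:35 db popa3d[7964]";

        // The Match expression from step 4.
        Matcher m = Pattern
            .compile("^(\\w{3})\\s+(\\d{1,2})\\s+([\\d:]+)\\s+.*$")
            .matcher(position1);
        if (!m.matches()) throw new IllegalStateException("no match");

        // The Substitute string from step 5, with @YEAR and @TIMEZONE
        // hard-coded for this example.
        String substituted = m.group(1) + " " + m.group(2) + " 2006 "
            + m.group(3) + " -0700";

        // The Time format from step 6 parses the substituted string...
        SimpleDateFormat in = new SimpleDateFormat("MMM dd yyyy hh:mm:ss Z", Locale.US);
        // ...and the result is rendered as an XML schema dateTime in UTC.
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
        out.setTimeZone(TimeZone.getTimeZone("UTC"));

        System.out.println(out.format(in.parse(substituted)));
    }
}
```

The local time 06:27:35 in the -0700 zone becomes 13:27:35 UTC, which is exactly the creationTime value shown in Listing 15.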

Getting the message

The msg attribute is another required CBE attribute. Add this attribute and create the parser task to extract a suitable value:

  1. Right-click CommonBaseEvent, then click Add > msg.
  2. Click msg, then select the Required by parent check box.
  3. Expand msg, then click Substitution Rule.
  4. Specify 2 in the Positions field because the message portion of the log entry is located in position 2. (It's everything after the separator token.)
  5. For Match, specify a regular expression that selects the entire string. The regular expression ^(.*)$ captures everything in $1.
  6. For Substitute, specify $1.

Figure 14 shows the final settings.

Figure 14. Settings to extract the message
Settings to extract the message

Save the configuration file and click Rerun adapter, found in the Extractor Result pane. Click Next event and switch to the Formatter Result pane. You should see a new msg attribute that looks like msg="Session from 71.65.224.25".

Find the source

The last mandatory part of a CBE record is the sourceComponentId, used to record the component (service, system, and so on) that's affected by the event. In the instance of daemon.log, the components affected are software services running on a specific host. The parser's job is to capture and record the specifics.

Right-click CommonBaseEvent once again, and then click Add > sourceComponentId. (Figure 15 shows all the possible attributes and elements you can add to a CBE.) For brevity, Table 2 shows all the settings required for sourceComponentId. One new setting is Default value. If a match is made by a parsing rule, but no substitute value is provided, the Default value is used.

Figure 15. List of elements and attributes you can add to a CBE record
List of elements and attributes you can add to a CBE record
Table 2. Settings for the sourceComponentId
Item: component
  Required by parent: Yes
  Positions: 1
  Match: ^.* db (\w+)\[.*$
  Substitute: $1
  Notes: Captures the name of the software service, such as popa3d or mysqld.

Item: componentIdType
  Default value: ServiceName
  Required by parent: Yes
  Match: ^(.*)
  Notes: Indicates that the component records the name of a service; ServiceName is one of the prescribed values for this attribute, according to the CBE specification.

Item: componentType
  Default value: daemon
  Required by parent: Yes
  Match: ^(.*)
  Notes: Describes the class of the component.

Item: location
  Default value: db.linux-mag.com
  Required by parent: Yes
  Match: ^(.*)
  Notes: Specifies the physical address that corresponds to the location of a component. The format of the value of the location is specified by the locationType property. It is recommended that you use a fully qualified host name for this attribute. Here, because the log entry does not include a host name, one is added via the default value. In other cases, you may be able to parse the host name directly from the log.

Item: locationType
  Default value: Hostname
  Required by parent: Yes
  Positions: 1
  Match: ^(.*)
  Notes: Specifies the format and meaning of the value in the location property. The Hostname keyword is one of many possible keywords that you can use here.

Item: subComponent
  Required by parent: Yes
  Match: ^.*\[(\d+)\].*$
  Substitute: $1
  Notes: Identifies the specific daemon process that the event affects.
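The two regex-based rules in Table 2 (for component and subComponent) can be tried in isolation with java.util.regex; this sketch simply applies each Match expression to position 1 of the first daemon.log record and prints the captured group:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SourceComponentDemo {
    public static void main(String[] args) {
        String position1 = "Mar  2 06:27:35 db popa3d[7964]";

        // component: capture the service name between " db " and "[".
        Matcher component = Pattern.compile("^.* db (\\w+)\\[.*$").matcher(position1);
        // subComponent: capture the process ID between the square brackets.
        Matcher subComponent = Pattern.compile("^.*\\[(\\d+)\\].*$").matcher(position1);

        if (component.matches()) System.out.println(component.group(1));
        if (subComponent.matches()) System.out.println(subComponent.group(1));
    }
}
```

The remaining four rules in Table 2 match the entire input with ^(.*) but supply no substitute, so the adapter falls back on each rule's Default value.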

If you make all the changes listed in Table 2, then save and rerun the adapter, the output should be CBE event records that resemble Listing 15. As an additional exercise, add a situation to the CBE. Situations categorize the type of situation that initiated the event. For instance, you might create a parser to create a StartSituation whenever the daemon is initially contacted for service or create another parser to create a RequestSituation when a request is made.

Situations aren't required (hence, Required by parent can be disabled), but you may find them useful to add granularity to your CBE records. If you create a situation and add a series of possible situation parsers, select the Child choice check box if processing can stop after the first match is made.

Here's a helpful tip for debugging your parsers: If a property is required, but not found in the incoming record (passed to the parser from the extractor), the Formatter Result pane for that record will be empty. In other words, required properties behave like logical AND: If one match fails, processing for that record stops. It's often useful to clear the Required by parent check box to debug rules. Build your rules slowly and incrementally, and watch the Problem pane for clues.


The formatter and organizing the outputter

Now that the parser has yielded properties and values, the new data must be assembled into a CBE instance. That's the role of the formatter.

Emit CBE XML records to a file

The adapter formatter requires no configuration. It's an internal operation that creates CBE objects that conform to the CBE V1.0.1 specification.

After the formatter has created CBE objects, it's the job of the outputter to emit them to a file, standard output, another log, a logging agent, or a log analyzer. If your adapter configuration defines multiple contexts, you can use a special formatter to allow multiple contexts to write to a single file.

To keep things simple, emit the CBE XML records to a single file:

  1. Click Outputter in the Generic Log Adapter perspective, then choose SingleFileOutputter for Outputter type.
  2. Right-click Outputter, then click Add > property.
  3. Click the new property, then set Property name to directory. Set the Property value to a directory to which you're able to write files. Omit the name of a file. Just specify the path of the directory, omitting the trailing slash.
  4. Right-click Outputter again and click Add > property. Set this new Property name to fileName, and set the Property value to a file name. This file will be created in the directory named by directory.

Change the context instance

In addition to changing the configuration, you must also change the context instance to use the proper outputter class. To do so, complete these steps:

  1. Expand Contexts in the Generic Log Adapter perspective and expand Context Basic Context Implementation.
  2. Click Component Logging Agent Outputter.
  3. Change the Name and Description to Single File Outputter.
  4. Change the Executable class to org.eclipse.hyades.logging.adapter.outputters.CBEFileOutputter.
  5. Save the configuration file.

Add the SingleFileOutputterType

There is one more important step: For some reason, the Adapter Configuration Editor can omit an important element from the outputter definition in the configuration file for the adapter. (You can read the relevant thread, No Output from Outputter, on the developerWorks Autonomic computing forum.) However, you can quickly add the element to the file manually.

Using your favorite editor, open the file my.adapter. Scroll to the bottom of the file and look for the following text.

Listing 16. The outputter section of my.adapter
<cc:Outputter 
  description="Single File Outputter" 
  uniqueID="N13725210AFF11DA8000AE8373D52828" 
  type="SingleFileOutputter">
    <pu:Property propertyName="directory" 
      propertyValue="/home/mstreicher"/>
    <pu:Property propertyName="fileName" 
      propertyValue="emitter.log"/>
    <op:SingleFileOutputterType directory="/home/mstreicher" 
      fileName="emitter.log"/>
</cc:Outputter>

If the line <op:SingleFileOutputterType... /> is missing, add it, changing the values of the attributes directory and fileName to match the values of the similarly named properties. Then save the file.


Running the GLA

Your rules-based adapter is now complete. Step through your template log file using the controls in the Extractor Result pane and validate its operation. When you're satisfied that everything is working properly, you can move on to running your adapter using the stand-alone GLA.

Run the adapter using GLA

The GLA uses the settings you created in your adapter to read a log file and produce a CBE XML document. Listing 17 shows a small portion of the file my.adapter.

Listing 17. A snippet of the file my.adapter
<adapter:Adapter...
  <cc:ContextInstance 
    charset="" 
    continuousOperation="true" 
    description="A context for daemon.log" 
    isoCountryCode="" isoLanguageCode="" 
    maximumIdleTime="120" 
    pauseInterval="120" 
    timezone="-0700" 
    uniqueID="N05306B00AFF11DA8000AE8373D52828"
    year="2006">
  <cc:Sensor 
    description="Read the daemon.log" 
    uniqueID="N057E8B10AFF11DA8000AE8373D52828" 
    confidenceBufferSize="1024" 
    fileFooterSize="0" 
    maximumBlocking="10" 
    type="SingleFileSensor">
      <pu:Property 
        propertyName="directory" 
        propertyValue="/home/mstreicher/java-tptp-gla"
      />
      <pu:Property 
        propertyName="fileName" 
        propertyValue="daemon.log"
      />
      <sensor:SingleFileSensor 
        directory="/home/mstreicher/java-tptp-gla" 
        fileName="daemon.log"
      />
  </cc:Sensor>
  <ex:Extractor 
    containsLineBreaks="false" 
    description="Divide daemon.log into individual records" 
    endPattern="$" 
    includeEndPattern="false" 
    includeStartPattern="false" 
    lineBreakSymbol="" 
    replaceLineBreaks="false" 
    startPattern="^" 
    uniqueID="N05AA7D00AFF11DA8000AE8373D52828"
  />
...
</adapter:Adapter>

To run the GLA, you must first edit its script to point to where you installed it. Using your favorite editor, open the file gla.sh in GenericLogAdapter/bin. (If you followed the installation instructions verbatim, the file resides in ~/java/GenericLogAdapter/bin/gla.sh.) Find the line GLA_HOME=/home/eclipse/GenericLogAdapter and change the path to point to the directory that contains your copy of the GLA. Again, if you followed the instructions verbatim, you would change the line to read GLA_HOME=~/java/GenericLogAdapter. Save the file.

Next, find the file my.adapter in your Eclipse workspace under the directory My Adapter. On the test system, my.adapter was found in ~/workspace/My Adapter/my.adapter. To run the adapter, execute gla.sh, providing the path to your adapter file as the only argument:

% ~/java/GenericLogAdapter/bin/gla.sh ~/workspace/My\ Adapter/my.adapter

After a moment, the file emitter.log should appear in your home directory (or wherever you configured your file outputter to create the file).


Summary

This tutorial demonstrated how to create a rules-based adapter to convert a typical Linux log file into a CBE log file. Given a CBE log, you can use a tool such as the Autonomic Computing Toolkit's Log and Trace Analyzer to further process the CBE data.

Furthermore, if the rules constructs provided by Adapter Configuration Editor aren't suitable for your log file, you can integrate your own Java class to parse and emit CBE format. Unlike the rules-based adapter, a static parser (so-named because it uses a Java class instead of rules) only needs a sensor and outputter, both of which your Java code provides. You still run the GLA on the final configuration file, but you must include your Java class in the GLA CLASSPATH, too.

In any case, the Adapter Configuration Editor and the GLA provide a powerful environment in which to analyze the behavior of existing, even legacy applications using modern autonomic computing tools. Simple conversion from any number of log file formats to CBE requires just a few minutes of work; complex, detailed conversions are easily accomplished with rich rules or your own code.

Resources

Get products and technologies

  • Download the entire Eclipse TPTP runtime. Discover all the features of the Eclipse TPTP, as well as extensive documentation, tutorials, presentations, and screencasts that illuminate the capabilities of the Eclipse TPTP.
  • Download the Eclipse GLA.
  • Read more about the Autonomic Computing Toolkit and download its Log and Trace Analyzer.
  • Download Java technology from Sun Microsystems or from IBM.
  • Download the freely available, extensible open source Eclipse SDK.

Discuss

  • Connect with Eclipse developers and other users in the Eclipse mailing lists and newsgroups. (You must register to read the newsgroups, but membership is free, and the registration process is easy.)
  • Get involved in the developerWorks community by participating in developerWorks blogs.
