Eclipse Test & Performance Tools Platform, Part 2: Monitor applications

Collect and analyze a variety of log files

Martin Streicher (martin.streicher@linux-mag.com), Editor in Chief, Linux Magazine
Martin Streicher is the Editor-in-Chief of Linux Magazine. Martin earned a Master of Science in Computer Science from Purdue University and has been programming UNIX-like systems since 1986 in the Pascal, C, Perl, Java, and (most recently) Ruby programming languages.

Summary:  In this "Eclipse Test & Performance Tools Platform" tutorial series, learn how to use the capabilities of the Eclipse Test & Performance Tools Platform (TPTP) to convert application log files into a structured format. Then, using TPTP and other specialized tools designed to process and analyze log files, you can quickly discern usage patterns, performance profiles, and errors.

Date:  25 Apr 2006
Level:  Intermediate

Producing the parser

The sensor reads data. The extractor subdivides the data into records. The role of the parser is to extract specific fields from each record and use those values to construct a complete CBE XML record.

The role of the parser

The parser may extract some fields from the log file directly, such as a time stamp, host name, daemon name, and a text message. The parser may also infer data from a record. For example, the parser may detect that the record originated with a software service and set the CBE componentIdType attribute to ServiceName. In other instances, the parser may add data to a record. In particular, if a log entry doesn't record the day, month, year, time, and time zone of the event, the parser must add that data to create a valid CBE.

To put the parser for the daemon.log example in perspective, Listing 15 shows a valid CBE XML record for the log entry Mar 2 06:27:35 db popa3d[7964]: Session from 71.65.224.25. Some of the attributes are plainly derived from the original log entry; others will be manufactured from implied data. (Many of the values of the attributes come from the Common Base Events Specification. It's helpful to use that document while creating your parsers.)


Listing 15. The CBE equivalent of the first record of daemon.log
                    
<CommonBaseEvent 
    creationTime="2006-03-02T13:27:35.000Z" 
    globalInstanceId="A1DAABECA2ACB4F0E8E9E8C475042F1B" 
    msg="Session from 71.65.224.25" 
    version="1.0.1">
  <sourceComponentId 
      component="popa3d" 
      componentIdType="ServiceName" 
      location="db.linux-mag.com" 
      locationType="Hostname" 
      subComponent="7964" 
      componentType="daemon"
   />
  <situation 
      categoryName="StartSituation">
      <situationType 
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
        xsi:type="StartSituation" 
        reasoningScope="EXTERNAL" 
        successDisposition="SUCCESSFUL" 
        situationQualifier="START INITIATED"/>
  </situation>
</CommonBaseEvent>

Keep in mind that, at a minimum, every CBE must define the creationTime attribute, the msg attribute, and the sourceComponentId element, which in turn must have the six attributes shown in Listing 15. The situation element (among others) is optional; the example includes it to describe the event more fully.
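As a rough illustration of these minimum requirements, the following Python sketch checks a CBE record for the two required attributes and the six sourceComponentId attributes shown in Listing 15. The function name missing_cbe_parts is hypothetical and is not part of TPTP; the check covers only the parts named in this section, not the full CBE schema.

```python
import xml.etree.ElementTree as ET

# Attributes that sourceComponentId carries in Listing 15.
SOURCE_ATTRS = ("component", "componentIdType", "componentType",
                "location", "locationType", "subComponent")

def missing_cbe_parts(xml_text):
    """Return the names of required CBE parts that are absent from xml_text."""
    event = ET.fromstring(xml_text)
    missing = [name for name in ("creationTime", "msg")
               if name not in event.attrib]
    source = event.find("sourceComponentId")
    if source is None:
        missing.append("sourceComponentId")
    else:
        missing += ["sourceComponentId/" + attr
                    for attr in SOURCE_ATTRS if attr not in source.attrib]
    return missing

# A pared-down version of Listing 15 without the optional situation element.
minimal = """<CommonBaseEvent creationTime="2006-03-02T13:27:35.000Z"
    msg="Session from 71.65.224.25">
  <sourceComponentId component="popa3d" componentIdType="ServiceName"
      componentType="daemon" location="db.linux-mag.com"
      locationType="Hostname" subComponent="7964"/>
</CommonBaseEvent>"""

print(missing_cbe_parts(minimal))                       # []
print(missing_cbe_parts('<CommonBaseEvent msg="x"/>'))  # ['creationTime', 'sourceComponentId']
```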


How the parser works

Click Parser in the Generic Log Adapter perspective to begin the process of defining it. Figure 11 shows what the completed parser looks like. There is one parser task for every attribute and element in the CBE shown in Listing 15.


Figure 11. The complete parser for daemon.log

The parser works in two phases. First, it divides the incoming record (from the extractor) into positions, or numbered parts, in which each part is separated from the other by the separator token. If no separator token is specified, this step is skipped. Then the parser divides the record into designations, or (name, value) pairs, in which each (name, value) pair is two strings joined by the designation token. If no designation token is specified, the latter step is skipped.

Consider this example: If the separator token is the regular expression [ ]+, the designation token is = (equal sign), and the parser is handed the record:

03/05/06 12:51:06EST Mail name=joe action=login authentication=password

the parser would define six positions and three designations, as shown in Table 1.


Table 1. Positions and designations from the parser
Position/Designation    Value
1                       03/05/06
2                       12:51:06EST
3                       Mail
4                       name=joe
5                       action=login
6                       authentication=password
h{'name'}               joe
h{'action'}             login
h{'authentication'}     password

Note: If your incoming record begins with the separator token, position 1 is created, but left empty.
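The two phases can be modeled in a few lines of Python. This is only a sketch of the behavior described above, not the adapter's actual implementation; the function name split_record is hypothetical, and the sketch assumes that a designation is formed from any position that contains the designation token.

```python
import re

def split_record(record, separator=None, designation=None):
    """Model the two phases: numbered positions, then (name, value) pairs."""
    # Phase 1: split on the separator token (a regular expression), if given.
    parts = re.split(separator, record) if separator else [record]
    positions = {number: part for number, part in enumerate(parts, start=1)}
    # Phase 2: split each part that contains the designation token into a pair.
    designations = {}
    if designation:
        for part in parts:
            name, token, value = part.partition(designation)
            if token:
                designations[name] = value
    return positions, designations

record = "03/05/06 12:51:06EST Mail name=joe action=login authentication=password"
positions, designations = split_record(record, r"[ ]+", "=")

print(len(positions))          # 6
print(positions[2])            # 12:51:06EST
print(designations["action"])  # login

# A record that begins with the separator token yields an empty position 1.
print(repr(split_record(" a b", r"[ ]+")[0][1]))  # ''
```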

You can use the defined positions and designations to simplify each parser task. For instance, to create the creationTime attribute, you need only parse positions 1 and 2, which hold the date and the time. The entire original record is always available, but positions and designations make each parsing task faster and easier to manage because the source string is smaller. In many cases, you can use a position or designation directly as a CBE value.


Parse the sample log entries

Click Parser again. For convenience, break each daemon.log entry into two positions using the separator token :[ ]+ (a colon followed by one or more spaces). The daemon.log log entries don't have (name, value) pairs, so the designation token is omitted. These settings are shown in Figure 12. Now, save your work.
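To see why this separator token leaves the time stamp intact, here is a short Python sketch that applies the same regular expression to the sample entry. It approximates the split with re.split and is not the adapter's own code.

```python
import re

entry = "Mar 2 06:27:35 db popa3d[7964]: Session from 71.65.224.25"

# Separator token from the text: a colon followed by one or more spaces.
# The colons inside 06:27:35 are not followed by a space, so they don't split.
positions = re.split(r":[ ]+", entry)

print(positions[0])  # Mar 2 06:27:35 db popa3d[7964]
print(positions[1])  # Session from 71.65.224.25
```

In this model, a message that itself contained a colon followed by a space would produce more than two positions.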


Figure 12. Dividing a record into positions

Set the creationTime

Set the first required field in the CBE: creationTime. The goal is to transform the time stamp provided with the daemon.log record into a time format compatible with the XML schema dateTime data type. As a convenience, the adapter can automatically convert any time stamp that class java.text.SimpleDateFormat can parse into the XML schema data type.

To set the creationTime field, complete these steps:

  1. Expand the parser and select creationTime. This is a required CBE attribute, so select the Required by parent check box.
  2. Click the substitution rule associated with creationTime.
  3. For Positions, type 1 because position 1 contains the time stamp to extract.
  4. For Match, provide the regular expression ^(\w{3})\s+(\d{1,2})\s+([\d:]+)\s+.*$. This expression captures the month name as $1, the day of the month as $2, and the time of day as $3.
  5. For Substitute, supply $1 $2 @YEAR $3 @TIMEZONE.
    The Substitute result replaces the incoming record for the rest of this parsing task. $1, $2, and $3 are the groups captured in the previous step. Because the time stamp doesn't include a year or a time zone, the year and time zone associated with the current context instance, represented by the shorthand @YEAR and @TIMEZONE, respectively, are used instead. Therefore, for the first daemon.log record, the settings in Substitute yield the string Mar 2 2006 06:27:35 -0700.
  6. Skip the Substitute extension class field, which lets you provide a Java class to perform additional substitutions, and convert the result of the substitution to the right type. A java.text.SimpleDateFormat format string does the heavy lifting. Set Time format to MMM dd yyyy HH:mm:ss Z, indicating a three-letter name of the month; a day of the month; a four-digit year; hours on a 24-hour clock, minutes, and seconds separated by colons; and an RFC 822 time zone. (In SimpleDateFormat patterns, uppercase HH is the 24-hour field; lowercase hh is the 12-hour field and is meant for time stamps that carry an AM/PM marker.)

Figure 13 shows the final settings for creationTime. If you save the configuration file and rerun the adapter, the Formatter Result pane should show a new XML record with attribute creationTime="2006-03-02T13:27:35.000Z".
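The match, substitute, and convert steps can be traced by hand with the Python sketch below. Python's strptime stands in for java.text.SimpleDateFormat, which the adapter itself uses, and the year and time zone are hard-coded stand-ins for @YEAR and @TIMEZONE. The sketch parses the hour on a 24-hour clock (%H, the counterpart of HH in SimpleDateFormat patterns) because syslog-style time stamps use a 24-hour clock.

```python
import re
from datetime import datetime, timezone

position_1 = "Mar 2 06:27:35 db popa3d[7964]"

# Match expression from step 4: month name, day of month, time of day.
match = re.match(r"^(\w{3})\s+(\d{1,2})\s+([\d:]+)\s+.*$", position_1)
month, day, time_of_day = match.groups()

# Substitute from step 5. The year and zone are hard-coded here; the adapter
# fills them in from its context through @YEAR and @TIMEZONE.
substituted = f"{month} {day} 2006 {time_of_day} -0700"

# strptime analog of the time format: month name, day, year, 24-hour time,
# and a numeric time zone offset.
local = datetime.strptime(substituted, "%b %d %Y %H:%M:%S %z")
creation_time = local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")

print(substituted)    # Mar 2 2006 06:27:35 -0700
print(creation_time)  # 2006-03-02T13:27:35.000Z
```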


Figure 13. Parsing the incoming time stamp into the creationTime attribute

Getting the message

The msg attribute is another required CBE attribute. Add this attribute and create the parser task to extract a suitable value:

  1. Right-click CommonBaseEvent, then click Add > msg.
  2. Click msg, then select the Required by parent check box.
  3. Expand msg, then click Substitution Rule.
  4. Specify 2 in the Positions field because the message portion of the log entry is located in position 2. (It's everything after the separator token.)
  5. For Match, specify a regular expression that selects the entire string. The regular expression ^(.*)$ captures everything in $1.
  6. For Substitute, specify $1.

Figure 14 shows the final settings.


Figure 14. Settings to extract the message

Save the configuration file and click Rerun adapter, found in the Extractor Result pane. Click Next event and switch to the Formatter Result pane. You should see a new msg attribute that looks like msg="Session from 71.65.224.25".

Find the source

The last mandatory part of a CBE record is the sourceComponentId, used to record the component (service, system, and so on) that's affected by the event. In the instance of daemon.log, the components affected are software services running on a specific host. The parser's job is to capture and record the specifics.

Right-click CommonBaseEvent once again, and then click Add > sourceComponentId. (Figure 15 shows all the possible attributes and elements you can add to a CBE.) For brevity, Table 2 shows all the settings required for sourceComponentId. One new setting is Default value. If a match is made by a parsing rule, but no substitute value is provided, the Default value is used.


Figure 15. List of elements and attributes you can add to a CBE record


Table 2. Settings for the sourceComponentId
Item             Default value     Required by parent  Positions  Match              Substitute  Notes
component        (none)            Yes                 1          ^.* db (\w+)\[.*$  $1          Captures the name of the software service, such as popa3d or mysqld.
componentIdType  ServiceName       Yes                 (none)     ^(.*)              (none)      Indicates that the component records the name of a service; ServiceName is one of the prescribed values for this attribute, according to the CBE specification.
componentType    daemon            Yes                 (none)     ^(.*)              (none)      Describes the class of the component.
location         db.linux-mag.com  Yes                 (none)     ^(.*)              (none)      Specifies the physical address that corresponds to the location of a component. The format of the value of the location is specified by the locationType property. It is recommended that you use a fully qualified host name for this attribute. Here, because the log entry does not include a fully qualified host name, one is added via the default value. In other cases, you may be able to parse the host name directly from the log.
locationType     Hostname          Yes                 1          ^(.*)              (none)      Specifies the format and meaning of the value in the location property. The Hostname keyword is one of many possible keywords that you can use here.
subComponent     (none)            Yes                 (none)     ^.*\[(\d+)\].*     $1          Identifies the specific daemon process that the event affects.
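The rules in Table 2 can be modeled as data and applied to the sample entry, as in the Python sketch below. This is an approximation, not the adapter's implementation: it assumes that a rule with no Positions value is matched against the whole record and that a rule with no Substitute value falls back to its Default value. The name source_component_id is hypothetical.

```python
import re

# (attribute, default value, position, match, substitute), modeled on Table 2.
# Python writes the first captured group as \1 where the adapter uses $1.
RULES = [
    ("component",       None,               1,    r"^.* db (\w+)\[.*$", r"\1"),
    ("componentIdType", "ServiceName",      None, r"^(.*)",             None),
    ("componentType",   "daemon",           None, r"^(.*)",             None),
    ("location",        "db.linux-mag.com", None, r"^(.*)",             None),
    ("locationType",    "Hostname",         1,    r"^(.*)",             None),
    ("subComponent",    None,               None, r"^.*\[(\d+)\].*",    r"\1"),
]

def source_component_id(record, positions):
    """Apply each rule and collect the sourceComponentId attribute values."""
    attrs = {}
    for name, default, position, match, substitute in RULES:
        # Assumption: with no position given, the rule sees the whole record.
        text = positions[position] if position else record
        found = re.match(match, text)
        if found is None:
            continue
        # Assumption: with no substitute given, the default value is used.
        attrs[name] = found.expand(substitute) if substitute else default
    return attrs

record = "Mar 2 06:27:35 db popa3d[7964]: Session from 71.65.224.25"
positions = dict(enumerate(re.split(r":[ ]+", record), start=1))
attrs = source_component_id(record, positions)

print(attrs["component"])     # popa3d
print(attrs["subComponent"])  # 7964
print(attrs["location"])      # db.linux-mag.com
```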

If you make all the changes listed in Table 2, then save the configuration and rerun the adapter, it should yield CBE records that resemble Listing 15. As an additional exercise, add a situation to the CBE. A situation categorizes the circumstance that produced the event. For instance, you might create one parser that generates a StartSituation whenever the daemon is initially contacted for service and another that generates a RequestSituation when a request is made.

Situations aren't required (hence, Required by parent can be disabled), but you may find them useful to add granularity to your CBE records. If you create a situation and add a series of possible situation parsers, select the Child choice check box if processing can stop after the first match is made.
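The first-match behavior that Child choice selects can be pictured with the Python sketch below. The rule list and its patterns are hypothetical: only the pairing of "Session from" with StartSituation is suggested by Listing 15, and the RequestSituation pattern is invented purely for illustration.

```python
import re

# Candidate situation parsers, tried in order. The patterns are hypothetical.
SITUATION_RULES = [
    ("StartSituation",   r"^Session from "),
    ("RequestSituation", r"^Request "),
]

def first_situation(msg):
    """Return the category of the first rule that matches msg, or None."""
    for category, pattern in SITUATION_RULES:
        if re.search(pattern, msg):
            return category  # stop after the first match, as with Child choice
    return None

print(first_situation("Session from 71.65.224.25"))  # StartSituation
print(first_situation("unrelated text"))             # None
```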

Here's a helpful tip for debugging your parsers: If a property is required, but not found in the incoming record (passed to the parser from the extractor), the Formatter Result pane for that record will be empty. In other words, required properties behave like logical AND: If one match fails, processing for that record stops. It's often useful to clear the Required by parent check box to debug rules. Build your rules slowly and incrementally, and watch the Problem pane for clues.
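The logical AND behavior, and the effect of clearing Required by parent while debugging, can be imitated with the Python sketch below. It is a simplified model rather than the adapter's logic; the functions parse and relax are hypothetical, and the log entry is made up so that exactly one rule fails.

```python
import re

def parse(rules, text):
    """Apply (name, required, pattern) rules; None models an empty result."""
    result = {}
    for name, required, pattern in rules:
        found = re.match(pattern, text)
        if found:
            result[name] = found.group(1)
        elif required:
            return None  # one failed required rule discards the whole record
    return result

def relax(rules):
    """Copy the rules with every Required by parent flag cleared."""
    return [(name, False, pattern) for name, _, pattern in rules]

rules = [
    ("component",    True, r"^.* db (\w+)\[.*$"),
    ("subComponent", True, r"^.*\[(\d+)\].*"),
]

# Made-up entry whose bracketed field is not numeric, so subComponent fails.
odd_entry = "Mar 2 06:27:35 db popa3d[main]"

print(parse(rules, odd_entry))         # None
print(parse(relax(rules), odd_entry))  # {'component': 'popa3d'}
```

With the required flags cleared, the partial result shows which rules still match, which narrows the search to the rule that is missing from the output.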
