Producing the parser
The sensor reads data. The extractor subdivides the data into records. The role of the parser is to extract specific fields from each record and use those values to construct a complete CBE XML record.
The parser may extract some fields from the log file directly, such as a time stamp, host name, daemon name, and a text message. The parser may also infer data from a record. For example, the parser may detect that the record originated with a software service and set the CBE componentIdType attribute to ServiceName. In other instances, the parser may add data to a record. In particular, if a log entry doesn't record the day, month, year, time, and time zone of the event, the parser must add that data to create a valid CBE.
To put the parser for the daemon.log example in perspective, Listing 15 shows a valid CBE XML record for the log entry `Mar 2 06:27:35 db popa3d[7964]: Session from 71.65.224.25`. Some of the attributes are plainly derived from the original log entry; others are manufactured from implied data. (Many of the attribute values come from the Common Base Events Specification. It's helpful to keep that document at hand while creating your parsers.)
Listing 15. The CBE equivalent of the first record of daemon.log

    <CommonBaseEvent
        creationTime="2006-03-02T13:27:35.000Z"
        globalInstanceId="A1DAABECA2ACB4F0E8E9E8C475042F1B"
        msg="Session from 71.65.224.25"
        version="1.0.1">
      <sourceComponentId
          component="popa3d"
          componentIdType="ServiceName"
          location="db.linux-mag.com"
          locationType="Hostname"
          subComponent="7964"
          componentType="daemon"
      />
      <situation
          categoryName="StartSituation">
        <situationType
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:type="StartSituation"
            reasoningScope="EXTERNAL"
            successDisposition="SUCCESSFUL"
            situationQualifier="START INITIATED"/>
      </situation>
    </CommonBaseEvent>

Also keep in mind that (at a minimum) every CBE must define the creationTime attribute, the msg attribute, and the sourceComponentId element, which in turn must have the six attributes shown in Listing 15. The situation element (among others) is optional, but is part of the example to elaborate upon the event.
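The minimum requirements above can be checked mechanically. The following is a minimal sketch in Python, not part of the Generic Log Adapter; the function name `missing_fields` and the trimmed sample record are assumptions made for illustration, and the list of required fields is taken only from the paragraph above.

```python
import xml.etree.ElementTree as ET

# Minimum fields named in the text: two attributes on the event itself,
# plus sourceComponentId with the six attributes shown in Listing 15.
REQUIRED_EVENT_ATTRS = ("creationTime", "msg")
REQUIRED_SOURCE_ATTRS = (
    "component", "componentIdType", "componentType",
    "location", "locationType", "subComponent",
)

def missing_fields(cbe_xml):
    """Return the names of the minimum fields absent from one CBE record."""
    event = ET.fromstring(cbe_xml)
    missing = [a for a in REQUIRED_EVENT_ATTRS if a not in event.attrib]
    source = event.find("sourceComponentId")
    if source is None:
        missing.append("sourceComponentId")
    else:
        missing += [
            f"sourceComponentId/@{a}"
            for a in REQUIRED_SOURCE_ATTRS
            if a not in source.attrib
        ]
    return missing

# Listing 15 without the optional situation element.
record = """<CommonBaseEvent creationTime="2006-03-02T13:27:35.000Z"
    msg="Session from 71.65.224.25" version="1.0.1">
  <sourceComponentId component="popa3d" componentIdType="ServiceName"
      componentType="daemon" location="db.linux-mag.com"
      locationType="Hostname" subComponent="7964"/>
</CommonBaseEvent>"""

print(missing_fields(record))  # an empty list means the minimum is met
```

A check like this is only a convenience while developing rules; it does not replace validation against the CBE schema.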
Click Parser in the Generic Log Adapter perspective to begin the process of defining it. Figure 11 shows what the completed parser looks like. There is one parser task for every attribute and element in the CBE shown in Listing 15.
Figure 11. The complete parser for daemon.log
The parser works in two phases. First, it divides the incoming record (from the extractor) into positions, or numbered parts, in which each part is separated from the other by the separator token. If no separator token is specified, this step is skipped. Then the parser divides the record into designations, or (name, value) pairs, in which each (name, value) pair is two strings joined by the designation token. If no designation token is specified, the latter step is skipped.
Consider this example: If the separator token is the regular expression [ ]+, the designation token is = (equal sign), and the parser is handed the record:
03/05/06 12:51:06EST Mail name=joe action=login authentication=password
the parser would define six positions and three designations, as shown in Table 1.
Table 1. Positions and designations from the parser
| Position/Designation | Value |
|---|---|
| 1 | 03/05/06 |
| 2 | 12:51:06EST |
| 3 | Mail |
| 4 | name=joe |
| 5 | action=login |
| 6 | authentication=password |
| h{'name'} | joe |
| h{'action'} | login |
| h{'authentication'} | password |
Note: If your incoming record begins with the separator token, position 1 is created, but left empty.
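The two phases can be imitated in a few lines of Python. This is a rough sketch of the behavior described above, not the adapter's actual implementation: the function name `split_record` is hypothetical, the separator is treated as a regular expression and the designation token as a literal string, and designations are assumed to be looked up within each position.

```python
import re

def split_record(record, separator=None, designation=None):
    """Divide a record into positions and (name, value) designations."""
    # Phase 1: positions. Skipped when no separator token is given.
    positions = re.split(separator, record) if separator else []

    # Phase 2: designations. Skipped when no designation token is given.
    designations = {}
    if designation:
        for part in positions or [record]:
            name, token, value = part.partition(designation)
            if token and name:
                designations[name] = value
    return positions, designations

record = "03/05/06 12:51:06EST Mail name=joe action=login authentication=password"
positions, designations = split_record(record, separator=r"[ ]+", designation="=")

# The tool numbers positions from 1; Python lists start at 0.
for number, value in enumerate(positions, start=1):
    print(number, value)
for name, value in designations.items():
    print(f"h{{'{name}'}}", value)
```

Splitting a record that begins with the separator, such as `" a b"`, produces an empty first element in this sketch as well, which mirrors the note about an empty position 1.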
You can use all the defined positions and designations to simplify each parser task. For instance, to create the creationTime attribute, you need only parse position 2. Of course, the entire original record is always available. However, positions and designations make each parsing task faster and easier to manage because the source string is smaller. In many cases, you can use a position or designation directly for a CBE value.
Click Parser again. For convenience, break each daemon.log entry into two positions using the separator token `:[ ]+` (a colon followed by one or more spaces). The daemon.log entries don't contain (name, value) pairs, so the designation token is omitted. These settings are shown in Figure 12. Now, save your work.
Figure 12. Dividing a record into positions
Set the first required field in the CBE: creationTime. The goal is to transform the time stamp provided with the daemon.log record into a time format compatible with the XML schema dateTime data type. As a convenience, the adapter can automatically convert any time format understood by the class `java.text.SimpleDateFormat` into the XML schema data type.
To set the creationTime field, complete these steps:
- Expand the parser and select creationTime. This is a required CBE attribute, so select the Required by parent check box.
- Click the substitution rule associated with creationTime.
- For Positions, type `1` because position 1 contains the time stamp to extract.
- For Match, provide the regular expression `^(\w{3})\s+(\d{1,2})\s+([\d:]+)\s+.*$`. This expression captures the month name as `$1`, the day of the month as `$2`, and the time of day as `$3`.
- For Substitute, supply `$1 $2 @YEAR $3 @TIMEZONE`. Substitute is used instead of the entire incoming record in the rest of this specific parsing task. `$1`, `$2`, and `$3` came from the previous step. However, because the time stamp doesn't include a year or a time zone, the year and time zone associated with the current context instance, represented by the shorthand `@YEAR` and `@TIMEZONE`, respectively, are used instead. Therefore, for the first daemon.log record, the settings in Substitute yield the string `Mar 02 2006 06:27:35 -0700`.
- Ignoring the Substitute extension class field, which allows you to provide a Java class to do additional substitutions, transform the result of the substitution to the right type. You can use a `java.text.SimpleDateFormat` format string to do the heavy lifting. Set Time format to `MMM dd yyyy hh:mm:ss Z`, indicating a three-letter name of the month; a two-digit day of the month; a four-digit year; hours, minutes, and seconds separated by colons; and an RFC 822 time zone.
Figure 13 shows the final settings for creationTime. If you save the configuration file and rerun the adapter, the Formatter Result pane should show a new XML record with attribute creationTime="2006-03-02T13:27:35.000Z".
Figure 13. Parsing the incoming time stamp into the creationTime attribute
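To see how the creationTime settings fit together, the following Python sketch walks through the same steps outside the adapter. It is an approximation under stated assumptions: the function name `creation_time` is hypothetical, the values of `@YEAR` and `@TIMEZONE` are passed in as plain arguments, and `datetime.strptime` stands in for `java.text.SimpleDateFormat`. The `%H` directive reads a 24-hour clock, which in `SimpleDateFormat` terms corresponds to `HH` (the `hh` letters denote a 12-hour clock). Month-name parsing with `%b` assumes an English locale.

```python
import re
from datetime import datetime, timezone

SEPARATOR = re.compile(r":[ ]+")  # separator token from the parser settings
MATCH = re.compile(r"^(\w{3})\s+(\d{1,2})\s+([\d:]+)\s+.*$")  # Match field

def creation_time(record, year, tz_offset):
    """Build an XML schema dateTime (UTC) from a daemon.log record.

    year and tz_offset play the role of @YEAR and @TIMEZONE.
    """
    position_1 = SEPARATOR.split(record)[0]  # Positions = 1
    m = MATCH.match(position_1)
    if m is None:
        return None
    # Substitute: $1 $2 @YEAR $3 @TIMEZONE
    substituted = f"{m.group(1)} {m.group(2)} {year} {m.group(3)} {tz_offset}"
    # Time format: month name, day, year, time of day, RFC 822 zone
    parsed = datetime.strptime(substituted, "%b %d %Y %H:%M:%S %z")
    utc = parsed.astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"

record = "Mar 2 06:27:35 db popa3d[7964]: Session from 71.65.224.25"
print(creation_time(record, 2006, "-0700"))
```

With a year of 2006 and an offset of -0700, the sketch arrives at the same UTC value that appears in Listing 15.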
The msg attribute is another required CBE attribute. Add this attribute and create the parser task to extract a suitable value:
- Right-click CommonBaseEvent, then click Add > msg.
- Click msg, then select the Required by parent check box.
- Expand msg, then click Substitution Rule.
- Specify `2` in the Positions field because the message portion of the log entry is located in position 2. (It's everything after the separator token.)
- For Match, specify a regular expression that selects the entire string. The regular expression `^(.*)$` captures everything in `$1`.
- For Substitute, specify `$1`.
Figure 14 shows the final settings.
Figure 14. Settings to extract the message
Save the configuration file and click Rerun adapter, found in the Extractor Result pane. Click Next event and switch to the Formatter Result pane. You should see a new msg attribute that looks like msg="Session from 71.65.224.25".
The last mandatory part of a CBE record is the sourceComponentId, used to record the component (service, system, and so on) that's affected by the event. In the instance of daemon.log, the components affected are software services running on a specific host. The parser's job is to capture and record the specifics.
Right-click CommonBaseEvent once again, and then click Add > sourceComponentId. (Figure 15 shows all the possible attributes and elements you can add to a CBE.) For brevity, Table 2 shows all the settings required for sourceComponentId. One new setting is Default value. If a match is made by a parsing rule, but no substitute value is provided, the Default value is used.
Figure 15. List of elements and attributes you can add to a CBE record
Table 2. Settings for the sourceComponentId
| Item | Default value | Required by parent | Positions | Match | Substitute | Notes |
|---|---|---|---|---|---|---|
| component | | Yes | 1 | `^.* db (\w+)\[.*$` | `$1` | Captures the name of the software service, such as popa3d or mysqld. |
| componentIdType | ServiceName | Yes | | `^(.*)` | | Indicates that the component records the name of a service; ServiceName is one of the prescribed values for this attribute, according to the CBE specification. |
| componentType | daemon | Yes | | `^(.*)` | | Describes the class of the component. |
| location | db.linux-mag.com | Yes | | `^(.*)` | | Specifies the physical address that corresponds to the location of a component. The format of the value of the location is specified by the locationType property. It is recommended that you use a fully qualified host name for this attribute. Here, because the log entry does not include a fully qualified host name, one is added via the default value. In other cases, you may be able to parse the host name directly from the log. |
| locationType | Hostname | Yes | 1 | `^(.*)` | | Specifies the format and meaning of the value in the location property. The Hostname keyword is one of many possible keywords that you can use here. |
| subComponent | | Yes | | `^.*\[(\d+)\].*` | `$1` | Identifies the specific daemon process that the event affects. |
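The interplay of Match, Substitute, and Default value in Table 2 can be sketched in Python as follows. This is an illustration rather than the adapter's logic: the names `RULES`, `apply_rule`, and `source_component_id` are hypothetical, and a rule with no position is assumed to be matched against the whole record.

```python
import re

# (item, default value, position, match, substitute), copied from Table 2.
RULES = [
    ("component", None, 1, r"^.* db (\w+)\[.*$", "$1"),
    ("componentIdType", "ServiceName", None, r"^(.*)", None),
    ("componentType", "daemon", None, r"^(.*)", None),
    ("location", "db.linux-mag.com", None, r"^(.*)", None),
    ("locationType", "Hostname", 1, r"^(.*)", None),
    ("subComponent", None, None, r"^.*\[(\d+)\].*", "$1"),
]

def apply_rule(record, positions, rule):
    """Return the value one rule produces, or None if its Match fails."""
    _, default, position, match, substitute = rule
    # Assumption: no position means the rule sees the entire record.
    source = positions[position - 1] if position else record
    m = re.match(match, source)
    if m is None:
        return None
    if substitute is None:
        return default  # a match without a substitute yields the default
    # Replace $1, $2, ... with the corresponding captured groups.
    return re.sub(r"\$(\d)", lambda ref: m.group(int(ref.group(1))), substitute)

def source_component_id(record):
    positions = re.split(r":[ ]+", record)  # separator token from Figure 12
    return {rule[0]: apply_rule(record, positions, rule) for rule in RULES}

record = "Mar 2 06:27:35 db popa3d[7964]: Session from 71.65.224.25"
for name, value in source_component_id(record).items():
    print(f'{name}="{value}"')
```

For the first daemon.log record, the six values agree with the sourceComponentId attributes in Listing 15.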
If you make all the changes listed in Table 2, then save and rerun the adapter, it should produce CBE records that resemble Listing 15. As an additional exercise, add a situation to the CBE. A situation categorizes the circumstance that initiated the event. For instance, you might create one parser that emits a StartSituation whenever the daemon is initially contacted for service and another that emits a RequestSituation when a request is made.
Situations aren't required (hence, Required by parent can be disabled), but you may find them useful to add granularity to your CBE records. If you create a situation and add a series of possible situation parsers, select the Child choice check box if processing can stop after the first match is made.
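The effect of stopping after the first match can be pictured with a short Python sketch. Everything in it is hypothetical: the patterns in `SITUATION_RULES` are invented for illustration and are not taken from the adapter or from daemon.log beyond the sample message shown earlier.

```python
import re

# Hypothetical situation parsers, tried in order. Only the first one
# that matches is used, which is the behavior the text associates
# with the Child choice check box.
SITUATION_RULES = [
    ("StartSituation", re.compile(r"^Session from ")),
    ("RequestSituation", re.compile(r"^Request ")),
]

def first_matching_situation(msg):
    """Return the category of the first rule that matches, else None."""
    for category, pattern in SITUATION_RULES:
        if pattern.search(msg):
            return category  # stop at the first match
    return None  # situations are optional, so no match is acceptable

print(first_matching_situation("Session from 71.65.224.25"))
```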
Here's a helpful tip for debugging your parsers: If a property is required, but not found in the incoming record (passed to the parser from the extractor), the Formatter Result pane for that record will be empty. In other words, required properties behave like logical AND: If one match fails, processing for that record stops. It's often useful to clear the Required by parent check box to debug rules. Build your rules slowly and incrementally, and watch the Problem pane for clues.
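The logical AND behavior of required properties can be modeled as below. This is a simplified sketch, not the adapter's code: the function `parse_record` and the `user` rule are hypothetical, and each rule is reduced to a name, a pattern with one capturing group, and a required flag.

```python
import re

def parse_record(record, rules):
    """Apply (name, pattern, required) rules; None means the record is dropped."""
    result = {}
    for name, pattern, required in rules:
        m = re.search(pattern, record)
        if m:
            result[name] = m.group(1)
        elif required:
            return None  # one failed required rule empties the whole result
    return result

record = "Mar 2 06:27:35 db popa3d[7964]: Session from 71.65.224.25"

# The "user" rule is invented; it cannot match this record.
strict_rules = [
    ("subComponent", r"\[(\d+)\]", True),
    ("user", r"user=(\w+)", True),
]
relaxed_rules = [
    ("subComponent", r"\[(\d+)\]", True),
    ("user", r"user=(\w+)", False),  # Required by parent cleared
]

print(parse_record(record, strict_rules))   # nothing is produced
print(parse_record(record, relaxed_rules))  # the matching rule still shows up
```

Clearing the required flag on the suspect rule, as in `relaxed_rules`, makes the remaining output visible again, which helps isolate the rule that fails.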



