IBM Support

Best Practices To Define Format File for Log File Agent.

Technical Blog Post


Abstract

Best Practices To Define Format File for Log File Agent.

Body

There are several potential issues surrounding regular expressions and performance, and this entry discuss about the best practices to deal with format file for Log File Agent (which will use regular expressions) to improve the performance.

Regular expression processing is processor intensive , so the primary goal for improving performance is to minimize the time spent checking records from the monitoring source log against regular expressions.

There are several ways of achieving this :

1. Minimize the use of multi-line patterns.

When you use a multi-line pattern (a regular expression containing the '\n' character, or a TEC-style format containing the '%n' token), the agent has to first break the monitored file up into records of various sizes, and then check the records against the expressions in the format file.

This requires checking the regular expressions twice, so it's slower.

If you don't use multi-line, this doesn't have to happen. It just assumes that each line of the file is a record.

Sometimes, of course, you have to use multi-line, because that's just how the data is. Such is life. But understand that this is slower than single-line.

In some cases, it may be possible to just ignore some of the lines and achieve better performance.

For example, the following is logically a single record from a RAS1 trace log:

(4D66DACB.0001-1:RAS1,400,"CTBLD")                                                                                                                                                                                                                                                                                                   

+4D66DACB.0001 Component: ira                                                                                                                                                                                                                                                                                                           

+4D66DACB.0001 Driver: agent_fac:15/4114877.7                                                                                                                                                                                                                                                                                   

+4D66DACB.0001 Timestamp: Feb 24 2011 13:18:54                                                                                                                                                                                                                                                                           

+4D66DACB.0001 Target: sos510amdx6-d

But perhaps all we really care about is the number in the middle line.

So we could write a single line pattern like:  ^\+.*Driver: agent_fac:([0-9\.\/]+)$

That gets us the important value without requiring multi-line.

The other four lines of the logical record are then just treated as single-line records unto themselves, which don't match anything and are discarded.

2. Sort the expressions in the format file by frequency of occurrence in the monitoring log.

For each record that the agent reads from the log, it checks it against the expressions in the format file until it finds a match.

It starts with the last expression in the file, and searches upwards.

When it finds a match, it stops searching.

So, if the most commonly hit expression is listed last, then whenever that expression is hit, it will be the only one that has to be checked.

But if you have 100 expressions in the format file, every time a log record matches the top one listed in the format file, the agent has to first check the other 99 expressions, which will slow it down.

When a record read from the log doesn't match any of the patterns in the format file, the agent has to check it against all of the patterns before it knows that it didn't match. This is expensive.

In this situation, performance can be improved greatly by putting a regular expression near the bottom of the format file (where it will be checked first) that matches these unwanted records. That way each time they occur they will match early and processing will stop, rather than requiring that they be checked against every expression in the format file.

Use the special “*DISCARD*” class for these. When the class name in the format file is “*DISCARD*”, anything that matches is discarded. No event will be generated for it.

Example:                                                                                                                                                                                                                                                                                                                                                                    

REGEX *DISCARD*                                                                                                                                                                                                                                                                                                                                                

^.*Some common uninteresting message.*$                                                                                                                                                                                                                                                                                                        

END

To help find these common records, use the UnmatchLog.

In the .conf file, set e.g.: UnmatchLog=/tmp/unmatch.log

This will cause any records read from the customer's log which did not match any of the expressions in the format file to be written to the specified file.

This file can then be examined manually to see if there are any very common records that can be caught early and discarded to save processing time.

This is a diagnostic setting, it is likely undesirable to leave the unmatch log on all the time, as it will consume processor time and disk space. The unmatch log is not automatically trimmed. It will just keep growing.

3. Include as much constant data as possible in the regular expressions.

Suppose a snippet of data from a log looks like this:

Example:                                                                                                                                                                                                                                                                                                                                                                   

Disk error on device: /dev/sda1                                                                                                                                                                                                                                                                                                                         

Disk error on device: /dev/sdb2

You could write an expression like: ^Disk.*: .*$

It would match, but it forces the regular expression engine to consider more possibilities on other lines that may be similar but ultimately don't match (say if the ':' is missing). Instead: ^Disk error on device: /dev/sd[a-b][0-9]$ is better, because the regex engine can give up earlier on things that don't match, since it knows much more precisely what it is really looking for.

4. Don't use sub-expressions that you don't really need.

Sub-expressions (in parentheses) tell the regex engine that we will want to access the value of some piece of the matched data.

This causes it to do extra processing and this is wasted if we never look at the value.

For example, if log text looks like: write failure in writing to client 9.27.135.191. Error Broken pipe and we write this in the format file:

REGEX WriteFailure                                                                                                                                                                                                                                                                                                                                                

^write failure in writing to client (.*)\. Error (.*)$                                                                                                                                                                                                                                                                                          

ClientAddr $1 CustomSlot1                                                                                                                                                                                                                                                                                                                                     

END

We captured the error message at the end, but did not use it! This will hurt performance.

There may be occasions when you want to use parentheses in an expression for grouping, but not for capture.

In this case, use the ?: operator, which tells the regex engine not to capture the value. This allows the parentheses for grouping without the performance penalty imposed by capturing.

For example, a log snippet may contain:                                                                                                                                                                                                                                                                                                       

login succeeded on the first attempt for user Bob.                                                                                                                                                                                                                                                                                          

login succeeded on the third attempt for user Joe.

We want to match both of these, so we have to allow “first” or “third” (and possibly other values). But we don't actually care which attempt it was. We do care who the user was. So we could write:

REGEX LoginSucceeded                                                                                                                                                                                                                                                                                                                                           

^login succeeded on the (?:[a-z]+) attempt for user ([A-Z][a-z]*)\.$                                                                                                                                                                                                                                                        

UserName $1 CustomSlot1                                                                                                                                                                                                                                                                                                                                   

END

5. Don't use the OR ('|') operator in expressions if you can avoid it.

The | operator will cause the regex engine to back up and try again on things that don't initially match.

This turns out to be dramatically less efficient than simply having two separate expressions.

So instead of :

REGEX DiskError                                                                                                                                                                                                                                                                                                                                                     

^.*disk error.*$|^.*disk failure.*$                                                                                                                                                                                                                                                                                                                         

END


Use:
 

REGEX DiskError                                                                                                                                                                                                                                                                                                                                                     

^.*disk error.*$                                                                                                                                                                                                                                                                                                                                                         

END

 
REGEX DiskError
 
^.*disk failure.*$
 
END
 
This yields the same results but is much more efficient.

Note that this example violated the rule from best practice 3, and used a very vague expression. This is just for simplicity in demonstrating the | issue.

6. Don't use ambiguous expressions.

Will force the regex engine to back up and look for different ways in which an expression might match.

There is a longer explanation of this issue in the “Performance Tips” section of the ICU regex (that's the engine we use) documentation:

   http://userguide.icu-project.org/strings/regexp

It can occur if someone writes an expression meant to break up a long record into many sub-expressions.

A degenerate version of this problem can be illustrated with an expression like this: (.*) (.*)

It looks for two strings of anything separated by a space.

But .* will also match a space, so the regex engine may initially assign the first space it comes to to the first (.*). If it then reaches the end of the input record without finding another space, it has to back up and try again using the space as the literal space called for in the regex.

As the number of such things grows, the processing time required can become exponential (ie, really bad).

Note: There are online tools such as Regex Pal for checking whether the format file the user define will match the monitoring log.

Regex Pal Link: http://regexpal.com/ 

 
Acronyms/Abbreviations:
TEC - Tivoli Enterprice Console 
ICU -  International Components for Unicode

 

[{"Business Unit":{"code":"BU053","label":"Cloud & Data Platform"},"Product":{"code":"","label":""},"Component":"","Platform":[{"code":"","label":""}],"Version":"","Edition":"","Line of Business":{"code":"","label":""}}]

UID

ibm11081995