Level: Introductory

Bob Moore (remoore@us.ibm.com), Advisory Software Engineer, IBM
Brad Topol (btopol@us.ibm.com), Senior Software Engineer, IBM
Jie Xing (jiexing@us.ibm.com), Advisory Software Engineer, IBM
12 Jul 2005 (updated 14 Mar 2006)

Simplify problem diagnosis with a key feature of the Automated Problem Determination (AutoPD) tool. Find out how the tool can scan large log files to identify and extract specific log records, making it easier to diagnose a particular problem in IBM WebSphere® Portal or WebSphere Application Server. This article reviews the overall architecture of the tool's symptom analysis function and describes in detail how to write a symptom specification for a product that emits non-XML-formatted log records.
Introduction
This article continues the series of articles on the Automated Problem Determination (AutoPD) Tool. Part 1 of the series introduces the tool, and Part 2 shows you how to extend the tool to address additional products and problem scenarios. In this article, you review the tool's symptom analysis function, which automates what IBM support personnel do when they diagnose a problem. A companion article, "The Automated Problem Determination Tool: Symptom analysis with XML-formatted log files," completes the symptom analysis discussion by describing how to write a symptom specification for a product that emits XML-formatted log records, including ones that comply with the XML-based Common Base Event format defined by autonomic computing technology.
Faced with several potentially large product log files, and knowing the general category of the problem they're dealing with, IBM support experts use their experience with the product to scan the log files for specific log records most likely to help them identify the underlying cause of the problem. This activity can be automated because the scans are based on simple pattern matching. For example, an expert might know that for a login problem, the key is to look for log records containing error codes, where an error code is always of a predictable form such as four letters + four digits + 'E'.
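Because the expert's scan boils down to pattern matching, it can be expressed as a single regular expression. The following Java sketch is ours, not part of the AutoPD tool; it assumes the illustrative error-code form described above (four uppercase letters, four digits, and a trailing 'E'), and the class name ErrorCodeScan and the sample log text are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; not AutoPD code. Finds error codes shaped like
// four uppercase letters + four digits + 'E' (for example, ABCD1234E).
class ErrorCodeScan {
    static final Pattern ERROR_CODE = Pattern.compile("[A-Z]{4}[0-9]{4}E");

    // Returns every error code found in the text, in order of appearance.
    static List<String> findErrorCodes(String logText) {
        List<String> codes = new ArrayList<>();
        Matcher m = ERROR_CODE.matcher(logText);
        while (m.find()) {
            codes.add(m.group());
        }
        return codes;
    }

    public static void main(String[] args) {
        // Hypothetical log text; the codes are invented for illustration.
        String log = "10:01 ABCD1234E: login failed\n"
                + "10:02 user logged out\n"
                + "10:03 WXYZ0001E: session timeout";
        System.out.println(findErrorCodes(log)); // [ABCD1234E, WXYZ0001E]
    }
}
```

Because the pattern has no word boundaries, it would also find such a code embedded in a longer token; adding \b at each end of the regex is one way to tighten it.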
For a given problem category, the rules identifying how to scan a log file when dealing with a problem of that type are typically unique to a specific release of a specific product. This is not surprising. Products use different formats for their log files, so even the task of separating the individual log records within a log file is different for different products. The structure inside a log record also varies from product to product. Finally, different products may "say the same thing" in their log records using slightly different wording or formats.
Symptom analysis can also vary across releases of a single product. If a problem of some type is difficult to diagnose in version 1 of a product, the product may add additional logging in version 2 specifically for diagnosing that problem. Scanning for the newly added log records would be pointless for version 1 of the product, but exactly what's needed for version 2.
Finally, for a specific release of a specific product, the scanning criteria will ordinarily vary based on which log file is being scanned. IBM support experts know to look for one pattern in SystemOut.log and for another pattern in trace.log, because they know from experience where that release of that product writes log records containing each of these patterns.
Putting it all together, a symptom specification that tells the tool which log records to extract for diagnosing a problem is unique to a combination of the following four variables:
- The type of problem to be diagnosed
- The name of the product involved
- The version of the product involved
- The specific log file or files to be scanned
You will see below how to use the XML elements and attributes in an Ant script and in a pattern document to represent these four items, which we will refer to as the scoping variables for a symptom analysis activity.
Setting the scoping variables in an Ant script
A pattern document can, and typically does, contain multiple symptom specifications. For example, the pattern document included with the AutoPD tool itself, pattern_template.xml, contains symptom specifications for all of the WebSphere Portal problems and product versions that the tool supports. Consequently, when an Ant script invokes symptom analysis, it must supply enough information to tell the tool which symptom specification to use. To see how this is done, let's revisit the custom Ant task <infocollect> that we first examined in our second article, shown here in Listing 1:
Listing 1. The <infocollect> task
<infocollect
problem="${infocollectProblemType}"
patternFile="${autopdproperties}/pattern_template.xml"
levelreport="${autopdtmp}/autopd/levelreport.html"
autopdreport="${autopdtmp}/autopd/autopd_analysis_report.html"
productname="${product.name}"
productversion="${product.version}"
autopdname="${autopd_name}"
autopdversion="${autopd_version}" >
<autopdfileset filesetName="wpslog" filesetDir="${portal.root}/log" />
<autopdfileset filesetName="tracelog" filesetDir="${trace.log.file}" />
<autopdfileset filesetName="systemoutlog" filesetDir="${systemout.log.file}" />
<autopdfileset filesetName="systemerrlog" filesetDir="${systemerr.log.file}" />
</infocollect>
The four scoping variables listed above are present within the <infocollect> task:
- The problem attribute identifies the type of problem being diagnosed. As we explained in Part 2, the value ${infocollectProblemType} is a script-level parameter that the collection script invoking this <infocollect> task sets to an actual problem type. This problem type value is matched against a value in the pattern document identified by the patternFile attribute.
- The productname and productversion attributes identify the exact product for which the diagnosis is being performed. Once again, the values of these attributes are matched against values in the indicated pattern document. As we discussed in the second article, the values of these two attributes are in this case Ant properties, whose values were assigned by the custom Ant task <wpsversion>. But these attributes can get their values in other ways:
  - A value can simply be hard-coded in the script, either directly in the <infocollect> task attribute itself, or in an Ant property that gets assigned to the attribute. In the tool's [WebSphere] Portal Problem Analysis script, for example, the product.name property is hard-coded with the value 'IBM WebSphere Portal Server', and then assigned to the productname attribute. Why is this value hard-coded? Because the Portal Problem Analysis script deals only with WebSphere Portal, so the product name is known before the script runs.
  - The user can be prompted for an attribute's value. The Portal Problem Analysis script simply asks the user which version of WebSphere Portal the collected set of log files came from, because there's no easy way to deduce this information from the log files themselves.
  - The script can search for other clues, such as the existence of certain files or directories, and use these clues as a basis for deciding which product and version it's dealing with.
- One or more child <autopdfileset> elements provide the fourth piece of scoping information: the file or files on which the analysis will be performed. <autopdfileset> is itself a custom Ant task, but it doesn't actually do anything; it simply serves as a vehicle for passing the file set information to the <infocollect> task.
Each <autopdfileset> element has two required attributes. The filesetName attribute contains a value that is matched against a value in the pattern document, just as the problem, productname, and productversion attributes are. The filesetDir attribute gives the absolute path, on the system where the tool is being run, to the directory where the files to be analyzed reside. The choice of which specific files in this directory to analyze is made by an attribute in the pattern document, so it does no harm if the directory also contains unrelated files.
As a number of the WebSphere Portal scripts illustrate, it may take several steps to determine what value to assign to a filesetDir attribute. With many products (including WebSphere Portal), users are free to relocate or rename log files, so tracking them down can take some time. After they have been located, however, the process of passing the location to the <infocollect> task is very straightforward.
The <autopdfileset> element also supports three optional attributes not shown in the example above. The first of these optional attributes is filterByLatestTime. This Boolean attribute defaults to false. When it is set to true, it instructs the tool to perform analysis only on the newest file in the fileset, and to ignore any other files that it would have otherwise processed. This capability is useful when a product saves several time-stamped snapshots of the same underlying log file, all in a common directory. In a case such as this, no additional information is gained from analyzing the older snapshots, beyond what's gained from analyzing the latest one.
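A minimal Java sketch of the filterByLatestTime behavior follows. It assumes that "newest" means the most recent last-modified time, which the article does not spell out, and it takes the timestamp accessor as a parameter so that the logic can be exercised without real files. LatestFileFilter is a hypothetical name, not part of the AutoPD tool.

```java
import java.io.File;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical sketch of filterByLatestTime; not the AutoPD implementation.
// Assumption: "newest" means the latest last-modified time.
class LatestFileFilter {

    static <T> List<T> apply(List<T> fileset, boolean filterByLatestTime,
                             ToLongFunction<T> lastModified) {
        if (!filterByLatestTime || fileset.isEmpty()) {
            return fileset; // default (false): analyze every file in the fileset
        }
        T newest = fileset.stream()
                .max(Comparator.comparingLong(lastModified))
                .get();
        return List.of(newest); // true: analyze only the newest snapshot
    }

    public static void main(String[] args) {
        // Applies the filter to the regular files in a directory (default: current directory).
        File dir = new File(args.length > 0 ? args[0] : ".");
        File[] entries = dir.listFiles(File::isFile);
        List<File> files = entries == null ? List.of() : List.of(entries);
        System.out.println(apply(files, true, File::lastModified));
    }
}
```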
The remaining two optional attributes of the <autopdfileset> element control how the files included in the fileset are processed, as opposed to which files get included in the fileset. The attributes startTime and endTime, which may be used individually or together, determine which log records in the fileset's files will be analyzed. These attributes apply to every file in the fileset. Their values must be specified in the format defined by the timepattern property in the /properties/autopd.properties file.
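The startTime and endTime window can be sketched the same way. This Java illustration rests on two assumptions that the article does not confirm: that timepattern holds the pattern yyyy.MM.dd HH:mm:ss.SSS (the style of the WebSphere Portal timestamps shown later in this article), and that both bounds are inclusive. The class name TimeWindow is hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch of startTime/endTime filtering; not AutoPD code.
// Assumptions: timepattern is "yyyy.MM.dd HH:mm:ss.SSS" and both bounds are inclusive.
class TimeWindow {
    static final DateTimeFormatter TIME_PATTERN =
            DateTimeFormatter.ofPattern("yyyy.MM.dd HH:mm:ss.SSS");

    private final LocalDateTime start; // null: startTime not specified
    private final LocalDateTime end;   // null: endTime not specified

    TimeWindow(String startTime, String endTime) {
        this.start = startTime == null ? null : LocalDateTime.parse(startTime, TIME_PATTERN);
        this.end = endTime == null ? null : LocalDateTime.parse(endTime, TIME_PATTERN);
    }

    // True if a log record with this timestamp should be analyzed.
    boolean includes(String recordTimestamp) {
        LocalDateTime t = LocalDateTime.parse(recordTimestamp, TIME_PATTERN);
        return (start == null || !t.isBefore(start)) && (end == null || !t.isAfter(end));
    }

    public static void main(String[] args) {
        TimeWindow window = new TimeWindow("2005.06.01 18:00:00.000", null);
        System.out.println(window.includes("2005.06.01 18:45:30.222")); // true
        System.out.println(window.includes("2005.06.01 17:59:59.999")); // false
    }
}
```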
Using the scoping variables in a pattern document
We have said that the choice of which symptom specification to use from among those present in a pattern document is a function of four variables: problem type, product name, product version, and fileset name. So a pattern document could be structured as (an XML representation of) a single table with a four-part key and one additional piece of data:
(Problem, ProductName, ProductVersion, Fileset, SymptomSpecification)
However, doing it this way results in a great deal of needless repetition, because symptom specifications tend to stay the same across versions of the same product, often across different problems for a given product, and often for different filesets for a given product. It is less common, but not unheard of, for symptom specifications to be shared among different products. This situation will change dramatically, however, when products move to the standard Common Base Event format for their log records.
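To make the repetition concrete, the following hypothetical Java sketch builds the naive flat table for the Portal Login problem alone, using the four product versions from Listing 2 and the five filesets from Listing 3 (both later in this article), with delimiter IDs standing in for full symptom specifications. This is only an illustration of the rejected design, not how AutoPD stores a pattern document.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

// Hypothetical sketch of the naive flat layout; AutoPD does not store
// pattern documents this way. Delimiter IDs stand in for symptom specifications.
class FlatSymptomTable {
    // The four scoping variables together form one composite key.
    record ScopeKey(String problem, String productName, String productVersion, String fileset) {}

    static Map<ScopeKey, String> build() {
        String[] versions = {"5.1.0.1", "5.1.0.0", "5.1", "5.0"}; // from Listing 2
        String[][] filesets = {                                  // from Listing 3
            {"wpslog", "delimiter1"}, {"tracelog", "delimiter2"},
            {"systemoutlog", "delimiter4"}, {"systemerrlog", "delimiter4"},
            {"configtrace", "configtracedelim"}};
        Map<ScopeKey, String> table = new HashMap<>();
        for (String version : versions) {
            for (String[] fs : filesets) {
                table.put(new ScopeKey("portallogin", "IBM WebSphere Portal Server", version, fs[0]), fs[1]);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        Map<ScopeKey, String> table = build();
        int distinct = new HashSet<>(table.values()).size();
        // One problem type alone needs 20 rows that repeat only 4 specifications.
        System.out.println(table.size() + " rows, " + distinct + " distinct specifications");
    }
}
```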
The best way to understand the actual structure of a pattern document is to see how the AutoPD tool goes about using the four scoping variables that were passed to it through the <infocollect> task. As you read the following explanation, it will be helpful to have open in front of you the tool's pattern document specifying how it should perform analysis for WebSphere Portal problems. This document is named pattern_template.xml; you can find it in the /properties/wps directory. We will, however, reproduce each specific element of the document as we discuss it, so it is not absolutely necessary to have a copy as you read this article.
Product name and product version
The first scoping variables that the tool uses are productname and productversion. These two variables serve as a two-part key into the collection of all the <productinfo> elements in the indicated pattern document. Listing 2, for example, shows four of the <productinfo> elements present in the pattern_template.xml document:
Listing 2. Four of the <productinfo> elements from the pattern_template.xml document
<productinfo analysisProfileRef="wps_portal_5.1"
name="IBM WebSphere Portal Server"
version="5.1.0.1">
<msgPrefixMappingInfo
catalogDir="messagecatalog/portal_message_catalog_5101"
fileName="properties/wps5101msgprefixmapping.properties"
mappingType="PortalCatalogMessage"
keyPattern="([A-Z]{4,5}[0-9]{1,4}[EW])"/>
</productinfo>
<productinfo analysisProfileRef="wps_portal_5.1"
name="IBM WebSphere Portal Server"
version="5.1.0.0">
<msgPrefixMappingInfo
catalogDir="messagecatalog/portal_message_catalog_510"
fileName="properties/wps5100msgprefixmapping.properties"
mappingType="PortalCatalogMessage"
keyPattern="([A-Z]{4,5}[0-9]{1,4}[EW])"/>
</productinfo>
<productinfo analysisProfileRef="wps_portal_5.1"
name="IBM WebSphere Portal Server"
version="5.1"/>
<productinfo analysisProfileRef="wps_portal_5.0"
name="IBM WebSphere Portal Server"
version="5.0"/>
You can ignore the child element <msgPrefixMappingInfo> that appears beneath two of the <productinfo> elements. This child element is related to the Message Catalog Lookup function for WebSphere Portal, described in the tool's user's guide. You cannot extend this function to other products simply by editing the pattern document, because the tool itself would have to be extended to include the additional message catalogs. Currently the tool is not extensible in this way, although support for user-supplied message catalogs might be added in the future. For now, do not include the <msgPrefixMappingInfo> element in any of your <productinfo> elements.
Starting with the productname and productversion values supplied to it in the script through the <infocollect> task, the tool selects the one <productinfo> element, if any, that best matches them. For product name, a case-insensitive exact match is performed. For product version, however, the tool performs a longest-prefix match of the value supplied to it in the productversion attribute against the values in the version attributes of all the <productinfo> elements that matched on product name. If we assume that the value of productname is "IBM WebSphere Portal Server," then the longest-prefix match for product version will have the following results if the pattern document contains the four <productinfo> elements shown above:
- productversion = "5.1.0.1" selects the first <productinfo> element.
- productversion = "5.1.0.0" selects the second <productinfo> element.
- productversion = "5.1.0.2" selects the third <productinfo> element, because of the longest-prefix match.
- productversion = "5.1.1.1" selects the third <productinfo> element, again because of the longest-prefix match.
- productversion = "5.2.0.0" does not select any <productinfo> element. In this case, the script fails.
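The selection rules above can be sketched in Java as follows. This is our reading of the described behavior, not the tool's source: product names are compared with a case-insensitive exact match, and the longest-prefix match is assumed to be a plain string-prefix comparison. The record and class names are hypothetical; the data is the four <productinfo> elements from Listing 2.

```java
import java.util.List;
import java.util.Optional;

// Sketch of the selection rules described above; not the AutoPD source.
// Assumption: "longest prefix" is a plain string-prefix comparison.
class ProductInfoSelector {
    record ProductInfo(String name, String version, String analysisProfileRef) {}

    // The four <productinfo> elements from Listing 2.
    static final List<ProductInfo> PORTAL = List.of(
        new ProductInfo("IBM WebSphere Portal Server", "5.1.0.1", "wps_portal_5.1"),
        new ProductInfo("IBM WebSphere Portal Server", "5.1.0.0", "wps_portal_5.1"),
        new ProductInfo("IBM WebSphere Portal Server", "5.1", "wps_portal_5.1"),
        new ProductInfo("IBM WebSphere Portal Server", "5.0", "wps_portal_5.0"));

    static Optional<ProductInfo> select(List<ProductInfo> infos, String productName, String productVersion) {
        ProductInfo best = null;
        for (ProductInfo info : infos) {
            boolean nameMatches = info.name().equalsIgnoreCase(productName); // case-insensitive exact match
            boolean isPrefix = productVersion.startsWith(info.version());
            if (nameMatches && isPrefix
                    && (best == null || info.version().length() > best.version().length())) {
                best = info; // keep the longest matching version prefix
            }
        }
        return Optional.ofNullable(best); // empty: no match, so the script fails
    }

    public static void main(String[] args) {
        for (String v : List.of("5.1.0.1", "5.1.0.0", "5.1.0.2", "5.1.1.1", "5.2.0.0")) {
            String chosen = select(PORTAL, "IBM WebSphere Portal Server", v)
                    .map(ProductInfo::version).orElse("no match");
            System.out.println(v + " -> " + chosen);
        }
    }
}
```

Under the plain string-prefix assumption, a version such as 5.10 would also match a <productinfo> version of 5.1; comparing dot-separated components instead would avoid that, if it matters for your product.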
The purpose of selecting a <productinfo> element is to get access to its analysisProfileRef attribute value. This value will be used in the next step of the process that selects a symptom specification.
Problem type
Different types of problems will naturally have different symptoms. This is why the <infocollect> task includes a problem attribute as a third scoping variable. A pattern document will contain a separate <problem> element for each type of problem it addresses. Listing 3 shows the <problem> element for the Portal Login problem from the pattern_template.xml document.
Listing 3. A <problem> element from the pattern_template.xml document
<problem name="portallogin" description="Portal Login Problem">
<analysisProfile name="wps_portal_5.1" >
<fileset name="wpslog" value="(wps[_.].*)" >
<delimiterid id="delimiter1" />
</fileset>
<fileset name="tracelog" value="(trace.*)" >
<delimiterid id="delimiter2" />
</fileset>
<fileset name="systemoutlog" value="SystemOut.*\.log" >
<delimiterid id="delimiter4" />
</fileset>
<fileset name="systemerrlog" value="SystemErr.*\.log" >
<delimiterid id="delimiter4" />
</fileset>
<fileset name="configtrace" value="ConfigTrace\.log" >
<delimiterid id="configtracedelim" />
</fileset>
</analysisProfile>
<analysisProfile name="wps_portal_5.0" >
<fileset name="wpslog" value="(wps[_.].*)" >
<delimiterid id="delimiter1" />
</fileset>
<fileset name="tracelog" value="(trace.*)" >
<delimiterid id="delimiter2" />
</fileset>
<fileset name="systemoutlog" value="SystemOut.*\.log" >
<delimiterid id="delimiter4" />
</fileset>
<fileset name="systemerrlog" value="SystemErr.*\.log" >
<delimiterid id="delimiter4" />
</fileset>
<fileset name="configtrace" value="ConfigTrace\.log" >
<delimiterid id="configtracedelim" />
</fileset>
</analysisProfile>
</problem>
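This <problem> element contains two child <analysisProfile> elements. The problem value from the <infocollect> task selects the <problem> element, and the analysisProfileRef value from the previously selected <productinfo> element then selects one of its <analysisProfile> children. Together, the two values select a single analysis profile.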
The two analysis profiles for the Portal Login problem type, named wps_portal_5.1 and wps_portal_5.0, happen to be identical to each other. This reflects the fact that nothing related to the logging of information for Portal Login problems changed between these two releases of WebSphere Portal, at least nothing that would matter to the type of symptom analysis the tool performs for these problems. Because they are identical, it would have been possible to collapse these two analysis profiles into one profile, pointed to by the analysisProfileRef attributes for both releases of WebSphere Portal. Keeping them separate, however, makes it easier to modify the document if it becomes necessary to have different symptom specifications for the two releases.
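The second lookup can be modeled with nested maps. In this hypothetical Java sketch, which is not AutoPD code, each problem name maps to its analysis profiles, and each profile maps fileset names to delimiter IDs, using the entries from Listing 3. We assume problem names are matched exactly; the article does not say whether this comparison is case-sensitive.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory view of the <problem> elements; not AutoPD code.
// problem name -> analysis profile name -> (fileset name -> delimiter id)
class AnalysisProfileSelector {
    static final Map<String, Map<String, Map<String, String>>> PROBLEMS = Map.of(
        "portallogin", Map.of(
            "wps_portal_5.1", Map.of("wpslog", "delimiter1", "tracelog", "delimiter2",
                                     "systemoutlog", "delimiter4", "systemerrlog", "delimiter4",
                                     "configtrace", "configtracedelim"),
            "wps_portal_5.0", Map.of("wpslog", "delimiter1", "tracelog", "delimiter2",
                                     "systemoutlog", "delimiter4", "systemerrlog", "delimiter4",
                                     "configtrace", "configtracedelim")));

    // problem comes from <infocollect problem="...">; analysisProfileRef comes
    // from the <productinfo> element selected in the previous step.
    static Optional<Map<String, String>> select(String problem, String analysisProfileRef) {
        return Optional.ofNullable(PROBLEMS.get(problem))
                .map(profiles -> profiles.get(analysisProfileRef)); // empty if either lookup fails
    }

    public static void main(String[] args) {
        System.out.println(select("portallogin", "wps_portal_5.0")
                .map(profile -> profile.get("wpslog"))
                .orElse("no analysis profile"));
    }
}
```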
Fileset
At this point, the first three scoping variables have selected a particular <analysisProfile> element in the pattern document, such as the one shown in Listing 4.
Listing 4. An <analysisProfile> element from the pattern_template.xml document
<analysisProfile name="wps_portal_5.1" >
<fileset name="wpslog" value="(wps[_.].*)" >
<delimiterid id="delimiter1" />
</fileset>
<fileset name="tracelog" value="(trace.*)" >
<delimiterid id="delimiter2" />
</fileset>
<fileset name="systemoutlog" value="SystemOut.*\.log" >
<delimiterid id="delimiter4" />
</fileset>
<fileset name="systemerrlog" value="SystemErr.*\.log" >
<delimiterid id="delimiter4" />
</fileset>
<fileset name="configtrace" value="ConfigTrace\.log" >
<delimiterid id="configtracedelim" />
</fileset>
</analysisProfile>
Now the fourth scoping variable comes into play: the <autopdfileset> tasks from the Ant script. Here's the first of these tasks:
<autopdfileset filesetName="wpslog" filesetDir="${portal.root}/log" />
As shown in Listing 5, the filesetName attribute value wpslog in this <autopdfileset> task corresponds to the name attribute value of this <fileset> child of the selected analysis profile:
Listing 5. The <fileset> element wpslog
<fileset name="wpslog" value="(wps[_.].*)" >
<delimiterid id="delimiter1" />
</fileset>
The tool now has enough information to determine what symptom analysis to perform for this fileset, and which files to perform it on. The symptom analysis to perform is specified in the <delimiter> element delimiter1, referred to by the fileset's <delimiterid> child element. The files to perform it on are all the files in the directory specified by the filesetDir attribute in the Ant script whose names match the regular expression (wps[_.].*) specified in the pattern document; that is, all files whose names start with either "wps_" or "wps.".
The tool follows the same process for each of the other <autopdfileset> tasks. If any of the filesets fails to select any files to be analyzed, it is skipped, and analysis proceeds for the other filesets.
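The file-selection step can be sketched in Java as below. The sketch assumes, without confirmation from the article, that a file belongs to a fileset when its whole name matches the <fileset> value regex (Matcher.matches() semantics); FilesetSelector and its methods are hypothetical names.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of fileset file selection; not AutoPD code.
// Assumption: a file belongs to the fileset when its whole name matches the
// <fileset value="..."> regular expression (Matcher.matches() semantics).
class FilesetSelector {

    static List<String> selectNames(List<String> fileNames, String filesetRegex) {
        Pattern pattern = Pattern.compile(filesetRegex);
        List<String> selected = new ArrayList<>();
        for (String name : fileNames) {
            if (pattern.matcher(name).matches()) {
                selected.add(name);
            }
        }
        return selected; // an empty result means the fileset is skipped
    }

    // filesetDir comes from <autopdfileset>; only regular files are considered.
    static List<String> selectFromDir(String filesetDir, String filesetRegex) {
        File[] entries = new File(filesetDir).listFiles(File::isFile);
        List<String> names = new ArrayList<>();
        if (entries != null) {
            for (File f : entries) {
                names.add(f.getName());
            }
        }
        return selectNames(names, filesetRegex);
    }

    public static void main(String[] args) {
        List<String> names = List.of("wps_2005.log", "wps.log", "wpsconfig.log", "trace.log");
        System.out.println(selectNames(names, "(wps[_.].*)")); // [wps_2005.log, wps.log]
    }
}
```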
Specifying symptoms
At this point, the scoping variables have done their job: identifying for the tool which files to perform symptom analysis on, and selecting a <delimiter> element that will instruct it how to perform the analysis. We will now examine the <delimiter> element itself in some detail.
Separating log records in a log file
First, we need to step back and review what symptom analysis is all about: extracting from a log file those log records most likely to be useful in diagnosing a particular type of problem. Before the tool can make a distinction between those log records worth extracting for further examination and those not worth extracting, it has to be able to separate the contents of a log file into individual log records. How does it do this? It can't simply rely on newline characters, because log records may extend to multiple lines.
In fact, this is the first role of the <delimiter> element (and the role that gave it its name): telling the tool how to separate out the individual log records in a log file. Listing 6 shows the <delimiter> element named delimiter1 from the tool's pattern document:
Listing 6. A <delimiter> element from the pattern_template.xml document
<delimiter
id ="delimiter1"
value="([0-9]{1,4}\.[0-9]{1,2}\.[0-9]{1,2}\ [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2}\.[0-9]{1,3})" >
...
</delimiter>
The value attribute in a <delimiter> element always contains a regular expression matching a pattern that indicates the beginning of a new log record. (We can't provide a full tutorial on regular expressions here; see Resources for links to some good references.) In this case, the pattern is a timestamp such as 2005.06.01 18:45:30.222. Whenever the tool encounters a timestamp matching this pattern, it knows that it has moved on to the next log record. This rule isn't global, however; it applies only to symptom analysis activities in which the four scoping variables have led the tool to delimiter1. Other <delimiter> elements may use different regular expressions to identify the start of a new log record.
Why is this process based on a pattern signifying the beginning of a log record, rather than on one signifying the end of a log record, or one that falls between successive log records? Because this is how log records created by WebSphere Portal (and by WebSphere Application Server, for that matter) are formatted. Each of these log records begins with a timestamp in a particular format, and it can end with anything. (The timestamp format can differ from one log file to another, which is one reason for having multiple <delimiter> elements.) Nothing separates one log record from the next except a newline character, which can also appear inside a log record.
Applying the regular expression in a <delimiter> element's value attribute to the beginning of a log record is built into the AutoPD tool's implementation, so you can't vary this behavior. Fortunately, most log files that we are aware of follow the convention of starting out each of their log records with a predictable pattern, so this hard-coded aspect of the tool has not proven to be a limitation in practice.
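The following Java sketch shows one way to implement this delimiter-based splitting. It assumes, without confirmation from the tool's documentation, that the check is applied at the start of each line: a line whose beginning matches the delimiter regex starts a new record, and any other line is appended to the current record. LogRecordSplitter is a hypothetical name; DELIMITER1 is delimiter1 from Listing 6 joined onto one line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of delimiter-based record splitting; not AutoPD code.
class LogRecordSplitter {
    // delimiter1 from Listing 6, joined onto one line ("\\ " is an escaped space).
    static final String DELIMITER1 =
        "([0-9]{1,4}\\.[0-9]{1,2}\\.[0-9]{1,2}\\ [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2}\\.[0-9]{1,3})";

    // Assumption: a line whose beginning matches the delimiter starts a new record;
    // every other line continues the current record.
    static List<String> split(String logText, String delimiterRegex) {
        Pattern delimiter = Pattern.compile(delimiterRegex);
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (String line : logText.split("\r?\n")) {
            if (delimiter.matcher(line).lookingAt()) { // match anchored at start of line
                if (current != null) {
                    records.add(current.toString());
                }
                current = new StringBuilder(line);
            } else if (current != null) {
                current.append('\n').append(line);
            }
            // Lines before the first delimiter are ignored in this sketch.
        }
        if (current != null) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        // Hypothetical log text: the second line continues the first record.
        String log = "2005.06.01 18:45:30.222 E com.example.Login WXYZ0042E: Login failed\n"
                + "    at com.example.Login.check(Login.java:42)\n"
                + "2005.06.01 18:45:31.005 I com.example.Login retrying";
        for (String record : split(log, DELIMITER1)) {
            System.out.println("RECORD: " + record);
        }
    }
}
```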
Selecting individual log records and sub-records
After the tool has identified an individual log record, it must decide whether that log record meets the criteria for extraction and highlighting. These criteria are expressed in one or more child <pattern> elements within the same <delimiter> element that separated the log file into individual log records, shown in Listing 7:
Listing 7. A <delimiter> element includes one or more <pattern> elements
<delimiter
id ="delimiter1"
value="([0-9]{1,4}\.[0-9]{1,2}\.[0-9]{1,2}\ [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2}\.[0-9]{1,3})" >
<pattern value="([0-9]{1,4}\.[0-9]{1,2}\.[0-9]{1,2}\ [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2}\.[0-9]{1,3})(.*?)([A-Z]{4,5}[0-9]{1,4}[EW]):(.*)" >
<group name="timestamp" number="1" />
<group name="errorID" number="3" />
<group name="description" number="0" />
<formatref outputName="html-1" formatName="format1" />
</pattern>
</delimiter>
In this example, there is only one child <pattern> element. If there had been several, each would have been applied to each of the log records in the file, and a log record matching any of them would have qualified for extraction.
The regular expression in a <pattern> element actually performs two distinct tasks:
- It qualifies entire log records for extraction, depending upon whether they match the entire regular expression.
- It subdivides a log record into smaller subrecords, to better highlight them in the analysis report.
In our example, the entire regular expression selects log records that start with a timestamp in the correct format for this log file (but because they all do, this by itself doesn't provide any selection), and that also contain a string matching the pattern ([A-Z]{4,5}[0-9]{1,4}[EW]):. This is a WebSphere Portal Error or Warning message identifier, consisting of four or five letters, followed by one to four digits, followed by an 'E' or a 'W', followed by a colon.
The pattern ([A-Z]{4,5}[0-9]{1,4}[EW]): represents the predominant approach that the AutoPD Tool has taken to log record selection: for all problem types, extract all the error and warning messages in a log file. It would certainly be possible to refine these selection criteria, to look for a specific error message (or two, or three) for a given problem type. But this runs the risk of excluding a message that might be important for diagnosing a problem. If you're writing new patterns for a new product or problem type, we recommend that you start out with a few broad patterns similar to this one, and then refine them later if that proves to be necessary.
Subdividing the log record into subrecords is done with parentheses, which mark capturing groups in the regular expression. There are four pairs of parentheses in this example, which identify four pieces of the log record:
- Group 1, ([0-9]{1,4}\.[0-9]{1,2}\.[0-9]{1,2}\ [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2}\.[0-9]{1,3}): the timestamp for the log record.
- Group 2, (.*?): everything up to, but not including, the first error or warning message identifier in the log record. The question mark is important here: it forces a non-greedy match with the "match-anything" pattern .*. Without the question mark, the parenthesized expression would match everything up to, but not including, the last error or warning message identifier followed by a colon, which is not the desired effect.
- Group 3, ([A-Z]{4,5}[0-9]{1,4}[EW]): the first error or warning message identifier itself.
- Group 4, (.*): the remainder of the log record after the colon that follows the message identifier.
Not everything in the log record must fall within a pair of parentheses: the colon in this regular expression, for example, does not fall within parentheses. It still plays a role in the overall matching of a log record; that is, a log record that lacks a colon after its first message identifier will not qualify. It just isn't included in any of the highlighted subrecords.
The sets of parentheses in the regular expression are given numbers (1 through 4 in this example), so that they can be referred to later by the child <group> elements underneath the <pattern> element. The number 0 is assigned to the entire log record.
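Here is a small Java sketch that applies the Listing 7 regular expression, joined onto one line, to a single hypothetical log record and prints the numbered groups. It also builds a greedy variant (without the question mark) to show which message identifier each version captures. We assume find() semantics, because the article doesn't say exactly which matching call the tool uses, and the message identifiers in the sample record are invented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: applies the Listing 7 regex (joined onto one line) to one record.
// Assumption: find() semantics; the tool's exact matching call is not documented here.
class PatternGroups {
    static final String TIMESTAMP =
        "([0-9]{1,4}\\.[0-9]{1,2}\\.[0-9]{1,2}\\ [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2}\\.[0-9]{1,3})";
    static final Pattern NON_GREEDY = Pattern.compile(TIMESTAMP + "(.*?)([A-Z]{4,5}[0-9]{1,4}[EW]):(.*)");
    static final Pattern GREEDY = Pattern.compile(TIMESTAMP + "(.*)([A-Z]{4,5}[0-9]{1,4}[EW]):(.*)");

    // Returns groups 0..4 when the record qualifies for extraction, or null when it does not.
    static String[] match(Pattern pattern, String logRecord) {
        Matcher m = pattern.matcher(logRecord);
        if (!m.find()) {
            return null;
        }
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            groups[i] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical record; both message identifiers are invented.
        String record = "2005.06.01 18:45:30.222 E com.example.Login WXYZ0042E: Login failed; see ABCD0007W: retry";
        String[] g = match(NON_GREEDY, record);
        for (int i = 0; i < g.length; i++) {
            System.out.println(i + ": [" + g[i] + "]");
        }
        System.out.println("greedy group 3: " + match(GREEDY, record)[3]); // ABCD0007W
    }
}
```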
Formatting the analysis report
All that remains is for the tool to build the analysis report containing the log records it has extracted from the files that it examined in the different filesets. Instructions for what to include in the report, and how to format it, are also provided in the pattern document. These instructions begin with the child <group> elements under a <pattern> element (shown in Listing 8), which assign display names to some or all of the numbered subrecords identified by the sets of parentheses in the <pattern> element's regular expression. In the example we've been working with, there are three <group> elements:
Listing 8. The <group> elements in our example
<group name="timestamp" number="1" />
<group name="errorID" number="3" />
<group name="description" number="0" />
These elements assign names to two of the subrecords and to the entire log record. These names will appear in the analysis report, as headings for the table columns where the corresponding subrecords appear.
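Conceptually, each <group> element is a (name, number) pair applied to a successful match. The hypothetical Java sketch below turns such pairs into an ordered name-to-text map for one log record; the names GroupSpec and nameGroups are ours, not the tool's, and find() semantics are again assumed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: turns <group name=".." number=".."/> elements into a
// name -> subrecord map for one selected log record. Not AutoPD code.
class NamedGroups {
    record GroupSpec(String name, int number) {}

    static Map<String, String> nameGroups(String regex, String logRecord, GroupSpec... specs) {
        Matcher m = Pattern.compile(regex).matcher(logRecord);
        Map<String, String> row = new LinkedHashMap<>(); // keeps the order of the <group> elements
        if (!m.find()) {
            return row; // record not selected, so nothing to name
        }
        for (GroupSpec spec : specs) {
            row.put(spec.name(), m.group(spec.number())); // number 0 is the whole match
        }
        return row;
    }

    public static void main(String[] args) {
        String regex = "([0-9]{1,4}\\.[0-9]{1,2}\\.[0-9]{1,2}\\ [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2}\\.[0-9]{1,3})"
                + "(.*?)([A-Z]{4,5}[0-9]{1,4}[EW]):(.*)";
        // Hypothetical record with an invented message identifier.
        String record = "2005.06.01 18:45:30.222 E com.example.Login WXYZ0042E: Login failed";
        System.out.println(nameGroups(regex, record,
                new GroupSpec("timestamp", 1), new GroupSpec("errorID", 3), new GroupSpec("description", 0)));
    }
}
```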
A <pattern> element has one more child element:
<formatref outputName="html-1" formatName="format1" />
The <formatref> element provides a two-part pointer to a <format> element containing instructions for formatting the table in the analysis report for the particular log file being processed. Currently, the only output format that the tool supports is HTML, so all <format> elements are contained within an HTML <output> element. In the future, however, support may be added for other output formats such as XML.
A <format> element contains an order attribute specifying which pieces of information from each selected log record to include in the analysis report, and the order in which to include them. Listing 9 shows two of the <format> elements used by the tool:
Listing 9. Two <format> elements
<format name="format1" order="timestamp|errorID|description" >
...
</format>
<format name="format2" order="timestamp|exception" />
In conjunction with <pattern> elements that extract the right fields from a log record and associate the right names with these fields, these elements produce the two tables shown in Figure 1.
Figure 1. The two tables in the analysis report created by the <format> elements
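One way to picture what a <format> element's order attribute does is as a column list for an HTML table. The Java sketch below is hypothetical, since the article doesn't show the markup AutoPD actually generates. It splits the order string on the | character and emits header and data cells in that order, leaving a cell empty when a name has no matching <group>.

```java
import java.util.Map;

// Hypothetical sketch of how a <format order="..."> could drive one HTML table;
// the markup AutoPD actually emits is not shown in this article.
class FormatRenderer {

    // Header row: one column per name in the order attribute, e.g. "timestamp|errorID|description".
    static String header(String order) {
        StringBuilder html = new StringBuilder("<tr>");
        for (String column : order.split("\\|")) {
            html.append("<th>").append(escape(column)).append("</th>");
        }
        return html.append("</tr>").toString();
    }

    // Data row for one extracted log record; namedGroups maps <group> names to subrecords.
    static String row(String order, Map<String, String> namedGroups) {
        StringBuilder html = new StringBuilder("<tr>");
        for (String column : order.split("\\|")) {
            String value = namedGroups.getOrDefault(column, ""); // missing group: empty cell
            html.append("<td>").append(escape(value)).append("</td>");
        }
        return html.append("</tr>").toString();
    }

    // Minimal HTML escaping so log text cannot break the report markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        String order = "timestamp|exception"; // the order of format2 in Listing 9
        System.out.println(header(order));
        System.out.println(row(order, Map.of("timestamp", "2005.06.01 18:45:30.222",
                                             "exception", "java.lang.IllegalStateException: <none>")));
    }
}
```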
Recommendations
Up to this point, you've reviewed the different elements that make up a pattern document, how they relate to each other, and how they relate to the <infocollect> task in an Ant script that invokes symptom analysis. The following are some general points to keep in mind as you create your own pattern document and your own scripts that interact with it.
- Make sure that the name and version values in each of your <productinfo> elements can actually be set in your scripts. As we explained earlier, there are a variety of strategies for setting these values in a script. Whichever strategy you choose, it must be capable of passing (using the <infocollect> task) the exact strings that appear in your pattern document.
- Take advantage of the longest-prefix matching for product versions. Even if you're creating a solution for version 1.0.1.1 of your product, set the version attribute in your <productinfo> element to "1.0", or even simply to "1". If subsequent versions of your product require different symptom analysis, you can always insert additional <productinfo> elements. But you might find that your original solution works just fine with version 1.0.2 of your product, in which case you won't have to change your pattern document at all.
- Check carefully for dangling references in your pattern document. As we've discussed at some length, when the tool is getting set up to perform symptom analysis, it moves repeatedly between different elements in the pattern document it's working with. If any of these transitions fails, the tool has no alternative but to terminate the script. So take a few minutes to make sure that all of your analysisProfileRef attributes point to actual <analysisProfile> elements, all of your <delimiterid> elements point to actual <delimiter> elements, and so on.
- Start with broad, simple patterns, and then refine them if you need to. In most cases, you'll get a huge reduction in volume simply by extracting all records of a particular type from a log file: all error messages, all build reports, and so on. If you find that your analysis reports are still too large, then you can look at introducing more restrictive patterns. You also have the option of restricting the amount of text that's included in a table cell in the analysis report. We don't cover this capability in this article; see "Controlling the Information Reported for a Group" in the tool's user's guide.
- If you include multiple <pattern> children under a single <delimiter> element, they should all point to the same <format> element. The log records selected by these patterns will all go into the same table, so they should be formatted consistently.
- Set up your formats to highlight the important information in the log records you extract. For errors and warnings, we have found the format pattern timestamp|errorID|description to be the best choice, based on input from IBM support experts.
- Always include in the collection zip file all the log files on which you perform symptom analysis. Technically, the two main functions of the AutoPD tool are totally independent of each other; other than the analysis report itself, it's possible for there to be no overlap whatsoever between the set of files collected in the zip file FTPed to IBM support and the set of files against which symptom analysis is performed. This is not, however, a good idea.
When symptom analysis extracts a log record to include in the analysis report, it does this in order to highlight that particular log record to IBM support, because past experience has shown that log records of that type are often useful for diagnosing the type of problem under investigation. This, however, is only the first step. IBM support may need to see this log record in its original context; for example, see the log records that came immediately before and immediately after it. Therefore, we recommend that you include among the files selected for the zip file at least the set of files that are fed through the symptom analysis function, so that IBM support will be able to view in their original context any log records that symptom analysis selects for the analysis report.
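The dangling-reference check recommended in the tips above is easy to script before you ship a pattern document. The following is a minimal sketch, not part of the AutoPD tool. The element and attribute names (<productinfo>, analysisProfileRef, <analysisProfile>, <delimiterid>, <delimiter>) come from this article, but the assumption that each target element is identified by an `id` attribute, the `<patterns>` root element, and the helper name `find_dangling_refs` are all hypothetical; adjust them to match your actual pattern document.

```python
import xml.etree.ElementTree as ET

# Hypothetical pattern document. Element names follow the article; using an
# "id" attribute as the reference target is an assumption for this sketch.
SAMPLE = """
<patterns>
  <productinfo name="MyProduct" version="1.0" analysisProfileRef="loginProfile"/>
  <productinfo name="MyProduct" version="2" analysisProfileRef="missingProfile"/>
  <analysisProfile id="loginProfile">
    <delimiterid>errorDelim</delimiterid>
    <delimiterid>typoDelim</delimiterid>
  </analysisProfile>
  <delimiter id="errorDelim"/>
</patterns>
"""


def find_dangling_refs(xml_text):
    """Return (kind, value) pairs for references that have no matching target."""
    root = ET.fromstring(xml_text)
    # Collect the ids that references are allowed to point at (assumed "id" attribute).
    profile_ids = {e.get("id") for e in root.iter("analysisProfile")}
    delimiter_ids = {e.get("id") for e in root.iter("delimiter")}

    dangling = []
    # Any element may carry an analysisProfileRef attribute; check them all.
    for elem in root.iter():
        ref = elem.get("analysisProfileRef")
        if ref is not None and ref not in profile_ids:
            dangling.append(("analysisProfileRef", ref))
    # <delimiterid> elements are assumed to name their target in their text content.
    for elem in root.iter("delimiterid"):
        ref = (elem.text or "").strip()
        if ref not in delimiter_ids:
            dangling.append(("delimiterid", ref))
    return dangling


if __name__ == "__main__":
    for kind, value in find_dangling_refs(SAMPLE):
        print(f"dangling {kind}: {value}")
```

Running a check like this after every edit to the pattern document catches broken transitions before the tool gets a chance to terminate the script at a customer site.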
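To illustrate the "start broad" advice together with the timestamp|errorID|description format, here is a rough Python approximation of what a single broad pattern does: select every record that contains an error code of the form four letters + four digits + 'E', and emit one formatted row per match. This is only a sketch; it does not use the AutoPD pattern-document syntax, and the bracketed-timestamp log layout, the sample lines, and the names `ERROR_RECORD` and `extract_errors` are hypothetical.

```python
import re

# Broad pattern: any record containing an error code of the form
# four uppercase letters + four digits + 'E' (for example, SRVE0026E).
# The leading "[timestamp]" layout of the log lines is an assumption.
ERROR_RECORD = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]"            # leading [timestamp]
    r".*?\b(?P<errorID>[A-Z]{4}\d{4}E)\b"    # first error code on the line
    r":?\s*(?P<description>.*)$"             # remainder of the record
)

# Hypothetical log records; only the first and third carry error codes.
LOG = [
    "[7/12/05 10:15:32:123 EDT] 0000000a WebContainer E SRVE0026E: Servlet error occurred",
    "[7/12/05 10:15:33:001 EDT] 0000000a WebContainer I SRVE0242I: Servlet initialized",
    "[7/12/05 10:15:34:456 EDT] 0000000b SecurityColl E SECJ0055E: Authentication failed",
]


def extract_errors(lines):
    """Return timestamp|errorID|description rows for the matching records."""
    rows = []
    for line in lines:
        m = ERROR_RECORD.match(line)
        if m:
            rows.append("|".join(
                (m["timestamp"], m["errorID"], m["description"].strip())))
    return rows


if __name__ == "__main__":
    for row in extract_errors(LOG):
        print(row)
```

The informational SRVE0242I record is skipped because its code does not end in 'E'. Only if a broad pattern like this still produces too many rows is it worth narrowing it, for example to specific error-code prefixes.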
Summary
In this article, we have examined in detail how to use the elements in a pattern document to control the AutoPD tool's symptom analysis. This process is complicated, but much of the complication comes from the reality that the tool was designed to address: different products, and even successive releases of the same product, format their logs in very different ways.
The situation will improve dramatically as more and more products adopt the Common Base Event standard for their log entries. Not only will the different products converge on a single log format, but, because that format is based on an XML schema, the task of parsing the log records and extracting pieces of information from them will also be greatly simplified. The AutoPD tool already supports symptom analysis for XML-formatted log records, including those conforming to the Common Base Event format. The next article in this series, "The Automated Problem Determination Tool: Symptom analysis with XML-formatted log files," discusses the elements in a pattern document that control this symptom analysis.
About the authors
Bob Moore is an Advisory Software Engineer with the Software Group Advanced Design and Technology team at IBM in Research Triangle Park, North Carolina. He received his Ph.D. in Philosophy from Duke University in 1977. Since joining IBM in 1983, he has worked on numerous architectures and standards related to network and systems management, including SNA/Management Services, CMIP, SNMP, and DMTF CIM. You can contact Bob at remoore@us.ibm.com.
Brad Topol is a Senior Software Engineer with the Software Group Advanced Design and Technology team at IBM in Research Triangle Park, North Carolina. He received a Ph.D. degree in Computer Science from the Georgia Institute of Technology in 1998. Currently, he is the development lead for the Automated Problem Determination Serviceability Tool. Over the years, Brad has been actively involved in advanced technology projects in the areas of autonomic computing, Web services, grid computing, Web content transformation, and aspect-oriented programming. You can contact Brad at btopol@us.ibm.com.
Jie Xing is an Advisory Software Engineer with the Software Group Advanced Design and Technology team at IBM in Research Triangle Park, North Carolina. He is involved in advanced technology projects in the areas of Web services, grid computing, and autonomic computing. He received his Ph.D. in Operations Research in Computer Science from North Carolina State University in 2001, where his research areas were related to multiagent systems, distributed systems, and service composition. You can contact Jie at jiexing@us.ibm.com.