
Analyzing WebSphere logs; Slice, Dice, Filter, Merge ... and never parse

2011-04-01T03:14:01Z
This article and the attached samples are about using the simple High Performance Extensible Logging (HPEL) API. For most typical analysis, the LogViewer in the Admin Console and the LogViewer command-line tool already provide great capabilities, and you may never need more. If you do, though (or if you are a tool developer at heart), to paraphrase Malcolm and Angus (AC/DC) ... "For those about to code, we salute you!!" I'm not going to dwell too long on the benefit of simple APIs vs. non-deterministic parsing of many differently formatted records et al ... but I am going to provide a sample library with a world of gems, and do a deep dive into various types of samples (one in each forum/blog entry). The types of samples we'll cover are:

1. Simple filtered query
2. Simple filtered query with z/OS - z/OS has the controller/servant concept, but we did all we could to make analyzing z/OS logs similar. In later samples, we'll just show how you would tweak them for z/OS
3. Merge of logs from multiple servers
4. Merge of all logs from a specific z/OS server together (ie: controller and all servants merged into one log)
5. Access to remote logs
6. A simple central logging solution
7. Polling/monitoring sample

Beyond that, these are forums (and blogs) ... so I will provide additional feedback, samples, ... as desired. I admit it, I think HPEL is awesome and ground-breaking, and I will do what I can to help people get involved. So feel free to "use me" (er uh, use my enthusiasm) to get me to customize samples et al. Many possible topics (running RepositoryLogRecords thru JSR 47 formatters, advanced filtering, various combinations of this functionality, WebUI type samples, ...) can be discussed, I'll let the audience drive the direction.

So let's start out with the sample library. If you are viewing this on my DW blog, you will have to go to the beta website to get to the sample library (which is attached to the entry). Getting involved in the open beta is also a great way to get the code. For anything NOT involving remote access, all you need is the small HPEL jar. For remote access, we use JMX, so you need the admin thinClient (only about 200x the size ... sorry, I cannot control that one).

OK, let's start with some background. You need not be sitting on machine A to view the logs from a server on machine A. There is remote access (via JMX), and more easily, you can just zip up the logs directory, bring it local, and view it there (with the command-line tool, or with the API). And of course, the Admin Console LogViewer can also view remote logs. We have optimized remote access, but ... as you might guess ... if you need to do lots of analysis, it's not a bad idea to just bring a repository local. The repositories are all created with Java, so you can view a z/OS repository on Windows, a Linux repository on z/OS, or whatever your heart desires. You can view an active repository while it's being written ... or you can zip a snapshot up, send it elsewhere, and analyze it there.

Definition of terms:
Log Repository - Base directory into which a JVM is logging. For WebSphere, this is usually of the form <ProfileHome>/logs/<serverName>/. This is generally all that needs to be known in order to access any logs locally.
Server Instance - Start and stop of a server one time (one lifecycle). If a server is started 4 times and stopped 3 (ie: currently running), then there are 4 server instances. The "current" server instance is always the latest, regardless of whether or not the server is running.
Local Repository - Any repository that can be reached by referring directly to its directory locally (including network mapped drives et al).
Remote Repository - A repository that is accessed via network protocols (the default for this is JMX, which requires that the server is up). Technically, you can access your local repository via JMX, but other than testing ... I'm not sure why you'd want to.
Child process - A process which is a logical child of another process from a logging perspective. The prime example: a z/OS servant is a child of the controller. Child repositories sit underneath the parent repository and are generally accessible only through their parent.
Merge - The aggregation of logs from several repositories (local or remote) based on time. The aggregation looks much like a ServerInstanceLogRecordList, but there are special techniques for accessing the header associated with a given record (which is helpful since there can be many different headers associated with a merge).
ServerInstanceLogRecordList - The class that represents the logs for a single server instance (this may map to many physical files, but as you'll note, we don't want people thinking in terms of physical files). It contains RepositoryLogRecords and headers.
RepositoryLogRecord - A class that represents a single log entry. It provides get methods for all the info in the record.
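The "Merge" definition above is worth making concrete. Below is a self-contained sketch in plain Java; the Rec type and merge helper are stand-ins invented here for illustration (the real HPEL classes ship with WebSphere and are not used). It shows the core idea: records from several repositories are aggregated into a single time-ordered stream:

```java
import java.util.*;

public class MergeSketch {
    // Stand-in for a RepositoryLogRecord: just a timestamp and a message
    record Rec(long millis, String msg) {}

    // Merge several record lists into one time-ordered stream, the way an
    // HPEL merge aggregates records from multiple repositories based on time
    static List<Rec> merge(List<List<Rec>> lists) {
        List<Rec> out = new ArrayList<>();
        for (List<Rec> l : lists) out.addAll(l);
        out.sort(Comparator.comparingLong(Rec::millis));
        return out;
    }

    public static void main(String[] args) {
        List<Rec> server1 = List.of(new Rec(100, "s1-a"), new Rec(300, "s1-b"));
        List<Rec> server2 = List.of(new Rec(200, "s2-a"), new Rec(400, "s2-b"));
        // Prints the records interleaved in timestamp order: 100, 200, 300, 400
        for (Rec r : merge(List.of(server1, server2)))
            System.out.println(r.millis() + " " + r.msg());
    }
}
```

The real API hands you merged records through iterators rather than materializing one sorted list, but the time-ordering contract is the same idea.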

Build information
As mentioned, samples that do not involve remote access require only the small HPEL jar (< 0.5 MB) to build and run. This covers all filtering, merging, and even some formatter code. For remote samples, you will need the admin thinClient jar that comes with WebSphere V8 (unless you've successfully gotten the JMX packages in the JDK to do the job for you). So here are some sample commands to build and run the samples:

Building a sample that uses only local repositories:
javac -cp <WasHome>/plugins/<hpel jar> com/ibm/sample/hpel/*.java

Running a sample that uses only local repositories
java -cp <WasHome>/plugins/<hpel jar>:. com.ibm.sample.hpel.ReaderSamplesForExercises <WasHome>/profiles/<profileDir>/logs/server1/ 1

Building a sample that involves using JMX to access remote repositories
javac -cp <path to admin thinClient jar> com/ibm/sample/hpel/*.java

Running a sample that involves using JMX to access remote repositories
java -cp <path to admin thinClient jar>:. com.ibm.sample.hpel.ReaderSamplesForExercises <args>

Simple Local Repository Sample
OK, now that you know what it all looks like, let's look at our first sample ... a simple run through the records in a local repository that have severities between INFO and SEVERE (inclusive). Here is the code (the full source, including imports and more comments, is in the sample library attached here):

public class LocalReaderSample {

    public static void main(String[] args) {
        // Create a repository reader (requires the base directory of the repository)
1       RepositoryReader logRepository = new RepositoryReaderImpl(args[0]);
        // Holds an iterator of server instances (start/stop of the server), extracting all log
        // messages with severity between INFO and SEVERE. Lots of different filtering options,
        // this is just one sample
2       Iterable<ServerInstanceLogRecordList> repResults = null;
        try {
3           repResults = logRepository.getLogLists(Level.INFO, Level.SEVERE);
        } catch (LogRepositoryException e) {
            System.out.println("Exception reading local repository: " + e);
            return;
        }
        try {
            // Go through each server instance
4           for (ServerInstanceLogRecordList pidRecords : repResults) { // For each list (server lifeCycle)
                // For each server instance, go through the records
5               for (RepositoryLogRecord repositoryLogRecord : pidRecords) {
                    // Just printing some key information here. Note that the record exposes
                    // all fields with simple get methods
6                   System.out.println("  " + repositoryLogRecord.getFormattedMessage());
                }
            }
        } catch (LogRepositoryRuntimeException lrre) {
            System.out.println("Exception while retrieving data: " + lrre);
        }
    }
}

I put numbers to the left of the noteworthy lines. The other lines are pretty straightforward Java; the numbered lines focus on the API. As you look at each line:
1. This is a simple line to open the repository. You need only specify the location of the root of the repository (${SERVER_LOG_ROOT} in general)
2. This is going to hold the data. It is an Iterable of lists. Basically, you can iterate through the serverInstances in the repository, and for each you can iterate through the actual log records (you'll later see samples that focused on just the latest instance and avoided the double-iterating ... but this is a good sample that shows how easy it is to see the info and how well organized it is ... so bear with me).
3. The getLogLists call does 95% of the work. It includes some simple filter criteria (a severity range ... later you'll see a better way to do more advanced filtering), and it gets you access to all of the log records that meet the criteria in one line of code. How cool is that!! Oh, and you're asking ... why bother with INFO to SEVERE, isn't that SystemOut.log? Well, with HPEL, log and trace are separated, but you don't need to care because we take care of the details for you. If you want log and trace together, just ask; we take care of it. Another thing you may notice missing here ... how do we handle file rollovers et al? Do we just get you the records since the latest roll? NO!!! (would I even have brought it up if it was not another cool HPELism?). We do file rollovers et al just like legacy ... but our API (and our tools) mean you don't need to care. We get you the data; you just ask.
4. This is the first layer of peeling the onion. It goes through the serverInstances one at a time.
5. This is the second layer of the onion. For each serverInstance, this goes through the records.
6. OK, to keep the sample simple, this lamely just gets the formatted message and prints it. But if you pull this into eclipse, you'll see that you can get any info you want simply by calling the appropriate get method on the RepositoryLogRecord. If I want to know the logger (warning here, the logger in the old SystemOut.log is truncated front and back ... this gives you the full logger ... take a breath before continuing, because if you understand the significance, you will be excited), simply do repositoryLogRecord.getLoggerName(). No parsing, no figuring out which type of record this is ... just ask for the info, and it is served up.
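To illustrate point 6 without needing the HPEL jar on the classpath, here is a self-contained sketch using a stand-in LogRec type (hypothetical, invented here) whose getter style mirrors what was described above. The point is the workflow: you ask each record for its fields (full logger name, level, timestamp, formatted message) instead of parsing formatted text:

```java
import java.util.*;
import java.util.logging.Level;

public class FieldAccessSketch {
    // Stand-in mimicking RepositoryLogRecord's getter style (not the real HPEL class)
    record LogRec(String loggerName, Level level, long millis, String formattedMessage) {}

    public static void main(String[] args) {
        List<LogRec> records = List.of(
            new LogRec("com.acme.web.RequestHandler", Level.INFO, 1000L, "request served"),
            new LogRec("com.acme.db.Pool", Level.SEVERE, 2000L, "connection lost"));
        // No parsing: ask each record for the fields you want, and act on them.
        // Prints only: com.acme.db.Pool : connection lost
        for (LogRec r : records)
            if (r.level().intValue() >= Level.WARNING.intValue())
                System.out.println(r.loggerName() + " : " + r.formattedMessage());
    }
}
```

Note how the full logger name is available as a field, so no guessing about truncated prefixes and suffixes.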

So now you've seen how simple it is to go through all serverInstances for a server and pull records. You'll see a lot of variations, but these concepts are needed going forward. You'll see how remote access (when you use helpers to shield you from the JMX gorp) looks amazingly identical to local access. You'll see merging is so simple you shouldn't even take credit for doing it (but take the credit anyway). You'll see that advanced filtering is as easy as getting the data in the first place. If I hear feedback on this, I can provide sample repositories, explanations, customized examples, et al. I'm here and I'm listening. Talk to me.
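As a teaser for the advanced-filtering discussion: beyond a simple severity range, the idea is to supply your own filter object with an accept method and let the reader apply it while iterating. The sketch below is self-contained; the LogRec and RecordFilter types are stand-ins invented here to show the shape of the pattern, not the actual HPEL filter API (which ships with WebSphere):

```java
import java.util.*;
import java.util.regex.Pattern;

public class FilterSketch {
    // Stand-in for a log record (not the real HPEL RepositoryLogRecord)
    record LogRec(String loggerName, long millis, String message) {}

    // Stand-in filter interface mirroring the accept(record) shape
    interface RecordFilter { boolean accept(LogRec r); }

    // Apply a filter while iterating, the way a reader would
    static List<LogRec> query(List<LogRec> repo, RecordFilter f) {
        List<LogRec> out = new ArrayList<>();
        for (LogRec r : repo) if (f.accept(r)) out.add(r);
        return out;
    }

    public static void main(String[] args) {
        List<LogRec> repo = List.of(
            new LogRec("com.acme.web.Handler", 100, "served"),
            new LogRec("com.acme.db.Pool", 200, "slow query"),
            new LogRec("com.acme.db.Pool", 900, "connection lost"));
        // Advanced filter: logger matches a regex AND record falls in a time window.
        // Prints only: slow query
        Pattern p = Pattern.compile(".*\\.db\\..*");
        RecordFilter f = r -> p.matcher(r.loggerName()).matches() && r.millis() < 500;
        query(repo, f).forEach(r -> System.out.println(r.message()));
    }
}
```

Because the filter is just one method, arbitrarily rich criteria (regex on the logger, thread IDs, time windows, message content) compose naturally inside it.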