Tracing in a multithreaded, multiplatform environment

Use log4j and UNIX commands to mine tracing data

Most Java programmers use some kind of tracing system to keep track of potential errors and problems in code that is under development. However, multithreaded and multiplatform systems can generate a large and baffling amount of tracing data. In this article, software engineer Daniel Would offers some tips that will help you make sense of trace data from complex applications. You'll learn how to use log4j, an open source logging package, to generate information-rich log files. You'll also see how you can use standard UNIX shell commands to mine that data for the information you need.


Daniel Would (wouldd@uk.ibm.com), Software Engineer, IBM

Daniel Would joined IBM as a computer science graduate, and since then has worked on CICS Transaction Gateway and CICS Transaction Server in system test. He has worked extensively with Java technology and EJB components, within both CICS and WebSphere. He'd like to take the opportunity to say hello to Sara, Scott, Barry, and Paul, who've all supported his writing in one way or another. Contact Daniel at wouldd@uk.ibm.com.



18 March 2004

There are two main elements to successful tracing. The first is outputting enough information to be useful. The second is outputting it in such a way that you can get at the information you need. After all, you could output a mound of data that remains totally unreadable and unsearchable. Thus, this article is divided into two sections: one focused on producing trace data, and one focused on querying the resulting information in specific ways.

You have a couple of options in the log/trace arena. You can, of course, go with the "roll your own" technique; there are also precanned support classes, both proprietary and free, offering a range of functionality. One example of a free implementation is log4j, which is well proven, fast, and versatile. It provides a comprehensive infrastructure and easy-to-understand configuration. Thus, I have chosen it as the base for my examples. In this article, I'll walk you through the use of log4j and apply it to a working example of multithreaded, multiplatform code.

Downloading and installing log4j

Log4j is available from the Apache Logging Services Project page at http://logging.apache.org/log4j/index.html. I used version 1.2.8 for the examples in this article. Once you've downloaded and extracted the package, the only file you need is /jakarta-log4j-1.2.8/dist/lib/log4j-1.2.8.jar. With this file on the classpath of your logging applications, you're set to go.
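
If you want a quick smoke test before wiring up a configuration file, a minimal class along the following lines should compile and log to the console using log4j's BasicConfigurator (the class name and message are just placeholders):

import org.apache.log4j.BasicConfigurator;
import org.apache.log4j.Logger;

public class SmokeTest {
   private static Logger logger = Logger.getLogger(SmokeTest.class);

   public static void main(String[] args) {
      // Wires up a simple console appender with a default layout;
      // the properties file introduced later in this article replaces this
      BasicConfigurator.configure();
      logger.info("log4j is on the classpath and working");
   }
}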

Details on the log4j license can be found in the file license.txt at the root of the package's file structure. The license essentially allows use and redistribution of source or binary, as-is or modified, providing you continue to provide the license agreement, and do not attempt to use certain terms, such as log4j and Apache, as endorsements without the Apache Foundation's consent.

Log4j is easy to set up: all you need to do is supply the log4j JAR files to each piece of your application and get on with logging (see Downloading and installing log4j). The use of log4j is explained in its documentation, and there are a few articles looking at its basic usage (see Resources for a link). In this article, I will talk about the things that you need to consider for logging in a complex environment, and show you how to use the facilities provided in log4j to best effect in the context of a working example. I will also show you how to mine the log information once it has been gathered. Even with the best planning in the world, it can still be time consuming and tricky to find the information you're looking for in a large log file.

Why is multithreading a big deal?

If your program runs on several threads simultaneously, the logging information from each can become tangled up. Anyone who has tried reading output from 15 interleaved threads knows what I mean! In addition, your code might often be running in a distributed system on multiple machines. If you're using the same trace system for all your components, you'll be left to tie together multiple outputs from systems with potentially unsynchronized internal clocks. (This last issue is a real gotcha. Imagine reading through trace information, not realizing that the timestamps on your client machine are all 20 seconds earlier than the equivalent timestamps on the server!)

Listing 1 shows example trace information from a single program run, using simple output consisting of timestamps and messages. It's much like the output you might see if you rolled your own logging infrastructure.

Listing 1. Trace information for a simple application
(12:01:02) Starting
(12:01:02) initializing variables
(12:01:03) making call to db
(12:01:04) got value 'A'
(12:01:05) setting new value to db
(12:01:05) done

That all seems straightforward enough. But now look at Listing 2, which shows what happens when you scale up and run five threads at the same time.

Listing 2. Trace information from a multithreaded application
(12:01:02) Starting
(12:01:02) Starting
(12:01:02) Starting
(12:01:02) Starting
(12:01:02) initializing variables
(12:01:02) initializing variables
(12:01:03) making call to db
(12:01:03) making call to db
(12:01:03) initializing variables
(12:01:03) initializing variables
(12:01:03) making call to db
(12:01:03) making call to db
(12:01:03) Starting
(12:01:04) got value 'A'
(12:01:04) initializing variables
(12:01:04) got value 'B'
(12:01:04) got value 'X'
(12:01:04) got value '3'
(12:01:04) making call to db
(12:01:05) setting new value to db
(12:01:05) setting new value to db
(12:01:05) got value 'A'
(12:01:05) setting new value to db
(12:01:05) setting new value to db
(12:01:05) done
(12:01:05) setting new value to db
(12:01:05) done
(12:01:05) done
(12:01:05) FAILED!!!
(12:01:05) done

So which of the five threads failed? This is a simple example and better tracing statements would help identify the source, but you can see how confusing just five threads of output can look when you have nothing but timestamps to guide you. All the threads get jumbled up, and timing issues mean that the output statements are not going to come out in relative thread order.

So how do you make life easier on yourself? The key is the use of unique identifiers. You need to tag each thread with a unique ID that is associated with its output. This will allow filtering of the final trace information based on that unique ID. log4j by default outputs messages that look like this:

153 [Thread-1] INFO demo.logger.sample.SampleCode  - Program Running

Note that it already includes a thread ID. But as you can see, the default layout isn't that readable; in addition, the leading number is just the elapsed time in milliseconds rather than a proper timestamp, which I believe to be a vital element of any logging system.

Fortunately, log4j has a very nice feature that will help correct this: you can define new layouts for trace output in a configuration file. The file I'm going to use for my examples is illustrated in Listing 3.

Listing 3. Sample log4j configuration file
log4j.rootLogger=debug, stdout, R

log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout

# Pattern to output the caller's file name and line number.
log4j.appender.stdout.layout.ConversionPattern=%d %5p [%t] (%F:%L) - %m%n

log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=example.log

log4j.appender.R.MaxFileSize=1000KB
# Keep one backup file
log4j.appender.R.MaxBackupIndex=1

log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%d %5p [%t] (%F:%L) - %m%n

Some (or perhaps all) of that probably seems fairly cryptic at first glance. Let's look at it line by line to understand the syntax. The first line sets the level of the root logger (debug) and attaches two appenders to it: stdout and R. You can see that stdout is associated with a ConsoleAppender and then given a PatternLayout. The pattern itself is defined with conversion characters: %d is the date and time, %5p is the logging level (padded to five characters), %t is the thread name, %F and %L are the source file name and line number, %m is the message, and %n is a newline. Basically, the same setup is then done on R -- except that this is a rolling file appender. It receives a file name, a maximum size, and a backup index. The backup index tells it how many old logs to retain; when the maximum size is reached, the current log is renamed and a fresh one is started. As you can see, the PatternLayout is then the same.

This configuration is stored in a file called log4j.properties, which I simply pass to my code as a parameter. In addition to defining my preferred layout, it also specifies the use of both a console log and a log file. The log file is set to only grow to 1000 KB; at that point, the old log becomes example.log.1 and a new log is created. You can control how many historical logs are kept with the MaxBackupIndex parameter. This is all very cool and easy to use; and, best of all, you can alter your configuration later to suit your needs without fiddling with your code.
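
One nice consequence of keeping the configuration in a file is that you can change it while the application is running. As a minimal sketch (the class name, method name, and polling interval here are purely illustrative), log4j's PropertyConfigurator can be asked to watch the file and re-read it when it changes:

import org.apache.log4j.PropertyConfigurator;

public class LoggingSetup {
   public static void init(String propFile) {
      // Re-reads the properties file every 30 seconds, so logging levels
      // can be adjusted without restarting the application
      PropertyConfigurator.configureAndWatch(propFile, 30000);
   }
}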

If you run log4j using the configuration file in Listing 3, your result will be output that looks like this:

2004-01-24 15:00:23,714  INFO [Thread-1] (SampleCode.java:67) - Program Running

A note on random numbers

In my observations, funny things can happen if your Java code creates more than one random number generator within a very small period of time. Specifically, generators created within the smallest division of time the JVM deals with (one thousandth of a second) are seeded from the same clock value, and so will produce identical series of "random" numbers. This is unlikely to be an issue in the situation I'm describing here, because the sample client starts its threads at staggered random intervals, but it is worth being aware of.
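
A tiny sketch of the effect (the class name is a placeholder): on JVMs that seed Random from the millisecond clock, two generators created within the same millisecond produce identical output. Newer JVMs mix in an extra uniquifier, so you may not be able to reproduce this there.

import java.util.Random;

public class SeedDemo {
   public static void main(String[] args) {
      // Two generators created within the same millisecond get the same
      // time-based seed, so their sequences are identical
      Random r1 = new Random();
      Random r2 = new Random();
      System.out.println(r1.nextInt() + " vs " + r2.nextInt());
   }
}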

This gives you a datestamp, the logging level indicator, the thread ID, the line number you're on, and the log message. However, as you can probably guess, the thread numbers used are only unique within a single JVM. This is fine whilst you keep separate trace data for each JVM run; but if you want to view multiple instances that were run at the same time in separate JVMs, it would be helpful to add another unique ID to ensure that you don't muddle up threads. A simple approach to this is to just generate a random number when a thread is spawned and prepend that number to all output lines.

You may, however, decide that this is unnecessary and only adds more clutter to the trace output, given that you have to pass the random number as a parameter to the called programs. (If you don't want to pass in a parameter, you could add a method with which the client can query the thread ID of the called program. Then at least you have something that relates the client thread to the called thread. It's not as easy, but it can work.) I personally find it helpful to run not just multiple threads but also completely separate instances of a client program, and then to be able to tie all of the client trace up with the activity at the back end, and possibly combine the client information into one file. Assigning random numbers to execution threads helps you achieve this, so in this example I shall use the random-number technique. (I'll show some cool things this allows for later.)

With this addition, a typical tracing output line looks like this:

2004-01-24 15:00:23,714  INFO [Thread-1] (SampleCode.java:67) - [-992285010] Program Running

[Thread-1] is the JVM thread ID; it's what you get when you call Thread.currentThread().getName(). [-992285010] is your random number. This (I hope) will be unique within the scope of the trace, and the combination of thread ID and random number should afford you uniqueness across multiple JVMs.
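
As an aside, log4j itself provides a facility for this kind of per-thread tagging: the nested diagnostic context (NDC). The sketch below (the class name is a placeholder, not part of the sample application) pushes an ID once per thread; adding %x to the ConversionPattern then includes it in every message that thread logs. I'll stick with the explicit prefix in the examples that follow, because the same value is also easy to pass on to the EJB components as a plain parameter.

import java.util.Random;

import org.apache.log4j.Logger;
import org.apache.log4j.NDC;

public class NdcWorker implements Runnable {
   private static Logger logger = Logger.getLogger(NdcWorker.class);

   public void run() {
      // Push a per-thread ID onto the diagnostic context; every message
      // logged by this thread can then carry it via %x in the pattern
      NDC.push(Integer.toString(new Random().nextInt()));
      try {
         logger.info("Program Running");
      } finally {
         // Always clear the context when the thread finishes
         NDC.remove();
      }
   }
}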


The example application

It's about time I introduced some example code to show you the basic calls that make what I've described happen. In the example scenario, a client application running multiple threads calls an EJB component, log1Bean, from each thread. In turn, this bean makes a call to a second EJB component, log2Bean, on a different server. In this way, you can have code on three machines running against multiple threads. The goal is to track exactly what is going on at each stage.

As you can see, we are ultimately heading towards a fairly complex environment in which to perform tracing. But let's take baby steps to get there. Listing 4 is the first step for the client code. It is a class that implements the Runnable interface, and launches some threads that currently do nothing more than log using log4j. (Note that I have used the Eclipse development environment for all my code examples. I've included some links in the Resources section for Eclipse and Eclipse plugins.)

Listing 4. A simple client class that uses log4j
/*
 * Created on 24-Jan-04
 *
 * To change the template for this generated file go to
 * Window>Preferences>Java>Code Generation>Code and Comments
 */
package demo.logger.sample;

import java.util.Random;
import java.util.Vector;

import org.apache.log4j.Logger;
import org.apache.log4j.PropertyConfigurator;

/**
 * @author wouldd
 *
 * To change the template for this generated type comment go to
 * Window>Preferences>Java>Code Generation>Code and Comments
 */
public class SampleCode implements Runnable{
   private static ThreadGroup myGroup = new ThreadGroup("group1");
   private static Logger logger = Logger.getLogger(SampleCode.class);
   //
   //invoked as:
   //java demo.logger.sample.SampleCode <threads> <logconfigfile>
   //
   public static void main(String[] args) 
   {
      PropertyConfigurator.configure(args[1]);
      logger.info("About to start threads");

      Vector threads = new Vector();
      int thrdsCount = Integer.parseInt(args[0]);
      for (int i = 0; i < thrdsCount; i++) {
         SampleCode sc1 = new SampleCode();
         threads.add(sc1);
      }   
      
   }
   public SampleCode()
   {
       threadStarter();
      return;
   }
   public void threadStarter() 
   {
      int sleepTime;
      try 
      {
         Thread aThread = new Thread(myGroup, this);
         aThread.setPriority(Thread.NORM_PRIORITY);
         java.util.Random rnd = new Random();
         //
         //start threads at random intervals up to 1000 ms
         //
         sleepTime = (int) (rnd.nextInt(1000));
         Thread.sleep(sleepTime);
         Thread.yield();
                
         aThread.start();
         return;
      
      } catch (java.lang.InterruptedException e) {
      }
   }
   public void run(){
      //
      //create random id for this thread
      //
      Random rnd = new Random();
      int rndid = rnd.nextInt();
      
      logger.info("[" +rndid +"]"+ " Program Running");
      logger.debug("[" +rndid +"]" + " Debug message!!");
      logger.warn("[" +rndid +"]" + " Warning this is a warning");
      logger.error("[" +rndid +"]" + " Error Message!!");
      logger.fatal("[" +rndid +"]" + "A Non-Fatal FATAL message!!");
      
   }
}

That's quite a lot of code that currently doesn't do much. It's mostly just infrastructure for the threading. The main() method in particular does very little. Notice the logger commands; currently, this is just an example of the different levels of logging that you can perform. I've used a fairly standard set:

  • info would indicate such things as method entry/exit, and updates on successful operation
  • debug would tend to be used to indicate such things as the value of variables at certain points
  • warn could be used to indicate situations where something retryable has failed; for instance, it could warn that a connection was not established the first time and that the system was trying again
  • error indicates where something has gone wrong
  • fatal indicates an error that prevents further processing

However, keep in mind that it's up to you to decide how to break down your output into these categories. Make sure you think about it up front: If you've got multiple people working on code that will feed into the same logs, it's important that everyone knows the criteria. You don't want to lose the value of the system by having messages of the same type scattered across different categories. I'd recommend writing down a guide to how you intend to classify specific things.

If you want to use these methods in your own applications, there's really nothing weird and wonderful to add to your code. Simply replace calls to System.out.println() (or whatever mechanism you're using to output trace data) with calls to the relevant log4j method.

You should log as much as you can possibly get away with. When you really get into the debugging process, you'll invariably want the one piece of information that you thought you wouldn't. If performance is a concern, then think carefully about what you log and how you do it. Log4j provides some documentation on its performance impact, along with methods to make code perform better. Generally, it's a trade-off between performance when logging is on and performance when logging is off.
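
One technique worth showing here is guarding expensive trace statements, so that the cost of building a message is only paid when its level is actually enabled. A minimal sketch (the class, method, and parameter names are illustrative):

import org.apache.log4j.Logger;

public class GuardedLogging {
   private static Logger logger = Logger.getLogger(GuardedLogging.class);

   static void logState(int rndid, Object bigStructure) {
      // Only build the (potentially expensive) message string when
      // debug output is actually switched on for this logger
      if (logger.isDebugEnabled()) {
         logger.debug("[" + rndid + "] current state: " + bigStructure);
      }
   }
}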

Useful things to log are:

  • Entry/exit to a method
  • Variable values at entry
  • Where you are calling out to (for remote calls)
  • The time before and after each remote call (the timestamps can show how long the call took; see the sketch after this list)
  • The beginning and end of each thread
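
Here is a rough sketch of what logging around a remote call might look like, following the list above. The RemoteTask interface is a hypothetical stand-in for whatever remote interface you're calling; it is not part of the sample application.

import org.apache.log4j.Logger;

public class RemoteCallLogging {
   private static Logger logger = Logger.getLogger(RemoteCallLogging.class);

   // Hypothetical stand-in for the remote interface you're calling
   interface RemoteTask {
      boolean doTest(int id) throws java.rmi.RemoteException;
   }

   boolean callBackEnd(int id, RemoteTask task) throws java.rmi.RemoteException {
      logger.info("[" + id + "] Entering callBackEnd");
      long before = System.currentTimeMillis();
      boolean result = task.doTest(id);
      // The two info lines bracket the remote call, and this figure makes
      // the round-trip time explicit in the log
      logger.debug("[" + id + "] doTest returned " + result + " after "
            + (System.currentTimeMillis() - before) + " ms");
      logger.info("[" + id + "] Exiting callBackEnd");
      return result;
   }
}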

Run the sample code with a few threads, using a command line like the following (the second argument is the path to the log4j properties file from Listing 3):

java demo.logger.sample.SampleCode 10 log4j.properties

You'll see that you get nice neat blocks of output. Each thread's output is separate from the others, because at the moment there is nothing to split up the timing during execution. That will change when you get to the first EJB lookup.


Adding EJB components

Now you will create the EJB components that the client will call. They will do some very limited processing. The first bean will make a choice, based on a random number, to either call the second bean or not. Thus, some of the threads will make a full trip to the second bean and some will not. This will add some variety to the trace output to illustrate the power of the technique.

The first thing to do is to set up the logger from a system property. How you set this property will vary depending on the server you use to deploy the EJB component; many servers let you supply it as a JVM argument, such as -Dlog4j.props=/path/to/log4j.properties. The property (log4j.props for the first bean) points to a file containing the log4j configuration. The same approach is used for both EJB components that you're creating.

Listing 5 contains the first sample EJB component.

Note that these listings are not complete: I removed all the methods that are just empty (ejbRemove() and so forth). Obviously, you'll need those methods to get the code to execute, but there was no point listing them here. You can download the complete files from the Resources section below.

Listing 5. log1Bean
package demo.logger.sample;
/**
 * Bean implementation class for Enterprise Bean: log1
 */
import java.rmi.RemoteException;
import java.util.Properties;
import java.util.Random;

import javax.naming.InitialContext;
import javax.rmi.PortableRemoteObject;

import org.apache.log4j.Logger;
import org.apache.log4j.PropertyConfigurator;

public class log1Bean implements javax.ejb.SessionBean {
   private static Logger logger = Logger.getLogger(log1Bean.class);
   private static String ejbHomeLocation = "ejb/log2Home";
   private javax.ejb.SessionContext mySessionCtx;
   
   /**
    * ejbCreate
    */
   public void ejbCreate() throws javax.ejb.CreateException {
      //
      //pickup the location of the property file as a system property
      //
      Properties sp = System.getProperties();
      String propFile = sp.getProperty("log4j.props");
      PropertyConfigurator.configure(propFile);
      logger.info("Logger started for log1Bean, bean created");
      //override the default JNDI location for log2Bean if the system property is set
      String jndiName = sp.getProperty("log2.jndi");
      if (jndiName != null) {
         ejbHomeLocation = jndiName;
      }
   }
   
   
   public boolean doTest(int id)
   {
      //this method takes the id of the thread that called it, to continue using it in its own logging
      //it will roll a random number, and choose whether to look up another EJB or not
      boolean result = true;
      logger.info("[" + id + "]" + " Started doTest method in log1Bean");

      Random rand = new Random();
      int choice = rand.nextInt(10);
      logger.debug("[" + id + "]" + " Random choice is " + choice);
      if (choice <= 5) {
         //
         //link to ejb2
         //
         try {
            InitialContext ic = new InitialContext();
            Object or = ic.lookup(ejbHomeLocation);
            if (or != null) {
               // Narrow the return object to the Home class type
               log2Home home =
                  (log2Home) PortableRemoteObject.narrow(or, log2Home.class);
               // Create an EJB object instance using the home interface.
               try {
                  log2 lb2 = home.create();
                  // Invoke the method
                  if (lb2.doTest(id) == true) {
                     logger.info("[" + id + "]" + " Execution ran successfully");
                  }
               } catch (RemoteException re) {
                  logger.fatal("[" + id + "]" + " Failure to create remote EJB " + re);
               }
            }
         } catch (Exception e) {
            logger.fatal("[" + id + "]" + " Failure when looking up log2Bean " + e);
            e.printStackTrace();
         }
      } else {
         //
         //state that you've gone nowhere and return
         //
         logger.info("[" + id + "]" + " Path ends at Log1Bean");
      }

      return result;
   }
}

Listing 6 contains the second bean. It is much the same as the first; it just returns the fact that it was the end point for a given execution.

Listing 6. log2Bean
package demo.logger.sample;

import java.util.Properties;

import org.apache.log4j.Logger;
import org.apache.log4j.PropertyConfigurator;

/**
 * Bean implementation class for Enterprise Bean: log2
 */
public class log2Bean implements javax.ejb.SessionBean {
   private static Logger logger = Logger.getLogger(log2Bean.class);
   private javax.ejb.SessionContext mySessionCtx;
   /**
    * ejbCreate
    */
   public void ejbCreate() throws javax.ejb.CreateException {
      //
      //pickup the location of the property file as a system property
      //
      Properties sp = System.getProperties();
      String propFile = sp.getProperty("log4jejb2.props");
      PropertyConfigurator.configure(propFile);
      logger.info("Logger started for log2Bean, bean created");
      
   }
      
   public boolean doTest(int id)
   {
      //this method takes the id of the thread that called it, to continue using it in its own logging
      //it simply records that the execution path ends here
      boolean result = true;
      logger.info("[" + id + "]" + " Started doTest method in log2Bean");
      logger.info("[" + id + "]" + " Path ends in Log2Bean");

      return result;
   }
}

The first bean, log1Bean, makes a more or less random choice as to whether it just returns or connects to a second bean. It obtains the log4j property file and the location of the second bean through the system properties, which you can set as appropriate for your system.

I originally set this up with both beans on the same server, just to test. I moved to a multi-server setup later on. Obviously, to look up on a second server, you need to supply the right system properties to make the lookup work. With everything on the same machine, all three executing programs can use the same log file, and you'd get a nice combined log. But from here on out, I'll assume that you're running things on separate machines and thus will end up with three separate log files.


Running the scenario

If you're playing along at home, now is the time to install the EJB components and set the system properties to pick up configuration files and component locations. How you'll do this will vary, depending on your application server of choice. You should be able to take the property file provided earlier and make copies for each EJB component. (This basically involves having the exact same file for each EJB; in my case, where I ran two sample servers on one machine, I had to modify the property file to point to two separate but identical files.) Once all the components are up and running, you just need to update the client to actually do the lookup and invoke the first EJB component. The code needed to do this is fairly standard; you can download the full client code from the Resources section. (If you have difficulties getting your components to find each other, you might find it helpful to check out my earlier article, "Navigate the JNDI maze"; see Resources for a link).

Let's run the client with 50 threads by using the following command line:

java demo.logger.sample.SampleCode 50 /home/wouldd/JavaWork/workspace/LoggerDemo/log4j.properties
     thisNode/servers/server1/ejb/log1Home

You don't need all those threads, but this way you'll see the power of the logging system you've created.

You should see log files generated at the locations specified in the properties file for each part of the system; if no directory is specified, they will appear in the working directory of each process. Alternatively, you can examine the output in the files available in Resources. Look in these files and you'll see lines like this one:

2004-01-24 15:00:23,714  INFO [Thread-1] (SampleCode.java:67) - [-992285010] Program Running

Here you can see differing thread identifiers and so on. From the line numbers (next to the name of the source file the logging is coming from) you can see how things progress through the client and both EJB components.

So now let's really dig into these logs and see the sorts of cool stuff that log4j can tell you.


What secrets do the log files contain?

Note: If you're not using Linux or some other UNIX, the console commands you'll use in this section will not be easily available to you. But you can use Cygwin to provide a UNIX command console under Microsoft Windows (see Resources for a link).

To view the full contents of all your log files, you can use the following command:

cat *.log

This assumes, of course, that you are in the directory with the files and that they end in .log. (This is the convention I have used, but you could have called your log file anything when you specified it in the properties file.) Admittedly, that's not too impressive. But try entering this command:

cat *.log | sort

You'll get output that looks something like Listing 7.

Listing 7. Sorted logging data
2004-01-25 18:18:28,000  INFO [Thread-4] 
  (SampleCode.java:109) - [-1278270939] Calling Create on log1Home
2004-01-25 18:18:28,316  INFO [Thread-4] 
  (SampleCode.java:114) - [-1278270939] Execution ran successfully
2004-01-25 18:18:28,585  INFO [Thread-5] 
  (SampleCode.java:92) - [-1756513822] Program Running
2004-01-25 18:18:28,586  INFO [Thread-5] 
  (SampleCode.java:98) - [-1756513822] Obtaining Initial context

Suddenly, all those timestamps put everything in order -- or at least they should. This is when you find out if the machines you used to run the tests all agree on what time it is, and if you're lucky, you'll get another insight into the complexities of putting together trace data from multiple platforms. If it becomes clear that your systems' times are not in synch, then I suggest you look into the various means available to you for synchronizing them (see Resources for a link). Trying to work out what is going on in trace is hard enough without things appearing to occur at the wrong time.

So now you have an enormous listing (in chronological order) of all manner of things. You can scan through it and see lots of information, but it's hardly what you'd call easy to read. For instance, what exactly happened to thread 24? Do you want to scan through the file to work it out? Let's try an easier way:

cat *.log | grep Thread-24

This shows you the output from the 24th thread of the client. And now you may realize what a great idea adding that unique random ID was. Sure, it was a pain to have to pass it as a parameter to the EJB component, but now you can pick it out (<id> represents the random unique ID number):

cat *.log | sort | grep <id>

As if by magic, you get the ordered output of the full execution path triggered by thread 24, which should look something like Listing 8. You can see whether it stopped at log1Bean or continued on, and you won't have to sort through the clutter of the other trace points.

Listing 8. Full execution path triggered by thread 24
example.log:2004-01-25 18:18:50,445  INFO [Thread-24] 
  (SampleCode.java:92) - [-1242473967] Program Running
example.log:2004-01-25 18:18:50,446  INFO [Thread-24] 
  (SampleCode.java:98) - [-1242473967] Obtaining Initial context
example.log:2004-01-25 18:18:50,521  INFO [Thread-24] 
  (SampleCode.java:103) - [-1242473967] Narrowing home interface for log1Bean
example.log:2004-01-25 18:18:50,522  INFO [Thread-24] 
  (SampleCode.java:109) - [-1242473967] Calling Create on log1Home
example.log:2004-01-25 18:18:50,535  INFO [Thread-24] 
  (SampleCode.java:114) - [-1242473967] Execution ran successfully
examplelog2bean.log:2004-01-25 18:18:50,528  INFO [ORB.thread.pool : 0] 
  (log1Bean.java:65) - [-1242473967] Started doTest method in log1Bean
examplelog2bean.log:2004-01-25 18:18:50,529 DEBUG [ORB.thread.pool : 0] 
  (log1Bean.java:69) - [-1242473967] Random choice is 5
examplelog2bean.log:2004-01-25 18:18:50,530  INFO [ORB.thread.pool : 0] 
  (log2Bean.java:60) - [-1242473967] Started doTest method in log2Bean
examplelog2bean.log:2004-01-25 18:18:50,531  INFO [ORB.thread.pool : 0] 
  (log2Bean.java:61) - [-1242473967] Path ends in Log2Bean
examplelog2bean.log:2004-01-25 18:18:50,531  INFO [ORB.thread.pool : 0] 
  (log1Bean.java:91) - [-1242473967] Execution ran successfully

Note that the first bit of information in each line indicates the log file that the line is from.

Let's see some more clever things you can do with your trace information. Try this command:

cut -d '[' -f3 example.log | cut -d ']' -f1 | sort | uniq

This will extract all the unique IDs that were used during the run. (The sort matters: uniq only collapses adjacent duplicate lines, and output for the same ID is scattered through the log.) It's not so interesting on its own, but you can use it in conjunction with other commands, with interesting results:

cut -d '[' -f3 example.log | cut -d ']' -f1 | sort | uniq | awk 
  '{system("grep -e " $1 " *.log > log"$1 ".txt")}'

This beast of a line basically says, "Take all the unique IDs from my client log file, then take each ID in turn and pull the set of logs associated with it, and store each in a file of the format log<id>.txt." This leaves you with 50 files, each containing the trace for just one thread. So from the example above of output for thread 24, this would be in a file called log-1242473967.txt, where the number in the filename is the unique ID.

Other logging options

It's worth reading the log4j documentation to review the various possibilities it provides. One of your logging options is the LogFactor5 console. This is a GUI that shows a tree hierarchy of sources (based on package names); the log associated with the selected tree components is shown in the main window.

You also have the option of logging over the network. By setting up a log server on a port of your choosing, you can have multiple machines log to that connection instead of (or in addition to) a local log file. This can be helpful to collect all trace together in one place. My preference is to keep separate files for each system; they can be queried and combined easily enough. However, you may find a number of options that suit your needs better by investigating the full range of functionality provided.
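
For reference, the appender behind such a setup is configured in the same style as Listing 3. Something along the following lines (the appender name, host, and port are placeholders, and you would also need to add S to the rootLogger line) sends logging events to a remote log server, such as the SimpleSocketServer class that ships with log4j:

# Sketch only: send logging events to a remote log server
log4j.appender.S=org.apache.log4j.net.SocketAppender
log4j.appender.S.RemoteHost=logserver.example.com
log4j.appender.S.Port=4712
# Retry the connection every 10 seconds if the server goes away
log4j.appender.S.ReconnectionDelay=10000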

This is very cool, but remember: The commands only give you this power because you have the right pieces of information in the first place. Without timestamps, you could not effectively sort; without thread IDs and unique numbers, you wouldn't be able to cut things up so neatly. The single most powerful aid you've given yourself is the ID that you pass along your execution path. You may not like adding parameters for logging, but it sure does open a lot of powerful options.

And if you think these command lines look complicated, I should point out that shortly before I wrote them, I'd never tried to use them: I just knew the commands existed and was able to apply the principles for creating them in short order, which in itself is very useful. I apologise to any command-line gurus who can easily identify a much sleeker way to achieve the same results.

You can of course choose to log just certain levels of output, but my preference is to log everything, and use filtering techniques like these to pull the information I want at any given time. Thus, for a given run, if you change your mind about the level of detail you want, you just have to change your filter. Just use grep to pull out the lines with INFO or FATAL or what have you.


Taking system snapshots

So, what other things might you want to do with your log information? You can view it live on the log4j supplied console; you can combine several files, and separate threads into individual files. But what about seeing all the things happening at the same instant in a multi-platform environment? Well, timestamps are a wonderful thing. With a simple command, you can take a look at all output that occurred at a specific time:

cat *.log | grep 18:18:50 | sort

The output from this command would look something like Listing 9.

Listing 9. Trace data for a specific moment
2004-01-25 18:18:50,445  INFO [Thread-45] 
  (SampleCode.java:92) - [-1242473967] Program Running
2004-01-25 18:18:50,446  INFO [Thread-45] 
  (SampleCode.java:98) - [-1242473967] Obtaining Initial context
2004-01-25 18:18:50,521  INFO [Thread-45] 
  (SampleCode.java:103) - [-1242473967] Narrowing home interface for log1Bean
2004-01-25 18:18:50,522  INFO [Thread-45] 
  (SampleCode.java:109) - [-1242473967] Calling Create on log1Home
2004-01-25 18:18:50,528  INFO [ORB.thread.pool : 0] 
  (log1Bean.java:65) - [-1242473967] Started doTest method in log1Bean
2004-01-25 18:18:50,529 DEBUG [ORB.thread.pool : 0] 
  (log1Bean.java:69) - [-1242473967] Random choice is 5
2004-01-25 18:18:50,530  INFO [ORB.thread.pool : 0] 
  (log2Bean.java:60) - [-1242473967] Started doTest method in log2Bean
2004-01-25 18:18:50,531  INFO [ORB.thread.pool : 0] 
  (log1Bean.java:91) - [-1242473967] Execution ran successfully
2004-01-25 18:18:50,531  INFO [ORB.thread.pool : 0] 
  (log2Bean.java:61) - [-1242473967] Path ends in Log2Bean
2004-01-25 18:18:50,535  INFO [Thread-45] 
  (SampleCode.java:114) - [-1242473967] Execution ran successfully
2004-01-25 18:18:50,865  INFO [Thread-46] 
  (SampleCode.java:92) - [-1490252259] Program Running
2004-01-25 18:18:50,872  INFO [Thread-46] 
  (SampleCode.java:98) - [-1490252259] Obtaining Initial context
2004-01-25 18:18:50,874  INFO [Thread-46] 
  (SampleCode.java:103) - [-1490252259] Narrowing home interface for log1Bean
2004-01-25 18:18:50,875  INFO [Thread-46] 
  (SampleCode.java:109) - [-1490252259] Calling Create on log1Home
2004-01-25 18:18:50,880  INFO [ORB.thread.pool : 0] 
  (log1Bean.java:65) - [-1490252259] Started doTest method in log1Bean
2004-01-25 18:18:50,881 DEBUG [ORB.thread.pool : 0] 
  (log1Bean.java:69) - [-1490252259] Random choice is 9
2004-01-25 18:18:50,881  INFO [ORB.thread.pool : 0] 
  (log1Bean.java:108) - [-1490252259] Path ends at Log1Bean
2004-01-25 18:18:50,886  INFO [Thread-46] 
  (SampleCode.java:114) - [-1490252259] Execution ran successfully

You can see that you have a sorted list of everything that occurred within a specified second (18:18:50). You can define the time range as widely or as narrowly as you want, of course. This command will give you everything that happened across the full range of your system; if you so desired, you could have every single application logging, be they related or not, and do such searches on every log file, to get a snapshot of what was happening at a given instant across the board.

You can also reuse a command I gave you earlier to extract all the unique IDs that were being processed during this timeframe and generate separate thread logs for each. This effectively allows you to trace backward and forward through every event that was in flight during a given time period. If you can't quite wrap your head around that, here's the command:

cat *.log | grep 18:18:50 | sort | cut -d '[' -f3 | cut -d ']' 
  -f1 | sort | uniq | awk '{system("grep -e " $1 " *.log > log"$1 ".txt")}'

That surely is a beast of a line, but think of what it's given you: You can specify any snapshot, at any time, for as much of the system as you choose to provide logs for, and then get a separate thread log showing what each of those processes did during its life. In effect, instead of getting the 50 files you got earlier, you only get files for the threads that logged some output during the time period specified. That sounds to me like some pretty helpful information if I want to try to understand the complex interactions of events at the time of a failure.

If you think about it, of course, I'm only really capturing those things that were logging during my specified timeframe. What about those things that were running but didn't output logs? Here are a few more commands that could help you out. If I were feeling evil, I could probably just concatenate the whole lot into one uber-command; but instead, here are three separate command lines that basically look at all the unique numbers before a particular time period, and all those after, and check to see if there are any that were logging before and after, and hence executing during the time period under consideration.

awk '{ if ($2 < "18:18:50") print }' *.log | cut 
  -d '[' -f3 | cut -d ']' -f1 | sort | uniq > beforelog.txt

awk '{ if ($2 > "18:18:51") print }' *.log | cut 
  -d '[' -f3 | cut -d ']' -f1 | sort | uniq > afterlog.txt

cat beforelog.txt | awk '{system("grep -e " $1 " afterlog.txt")}' 
  | sort | uniq | awk '{system("grep -e " $1 " *.log > log"$1 ".txt")}'

So here again each file contains the thread history as shown in the example earlier using thread 24. But you now see files for all those threads that were executing during the timeframe that you're interested in (even if those threads didn't happen to log during that timeframe).


Conclusion

I haven't even touched on the many ways you can configure log4j; there are a number of articles that look at how to inherit log settings, and how to tweak things so that different logging levels and behaviors exist for different packages and classes. But what I've covered will help you get going, and you should now have a grasp of good logging practices that will help you collect plenty of useful information in complex environments. And this applies to any program that does logging: whether it uses log4j or not, if you stick to the same information and layout, you can still do all the data mining with UNIX commands that I've described.

To summarize some key points:

  • Plan logging from the start. It is much harder to add it later, and you'll need it whilst you're debugging your code initially anyway.
  • Pick a mechanism that suits your needs, but don't make things hard on yourself. If someone else has written a free, open source, maintained, well performing, feature-rich option ... then use it.
  • Decide how you'll use the logging levels up front, and make sure everyone on your programming team knows the conventions that you want them to use.
  • If you're starting from scratch, add end-to-end processing IDs. They really are worth the effort of passing them through your programs.
  • Learn the power of UNIX text processing commands. I used them at the command line here but you can get all this done in scripts as well.
  • Log as much as you can possibly get away with, but don't forget about performance. Pay particular attention to logging around remote calls, and where threads branch, start, and end. The more information in the log, the more powerful the data mining you can achieve.

Resources
