Skip to main content

Access unusable log data with SQL

Move verbose logs to an RDBMS and reveal a wealth of information

Sam Mefford (sam@mefford.org), Senior J2EE Developer, iCentris, Inc.
Sam Mefford is a Senior J2EE Developer/Architect for iCentris, Inc. He is on a constant search for the most "elegant" (easy, yet powerful) tools for his development teams.

Summary:  Application output logs create a very real dilemma. The logs can have too much output and be unmanageable, or too little output and be useless for tracing real problems. Developer Sam Mefford demonstrates that logging to an RDBMS can eliminate this problem, turning unusable output into informative reports. This article illustrates that by using SQL as a mining tool, data worth is greatly enhanced by revealing details of application usage, highlighting code fragments and queries that need optimization, and pinpointing the conditions that create hard-to-reproduce bugs.

Date:  19 Aug 2003
Level:  Intermediate

Activity:  459 views
Comments:  

Have you ever looked at your application logs and thought, "Who could find anything with so much output?" Or, "If only I had logged a little more information, maybe I could duplicate this production-only error!"

Deciding between verbose or minimal logging is a no-win choice because both approaches have real drawbacks; using either, it is difficult to solve the one problem without creating the other. This is why some teams have found a refreshing new method to turn unusable log data into gold -- move logs out of text files and into a relational database used specifically for logging.

Problems with the old approaches

"Is there really such a problem with System.out.println()?," you might ask. And, "What about my enhanced logging tool? It allows me to break logs out by component and also set a log level so I have verbose logs during development and minimal logs on production. Doesn't that solve the problem?"

Unfortunately, the same drawbacks exist for all text-logging solutions:

  • If logging is too verbose, it's hard to find the few lines that apply to the current problem.
    While breaking logs out by component can help, sometimes it can also hinder since now you may have to search multiple files to find a single answer.
  • Writing to logs can slow the application.
    This is why many teams disable or minimize logging on production; this can lead to the problem discussed in the next bullet point.
  • If logging is too minimal, you may not have enough information to isolate the current problem.
    If the problem is hard to reproduce and only happens on production, you may wish your logs were more verbose.
  • Verbose logs can fill hard drives, potentially crashing your application!
    Only the best sysadmins watch their hard drive space closely enough to avoid this problem.

The new approach: Changing the rules

Although application logs can contain mountains of important data, often they are not used as a valuable source of information for the reasons stated previously (performance, retrieval, storage issues).

For those daring enough to ditch the text logs and start logging in an RDBMS (relational database management system), the rules change about what should be logged.

More power

Imagine utilizing the simplicity and power of SQL queries (structured query language) to pull valuable information from your application's logs. You could be producing such gems of data as the answers to:

  • Which database queries take the longest to execute?
  • Which screens take the longest to produce?

Additionally, you could give yourself the ability to:

  • Keep enhanced access logs with all the standard information, plus some more information specific to your application.
  • Start tracking just how often that third-party Web service is not responding as you expect. (The more information you keep on it, the easier it is to prove the problems are on that end.)
  • Start logging enough context information so you can determine when and why that hard-to-reproduce bug keeps biting your users.

Keeping logs separate

Some applications already track similar information in the application database. While this is a good first step, it is more effective to create a separate, logging-only database for the following reasons:

  • Application data and logging data have two very different audiences.
    • From a database security standpoint, no outside access is always more secure than granting partial access.
    • Allowing arbitrary reports against production data is an abuse of valuable production resources.
    • The nature of logging data is fluid (many write/delete cycles) and thus should be on non-critical hardware where raw speed loss from hard drive fragmentation is not a big deal.
  • When logging to a non-critical system, the pressure is off to only log important information. Once developers feel free to log anything they think might be useful, more gold mines like those listed earlier will become obvious.

General rules for an RDB logging system

To design a RDBMS logging system, remember to keep it simple. The simpler the logging facility is, the more likely team members are to use it.

I have created a simple RDB logging system that I call Logmine (see Resources; the code samples that follow are taken from Logmine). Listing 1 is an example of how simple, yet powerful, logging to an RDBMS can be for developers:


Listing 1. Sample of logging query execution time
Logmine.log("DB_TIMER", MapUtil.toMap("sql",           sql,
      "millis",        String.valueOf(startTime - endTime), 
      "rows_returned", String.valueOf(count)));

In this code, DB_TIMER is the name of the log. (Logmine automatically creates a DB_TIMER table if it doesn't already exist; see Resources.) MapUtil.toMap() is a utility method for making several arguments into a Map (name-value pairs). The data goes in columns named sql, millis, and rows_returned.

This logging call, rather than adding to an overwhelming pile of text, makes sure a database table exists, creates the necessary columns, and adds a row of data. After running this log for a while, you could reveal a wealth of information about which database queries in your application are taking the longest:


Listing 2. Newly accessed knowledge from one SQL report
SELECT MAX(millis), AVG(millis), MIN(millis), COUNT(millis), sql
FROM DB_TIMER
GROUP BY sql
ORDER BY MAX(millis) DESC

This SQL query produces the sample report in Table 1:


Table 1. The SQL query sample report
avgmaxmincountsql
81041454624286SELECT * FROM Blogs WHERE content LIKE '%?%'
386.75716148SELECT * from Person WHERE Last_Name LIKE '%?%'
76.25144194SELECT user_id from User_Login

Now try another example of a useful logging call. This is all it takes to generate an enhanced access log:


Listing 3. Creating an enhanced access log
Logmine.log("ENHANCED_ACCESS_LOG", 
  MapUtil.toMap("request",        request.getRequestURI(),
                "referer",        request.getHeader("Referer"),
                "browser",        request.getHeader("User-Agent"),
                "ip",             request.getRemoteAddr(),
                "user",           user.getUserName() ,
                "browser_locale", request.getLocale(),
                "user_locale",    user.getLocale(),
                "session_id",     session.getId()));

This data is logged in an ENHANCED_ACCESS_LOG table with the columns request, referer, browser, ip, user, browser_locale, user_locale, and session_id.

Unlike most calls to an RDBMS, you do not need to do a lot of careful advance planning to decide what data you choose to save since you can add new logs (such as DB_TIMER or ENHANCED_ACCESS_LOG) at any time. Missing tables are created for you, so setting up the logging database is no hassle. On a whim, you can modify the data points you wish to log (as often as you like) and the necessary columns are created for you.


Slicing data for reports

Once you have logged some usable data , the fun begins with the slicing and dicing -- generating those valuable reports! Look at just a few of the possible reports you can draw (most active user, most used browser, rank of pages by distinct visits) for the ENHANCED_ACCESS_LOG:

A most active user report contains the following (code first, then the output):

SELECT count(*), user
FROM ENHANCED_ACCESS_LOG
GROUP BY user
ORDER BY 1 DESC

count(*)user
432joe12
214mary53
137john55

A most used browser-by-version report contains the following code and output:

SELECT count(*), browser
FROM ENHANCED_ACCESS_LOG
GROUP BY browser
ORDER BY 1 DESC

count(*)browser
567Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.0.2) Gecko/20030208 Netscape/7.02
452Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
3Wget/1.8.2
2Lynx/2.8.4rel.1 libwww-FM/2.14 SSL-MM/1.4.1 OpenSSL/0.9.6h

And pages ranked by number of distinct session visits contains the following code and output:

SELECT count(*), request 
FROM ENHANCED_ACCESS_LOG
GROUP BY request, session_id
ORDER BY 1 DESC

count(*)request
724http://www.mysite.com/someUrl?someParam=someValue
659http://www.mysite.com/anotherUrl
647http://www.mysite.com/thirdUrl?someParam=someValue

While the possibilities are almost unending, I don't want this article to be, so I'll leave the rest to your imagination. It is important to remember that you should be tracking many critical elements in your application. Using the new paradigm of simple logging and simple-yet-powerful reporting, you and your team can begin to see logging in a different light.


The power without the hassle

Knowing more about how clients are using an application can differentiate the successful company from the many failures. Indeed, entire companies have been built to service this need by creating software to analyze and report on access logs (such as WebTrends). Unfortunately, most packaged software is limited in its ability to report the critical information specific to each custom application.

With a simple tool such as Logmine, combined with the power of SQL, you can leverage the familiarity your team has with your application and create reports that the packaged solutions could never have provided. Logmine is intended to be non-mission critical, so it doesn't need resources as powerful as your production database. You could even use a simple-to-setup and -maintain database like MySQL (information on both is found in Resources). Logmine is designed to not slow your production code since it does its work in a separate thread. As long as you have a good network connection to the logging database, the logging is low cost.

The next time a fellow from marketing asks, "Which module of our application do our users use the most?" or your database admin inquires, "Which queries do you need me to optimize?," you need only take a minute and throw in a couple Logmine calls, then forget about it: Eventually, you'll have these valuable answers at your fingertips.

This method also provides you with valuable application-design information. With a few more Logmine calls, you can determine which browser/OS combination creates that hard-to-duplicate bug. Rather than bother the sysadmins to deliver access to the production database or to send you the production logs, use Logmine as your no-restrictions reporting tool.


Resources

  • Discover another gold mine of information on how to gather the data your system records and what to do with that data in "Assess your e-business infrastructure for Web readiness," by Russ Scadden (developerWorks, April 2003).

  • Read "Measuring Web traffic, Part 2" (developerWorks, March 2001) by Andrei Malicinski, Scott Dominick, and Tom Hartrick delivers methods to monitor traffic and gather usage data through HTTP server log analysis. Part 1 is also available (developerWorks, March 2001).

  • On the MySQL site, find up-to-date information on software development and related support and services for the MySQL database.

  • Try Logmine, a simple RDB logging system that consists of one JSP to receive HTTP logging calls (it is deployed on a separate app server and writes to the logging database) and one class to generate them (the class is used by your application to log calls).

  • Explore the WebTrends site for the WebTrends Log Analyzer tools. This Web site analysis product is specifically designed for small businesses.

  • Check out Uppsala Universitet for a wide range of free and commercial access log analyzers that are tied to specific platforms or are platform-independent.

  • Get the WebSphere Site Analyzer, an enterprise-level Web analytics tool that transforms random data into intelligence by capturing, analyzing, storing, and reporting on Web site usage, health, integrity, and site content.

About the author

Sam Mefford is a Senior J2EE Developer/Architect for iCentris, Inc. He is on a constant search for the most "elegant" (easy, yet powerful) tools for his development teams.

Comments



Trademarks  |  My developerWorks terms and conditions

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Web development
ArticleID=11824
ArticleTitle=Access unusable log data with SQL
publish-date=08192003
author1-email=sam@mefford.org
author1-email-cc=

My developerWorks community

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere).

My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Special offers