Legislative changes such as the introduction of the Sarbanes-Oxley Act, with its requirement for detailed logging of activities, together with economic trends such as service orientation and on-demand business, make it increasingly important to keep track of who is doing what within an enterprise. Application logging is therefore gaining importance.
Logging is no longer just a debugging aid used when something goes wrong inside an application; it is a permanent process that makes all transactions traceable and accountable. In business-critical applications, such as customer databases or ATM terminals, logging is a vital requirement for keeping track of all events. Logs therefore have to be stored reliably and made searchable.
XML is at the core of SOA and Web services. Moreover, it is flexible and thus ideal for log messages where information, new log types, and applications may be added over time.
Customers usually distinguish between at least two types of logs: technical logs, which capture environment information (which machine, which OS, and so on), and functional logs, which capture what was done. Both log types can also be combined into a single structure.
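As a minimal sketch of such a combined structure, the following Python fragment builds one record that nests a functional part (`Application`) and a technical part (`System`) under a common root. The element names mirror the sample in Listing 1, but the `build_record` helper and the `OS` element are illustrative assumptions, not part of any product API.

```python
import xml.etree.ElementTree as ET

def build_record(user, function, host, os_name):
    """Build one combined log record: functional data under <Application>,
    technical data under <System>. Element names follow Listing 1;
    <OS> is an illustrative addition."""
    record = ET.Element("Record")
    app = ET.SubElement(record, "Application")   # functional part
    ET.SubElement(app, "Function").text = function
    ET.SubElement(app, "User").text = user
    system = ET.SubElement(record, "System")     # technical part
    ET.SubElement(system, "Name").text = host
    ET.SubElement(system, "OS").text = os_name
    return ET.tostring(record, encoding="unicode")

xml_text = build_record("JDoe", "GetValue", "INTRANET01", "Linux")
```

Because both parts live in one well-formed document, a single query can correlate what was done with where it ran.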
Logs of both types contain a lot of information; some parts are business-critical, while others are merely informational. Usually, many (rather small) log files are produced — one for every operation or step an application performs. Therefore, tens of millions or even more log files per month can accumulate in a single enterprise. Despite the volume, all files need to be processed efficiently, accurately, and without loss. In addition, client applications must not be impacted by log file processing.
Assuming one log file is between 1KB and 20KB in size (3.5KB on average, say), and you have to deal with up to 10 million log files per day, you need roughly 35GB of storage space for uncompressed data for a single day and about 1TB for a whole month. Since the clients generating the log files run on lightweight, specialized hardware, they do not provide storage space for the volume of log files they produce.
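The arithmetic behind these figures is straightforward; the sketch below assumes an average file size of 3.5KB (which the daily figure implies) and uses decimal units throughout:

```python
AVG_FILE_KB = 3.5            # assumed average size of one log file
FILES_PER_DAY = 10_000_000   # upper bound from the text

per_day_gb = AVG_FILE_KB * FILES_PER_DAY / 1_000_000   # KB -> GB (decimal)
per_month_tb = per_day_gb * 30 / 1_000                 # GB -> TB over 30 days

print(per_day_gb)    # 35.0 (GB per day)
print(per_month_tb)  # 1.05 (TB per month)
```

A different average file size scales both figures linearly, so the calculation is easy to redo for your own workload.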
Therefore, you need centralized storage with large capacity where you can store and analyze the log files. Databases have proven to be well suited to this type of task, and a database management system that can natively store and query XML documents greatly facilitates application logging. Listing 1 shows an example XML file:
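An XML-enabled DBMS lets you query inside the stored documents with XQuery or SQL/XML. As a self-contained, hedged stand-in, the sketch below stores log documents as text in an in-memory SQLite table and extracts an element value on the client side with ElementTree; a native XML database would do the extraction inside the query instead.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Stand-in repository: a real XML-enabled DBMS stores the documents
# natively and evaluates path expressions in the database engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, doc TEXT)")

doc = """<Record>
  <Header version="1"><Type>INFORMATION</Type></Header>
  <Application><Name>SecurityWebService</Name><User>JDoe</User></Application>
</Record>"""
conn.execute("INSERT INTO logs (doc) VALUES (?)", (doc,))

# "Query": fetch documents and filter on an element value client-side.
users = [
    ET.fromstring(row[0]).findtext("Application/User")
    for row in conn.execute("SELECT doc FROM logs")
]
print(users)  # ['JDoe']
```

The point of the sketch is the schema shape (whole documents as rows), not the query mechanism; with native XML support the `findtext` step moves into the SQL statement.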
Listing 1. Sample XML file
<?xml version="1.0" encoding="US-ASCII"?>
<File>
  <Record>
    <Header version="1">
      <Time>2002-11-15 18:19:17.6</Time>
      <Type>INFORMATION</Type>
      <Id>-471559096676384768</Id>
    </Header>
    <Application>
      <Name>SecurityWebService</Name>
      <Function>GetValue</Function>
      <User>JDoe</User>
      <Result>3171861797959368704</Result>
      <Params>
        <Param>
          <Type>Object</Type>
          <Value>Object</Value>
        </Param>
        <Param>
          <Type>Object</Type>
          <Value>security.ssl</Value>
        </Param>
        <Param>
          <Type>Object</Type>
          <Value>0</Value>
        </Param>
      </Params>
      <CallTime>2004-11-15 16:19:17.7</CallTime>
      <StartTime>2006-10-18 12:18:14.7</StartTime>
      <EndTime>2000-11-16 18:14:16.4</EndTime>
      <ReturnTime>2004-11-12 10:10:12.7</ReturnTime>
    </Application>
    <System>
      <Name>INTRANET01</Name>
      <State>498308015556919296</State>
    </System>
    [..]
  </Record>
</File>
A centralized repository for the logs (a database system, for example) integrates the activities from various applications, so the data can be analyzed and a "big picture" across all applications can be created.
Having the applications insert their logs directly into the central repository is not feasible, for several reasons. To move the log information reliably from the application to the database, a message queue is used. To further decouple the application from the message queue, a small in-memory database can be placed in between; it also buffers messages during peak loads.
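A minimal sketch of this decoupling follows, with a plain in-memory queue standing in for the local buffer and a list standing in for the enterprise message queue; the names `log` and `forward_logs` are illustrative, not from any product.

```python
import queue

local_buffer = queue.Queue()   # stands in for the small in-memory database
message_queue = []             # stands in for the enterprise message queue

def log(entry):
    """The application only touches the fast local buffer and returns."""
    local_buffer.put(entry)

def forward_logs():
    """A background step drains the buffer into the message queue; during
    peak loads, entries simply accumulate in the buffer until drained."""
    while not local_buffer.empty():
        message_queue.append(local_buffer.get())

log("<Record>...</Record>")
log("<Record>...</Record>")
forward_logs()
print(len(message_queue))  # 2
```

In a production setup, the hand-off between buffer, queue, and database would additionally run under transactions, as the next paragraph requires.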
Since losing log files in case of a failure is not tolerable, all systems involved in log shipping must be transactional.