Creating an adapter
The GLA uses an XML configuration file to control how it parses and transforms log files, and how it emits data. A configuration file contains one or more contexts, each of which defines how to transform one log file. In some cases, contexts within a configuration file can run simultaneously.
Begin by creating an adapter configuration file to process the Linux log file named daemon.log. On your test system, running Debian Linux, daemon.log captures messages from the POP3 (e-mail), THTTPD (the "trivial" HTTP server -- a small, fast Web server that only serves static files), and MyDNS (a small, easy-to-configure Domain Name System (DNS) server) daemons. Daemon.log also records when the MySQL daemon starts and stops.
Listing 12 shows a snippet of the file with log entries created by the POP3 and THTTPD servers.
Listing 12. A snippet of the Linux daemon.log file
Mar 2 07:24:54 db popa3d: Session from 126.96.36.199
Mar 2 07:24:55 db popa3d: Authentication passed for joan
Mar 2 07:24:55 db popa3d: 1422 messages (11773432 bytes) loaded
Mar 2 07:24:57 db popa3d: 0 (0) deleted, 1422 (11773432) left
Mar 2 07:26:28 db thttpd: up 3600 seconds, stats for 3600 seconds:
Mar 2 07:26:28 db thttpd: thttpd - 0 connections (0/sec), 0 max simultaneous
Mar 2 07:26:28 db thttpd: map cache - 0 allocated, 0 active (0 bytes)...
Mar 2 07:26:28 db thttpd: fdwatch - 1589 selects (0.441389/sec)
Mar 2 07:26:28 db thttpd: timers - 3 allocated, 3 active, 0 free
Mar 2 07:27:35 db popa3d: Session from 188.8.131.52
Mar 2 07:27:35 db popa3d: Authentication passed for martin
Mar 2 07:27:35 db popa3d: 1350 messages (10880072 bytes) loaded
Mar 2 07:27:36 db popa3d: 4 (11356) deleted, 1346 (10868716) left
Mar 2 07:29:54 db popa3d: Session from 184.108.40.206
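To make the shape of these entries concrete, the following minimal Python sketch splits one entry from Listing 12 into a date, a host, a daemon name, and a message. It is an illustration only, not part of the GLA; the pattern and the name split_entry are assumptions made for this example.

```python
import re

# Assumed layout, based on Listing 12:
# "<month> <day> <HH:MM:SS> <host> <daemon>: <message>"
ENTRY = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<daemon>[^:\s]+):\s*(?P<message>.*)$"
)

def split_entry(line):
    """Return the fields of one daemon.log entry, or None if the line does not match."""
    match = ENTRY.match(line)
    return match.groupdict() if match else None

fields = split_entry("Mar 2 07:24:55 db popa3d: Authentication passed for joan")
print(fields["daemon"], "|", fields["message"])
# popa3d | Authentication passed for joan
```

Note that each entry carries a month, day, and time, but no year and no time zone.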
Each context in an adapter configuration file defines six components: the context instance, the sensor, the extractor, the parser, the formatter, and the outputter. The context instance sets parameters for the general operation of the transformation, including whether the log is appended to continuously and how frequently the log is amended. The remaining five components (conceptually) act in sequence, reading input, performing a task, and passing results on for further processing (except for certain outputters, which simply write results to a file or to the console):
- The sensor reads the log file in pieces until it reaches the end of the file and pauses. Then, when the sensor detects that the log file has grown, it reads the additional data. The sensor passes its data to the next stage, the extractor.
- The extractor reads data and divides it into individual records. One regular expression defines what the start of a record looks like, and another regular expression defines the end of record. Individual records, when identified, are passed on to the parser for additional processing.
- The parser reads one record at a time from the extractor and decomposes each into fields and values. Furthermore, the parser can make decisions based on the content of a record and apply one or more sets of rules to yield fields and values. For instance, if a log file indicates the start, interim, and end of an event, the parser can decompose each record into a set of fields and values unique to that event. Ultimately, the parser's objective is to map fields and values in each log file entry to the proper elements, attributes, and values in a CBE XML record. The formatter reads the output of the parser.
- The formatter's job is simple: It reads the elements, attributes, and values the parser creates, and it creates an object suitable for consumption by the last stage in the context, the outputter.
- The outputter consumes objects from the formatter and emits them. Outputters can emit XML to a file or to the console. They can also create a new log file or pass the data to a daemon.
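The hand-off between the five stages can be pictured as a chain of Python generators. This is a conceptual sketch only: the GLA itself is configured through the adapter file, and every function name and pattern here is hypothetical. For brevity, the extractor uses only a start-of-record pattern and assumes each piece from the sensor ends on a line boundary.

```python
import re

def sensor(chunks):
    """Sensor: hand over raw text in pieces (here, from an in-memory list)."""
    yield from chunks

def extractor(pieces, start=re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s")):
    """Extractor: group lines into records; a line matching `start` opens a new record."""
    record = []
    for piece in pieces:
        for line in piece.splitlines():
            if start.match(line) and record:
                yield " ".join(record)
                record = []
            record.append(line.strip())
    if record:
        yield " ".join(record)

def parser(records):
    """Parser: decompose each record into named fields."""
    pattern = re.compile(r"^(\w{3}\s+\d{1,2} [\d:]{8}) (\S+) ([^:\s]+):\s*(.*)$")
    for record in records:
        m = pattern.match(record)
        if m:
            yield dict(zip(("time", "host", "daemon", "message"), m.groups()))

def formatter(field_sets):
    """Formatter: build the object the outputter emits (here, a plain string)."""
    for f in field_sets:
        yield "{daemon}@{host} [{time}] {message}".format(**f)

def outputter(objects, sink):
    """Outputter: emit each object (here, by appending it to a list)."""
    for obj in objects:
        sink.append(obj)

chunks = [
    "Mar 2 07:24:54 db popa3d: Session from 126.96.36.199\n",
    "Mar 2 07:24:55 db popa3d:\n    Authentication passed for joan\n",
]
events = []
outputter(formatter(parser(extractor(sensor(chunks)))), events)
print("\n".join(events))
```

Because each stage only consumes the output of the stage before it, any one of them can be swapped out without touching the others, which mirrors how the components of a context are defined separately.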
The next five sections describe how to define each of the six components of a context.
To begin, create a simple Eclipse project to contain the adapter configuration file:
- Click File > New and expand Simple. Choose Project and click Next.
- Name the project My Adapter and click Finish.
- Click File > New > Other and expand Generic Log Adapter. Choose Generic Log Adapter File and click Next.
- Choose My Adapter and name the adapter file my.adapter. Click Next.
- Choose a template for the log file you want to process with this adapter (see Figure 2).
You can use a snippet of the actual log file you want to process or an accurate representation of the log file -- say, from a detailed specification. Click Browse, navigate to the file system, and open the template. After making your selection, click Finish. Click Yes when prompted to switch perspectives.
Figure 2. Choose a template that represents the log file to transform
Figure 3 shows the Generic Log Adapter perspective. As you can see, the UI displays a context instance in which the sensor's properties point to the template log file you just chose. The context instance also includes an extractor, a parser, a formatter, and an outputter, which you must define further.
Figure 3. The Generic Log Adapter perspective
Each context instance describes how to process one log file. You can set several options in a context instance. To see the options, click Context Instance below Configuration. You should see a panel that resembles Figure 4.
Figure 4. Context instance options
You can edit the Description to capture the intent of this particular context. In addition:
- If your log file is continuously updated, as is the case with daemon.log, select the Continuous operation check box.
- Maximum idle time is the number of milliseconds a context should wait for the log file to change before the context instance is shut down.
- Pause interval controls how long the context should wait after it reaches the end of a log file.
- Because log files aren't only ASCII text, you can set the ISO language code (using two lowercase letters), the ISO country code (using two uppercase letters), and the file's Encoding (using a value from the Internet Assigned Numbers Authority (IANA) character set registry). By default, these parameters are set to en, US, and the default encoding of the JVM.
- Finally, because some log files do not denote time zone, year, month, and day and because CBEs require all four values, you can provide substitute values in the Timezone GMT offset and Log file creation date fields.
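One plausible reading of how Continuous operation, Pause interval, and Maximum idle time interact is sketched below in Python. It is not the GLA's implementation: follow and read_new are hypothetical names, and the sketch measures both intervals in whatever unit the supplied clock uses (seconds by default).

```python
import time

def follow(read_new, pause_interval, max_idle, continuous=True,
           sleep=time.sleep, clock=time.monotonic):
    """Yield new data from read_new() until the source has been idle for max_idle.

    read_new is a hypothetical callable that returns newly appended text,
    or an empty string when the log has not grown.
    """
    last_change = clock()
    while True:
        data = read_new()
        if data:
            last_change = clock()
            yield data
            continue
        # End of the log reached. A one-shot run stops here; a continuous
        # run stops only after the log has stayed unchanged for max_idle.
        if not continuous or clock() - last_change >= max_idle:
            return
        sleep(pause_interval)  # wait before polling the log again
```

In this reading, a short pause interval picks up new entries quickly at the cost of more frequent polling, while the maximum idle time bounds how long an inactive log keeps the context alive.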
Because daemon.log grows continuously, select the Continuous operation check box. Because mail is typically polled often, set Maximum idle time and Pause interval to 120. The test machine is located in Colorado, so the Timezone GMT offset is -7. Daemon.log doesn't specify the year, so 2006 is provided as a substitute. After making these changes, save the file.
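To see what the substitute values accomplish, the following Python sketch completes a daemon.log timestamp with the year 2006 and a GMT offset of -7. It only illustrates the idea; the function name complete_timestamp is hypothetical, and the month abbreviation is assumed to be parsed under an English locale.

```python
from datetime import datetime, timedelta, timezone

def complete_timestamp(text, year=2006, gmt_offset_hours=-7):
    """Turn 'Mar 2 07:24:54' into a full timestamp using a substitute year and GMT offset."""
    # Prepend the year before parsing so that the date is validated against that year.
    parsed = datetime.strptime(f"{year} {text}", "%Y %b %d %H:%M:%S")
    return parsed.replace(tzinfo=timezone(timedelta(hours=gmt_offset_hours)))

print(complete_timestamp("Mar 2 07:24:54").isoformat())
# 2006-03-02T07:24:54-07:00
```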