Level: Intermediate
Grace Walker (gwalker@walkerautomated.com), IT Consultant, Walker Automated Services
07 Mar 2006
Provide solid information resources to decision makers. Discover a simple, but useful, application of the combined processing capabilities of awk and XML that you can use to present UNIX® system data in a reader-friendly form suitable for posting to the company intranet or Internet. UNIX generates useful system performance, usage, cost, and related data that management and other interested stakeholders can use.
External requirements have radically changed the business landscape. The contemporary legal and
regulatory environment makes the effective use of crucial information a business imperative.
Requirements, such as those imposed by the Sarbanes-Oxley (SOX) Act, mandate the efficient use and
management of information. Today's organizations must use their log management process to
accommodate financial recordkeeping requirements of the federal government, as well as their
real-time incident monitoring and response needs. Given such requirements, log management has
assumed significance beyond its traditional use as a system monitoring and solution tool.
System log files are one of the most important and most frequently overlooked system monitoring
tools. In UNIX®, every program that creates log files either writes directly to its own
log file or captures information through the syslogd service. Although timely examination and analysis
of these system log files should be fundamental for all systems administrators, reality often differs
from the ideal. Due to the enormous amount of data to manage, many systems
administrators have neither the time nor the inclination to analyze system logs. In today's market, however, log management is a crucial task that you cannot afford to ignore.
Failure to use log file data can not only lead to problems with government agencies but also negatively affect the general operational culture of the enterprise. Failure to manage these resources inevitably leads to ineffective system process administration, management, and control. Consequently, the methods used in the composition and distribution of information have also grown in importance.
In these circumstances, log management must progress beyond traditional system monitoring usage. Organizations need a simple method for managing this mountain of information. Management is always interested in deploying the most efficient and cost-effective method of handling any task, so a solution must meet this standard before it can be implemented.
What's to be done?
Log files contain system data that you can use to determine current and future system problems.
However, there is one slight drawback. Even though UNIX log files are usually text files (the few that
aren't have utilities to convert them to text files), the data format isn't user friendly. Fortunately,
because these logs are text files, you can use various UNIX scripting and text-manipulation utilities
to examine and analyze log file content.
One of the best UNIX tools for simple extraction and formatting purposes is awk. Awk provides a
sophisticated set of processing capabilities. It is a succinct, simple language that produces easily read,
clean code. It facilitates quick development with little chance for pesky bugs to disrupt the development
timeline.
Awk provides the perfect means to translate the log data into a new form. But what should that new
form be? The most efficient and cost-effective means of information distribution is the corporate
intranet or the Internet. Obviously, a Web-based format is the logical choice.
Converting data to XML
Displaying log file data on a Web site requires you to transform the data. You can use awk to convert the data into XML format, and then use XML data islands to insert the data into HTML Web pages. This process presents the information in a user-friendly and familiar format.
The advantage of using HTML and XML together is that the formatting is separate from the data. HTML provides the formatting and display details, while XML handles the data structure. This division ensures that changes to the underlying data don't impose changes on the HTML code. After creating an HTML page, you can plug in new data automatically: copy a new XML file over the old one, and the Web page reflects the data contained in the new XML file.
Combining the powerful capabilities of awk and XML to assist in log monitoring responsibilities provides management with the optimal solution. The rest of this article uses a simple log file containing a timestamp and a description. It outlines the basic process to use in awk to create XML data -- the data to use with HTML.
Designing XML data with awk
An XML file should begin with the declaration statement that identifies the file as XML. The XML declaration defines the version and the type of character encoding used in the document. This example uses awk to place the XML declaration information, XML tags, and log file data into the XML file. The awk utility executes each block of code that has no pattern once for each input line.
An awk program has a beginning, a middle, and an end. In the example created here (see Listing 1), you must do
a bit of work before awk starts processing the text from the input file. For such situations, awk allows
you to define a BEGIN block of code. Because the BEGIN
block is evaluated before awk starts processing the input file, it is the place to set up:
- The field separator (FS) variable
- The XML declaration statement
- The opening <log> root tag
In the code that follows, you define the log file FS, and then you use print statements
to place the first XML statements in the data file.
The quotation marks inside the print statement are preceded by a
backslash so that awk doesn't treat them as the end of the string. Awk does not write the backslashes into the XML file.
Listing 1. The BEGIN block
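BEGIN {
    # Fields in the log file are separated by tabs
    FS="\t"
    # Each string stays on one line; an awk string cannot span lines
    print "<?xml version=\"1.0\" encoding=\"utf-8\"?> \n"
    print "<log> \n"
}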
The next step is to create the log file processing statements. The log file processing example shown
here creates a file named log.xml. The file contains child and sub-child elements. The log.xml file has the
structure shown in Listing 2.
Listing 2. Log message elements
<log>
    <logmessage>
        <timestamp>
        </timestamp>
        <description>
        </description>
    </logmessage>
    <logmessage>
        <timestamp>
        </timestamp>
        <description>
        </description>
    </logmessage>
</log>
In the log file, the log message timestamp is the first field ($1) and the description is the second field ($2). The awk program creates the XML structure from the log file data using the code
in Listing 3.
Listing 3. Create the XML structure
{ print "\t<logmessage> \n"}
{ print "\t\t<timestamp>",
$1, "</timestamp> \n"}
{ print "\t\t<description>",
$2, "</description> \n"}
{ print "\t</logmessage> \n"}
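Listing 3 copies each field into the XML file unchanged, so a description that contains an ampersand or an angle bracket would produce a file that XML parsers reject. The following is a minimal sketch of one way to guard against that, not part of the original listings: xml_escape is a hypothetical helper name, and the sample log line is invented for illustration.

```shell
# Sketch: escape XML special characters in a field before printing it.
# xml_escape is a hypothetical helper; the sample log line is made up.
escaped=$(printf '10:15:02\tdisk /dev/sda1 usage > 90%% & rising\n' |
awk '
function xml_escape(s) {
    gsub(/&/, "\\&amp;", s)   # ampersand first, so later entities are not re-escaped
    gsub(/</, "\\&lt;", s)
    gsub(/>/, "\\&gt;", s)
    return s
}
BEGIN { FS = "\t" }
{ print "<description>" xml_escape($2) "</description>" }
')
echo "$escaped"
```

The ampersand is replaced first because the other two replacements introduce ampersands of their own. In the gsub replacement string, the escaped ampersand stands for a literal character rather than the matched text.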
The final step in the XML file creation process is the coding of the awk END
block (see Listing 4). The END block executes after all lines in
the input file have been processed. Use it to print the closing </log>
tag.
Listing 4. The END block
END {
{ print "</log>"}
}
Listing 5 shows the complete awk program.
Listing 5. Complete awk program
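BEGIN {
    # Fields in the log file are separated by tabs
    FS="\t"
    # The XML declaration must come first, followed by the root tag
    print "<?xml version=\"1.0\" encoding=\"utf-8\"?> \n"
    print "<log> \n"
}
# The blocks below run once for each line of the log file
{ print "\t<logmessage> \n"}
{ print "\t\t<timestamp>", $1, "</timestamp> \n"}
{ print "\t\t<description>", $2, "</description> \n"}
{ print "\t</logmessage> \n"}
END {
    # Close the root element after the last input line
    print "</log>"
}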
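To try the program, you can save it to a file and run it against a tab-separated log. The sketch below assumes the file names sample.log and log2xml.awk and two invented log lines; only the output name log.xml comes from the article. The program body is a condensed variant of the complete listing that joins the strings without extra spaces or blank lines.

```shell
# Sketch: run the awk program against a small sample log.
# sample.log, log2xml.awk, and the log lines are assumptions for illustration.
workdir=$(mktemp -d)

printf '2006-03-07 10:15:02\tsshd restarted\n2006-03-07 10:20:41\tdisk quota warning\n' > "$workdir/sample.log"

cat > "$workdir/log2xml.awk" <<'EOF'
BEGIN {
    FS = "\t"
    print "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
    print "<log>"
}
{
    print "\t<logmessage>"
    print "\t\t<timestamp>" $1 "</timestamp>"
    print "\t\t<description>" $2 "</description>"
    print "\t</logmessage>"
}
END { print "</log>" }
EOF

# -f reads the program from a file; the shell redirect creates log.xml
awk -f "$workdir/log2xml.awk" "$workdir/sample.log" > "$workdir/log.xml"
cat "$workdir/log.xml"
```

Because the output is produced by a shell redirect, rerunning the command from a scheduled job would overwrite log.xml with the current log contents.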
This example awk program creates a file that contains all the data from a log file. It provides the fundamental processing information that is required to move data from the log file to the XML file. To restrict the output and the amount of information presented on the Web page, add awk conditional and control flow statements to qualify the data collected and ultimately displayed. For example, you might restrict the XML file to specific dates, or restrict an error log to specific types of error messages.
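As a rough illustration of such a restriction, the sketch below places a pattern in front of the main block so that only matching lines become <logmessage> elements. The timestamp format, the variable name day, and the sample log lines are assumptions made for this example, not details from the article.

```shell
# Sketch: emit only records from one day whose description mentions "error".
# The date format, the "day" variable, and the sample lines are assumptions.
filtered=$(printf '2006-03-06 23:59:10\terror: disk full\n2006-03-07 10:15:02\tsshd restarted\n2006-03-07 10:20:41\terror: quota exceeded\n' |
awk -v day="2006-03-07" '
BEGIN { FS = "\t" }
# The pattern before the block selects which input lines are processed:
# the timestamp must start with the requested day and the description
# must contain the word error.
index($1, day) == 1 && $2 ~ /error/ {
    print "\t<logmessage>"
    print "\t\t<timestamp>" $1 "</timestamp>"
    print "\t\t<description>" $2 "</description>"
    print "\t</logmessage>"
}
')
echo "$filtered"
```

Passing the day with -v keeps the date out of the program text, so the same script can be reused for a different day without editing it.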
Building Web pages with XML data islands
Now that you have created your XML file, you need to construct a data island structure to
place the data onto the HTML page. A data island is an XML Data Source Object (DSO)
that exists within an HTML page displayed in Microsoft® Internet Explorer. The XML DSO
is a Microsoft ActiveX® control built into Internet Explorer Version 4 and later. The XML DSO
provides a means to extract content from an external XML file into an HTML page.
The XML DSO object employs a process called data binding. In data binding, an
ActiveX control communicates directly with another Web page or with an external XML data
source. When creating the HTML document, the XML code components discussed here are
included.
In the HTML code, the <xml> element marks the beginning
of the data island. Its id attribute provides a way to reference
the data island. You use the src attribute to identify the external
XML file. The code in Listing 6 illustrates the structure of a data island.
Listing 6. The XML ID statement
<xml id="logID" src="log.xml"></xml>
The HTML page displays the XML data in a table. You must create an association
between the XML data and the HTML table by setting the datasrc attribute of the table
tag to the value of the data island's id attribute, preceded by a number sign (#),
as shown in Listing 7 below. The <table>
tag uses the datasrc attribute to refer to the XML data island
whose id attribute is logID.
Listing 7. Associate XML data and HTML datasrc
<table border="1"
datasrc="#logID" cellpadding="5">
Next, you create the table header information to identify the table
columns (see Listing 8).
Listing 8. Table header information
<thead>
<tr>
<th width="50%">
<div align="center">Timestamp</div>
</th>
<th width="50%">
<div align="center"> Description </div>
</th>
</tr>
</thead>
Using HTML tags that can accept data source tags (that is, tags that bind the HTML to the XML
data), you can easily format and display the XML data (see Listing 9). The <td>
element itself can't be bound to data, but the <span> tag can.
You display the XML data by using the <span> tag with the
datafld attribute within the table columns. The datafld
attribute identifies the XML element to be placed in the cell of the table. The value of the
datafld attribute should match the name of the XML tags. Recall
that the XML tag names created using awk were <timestamp>
and <description>. As the XML file is read, one table row
is created for each <logmessage> element in the file.
Listing 9. Binding HTML to XML data
<tbody>
<tr>
<td width="68"><span
datafld="timestamp"></span></td>
<td width="87"><span
datafld="description"></span></td>
</tr>
</tbody>
Listing 10 shows the complete HTML file.
Listing 10. Complete HTML file
<html>
<head>
. .
</head>
<body>
. .
<xml id="logID" src="log.xml"></xml>
<table border="1"
datasrc="#logID" cellpadding="5">
<thead>
<tr>
<th width="50%">
<div align="center">Timestamp</div>
</th>
<th width="50%">
<div align="center"> Description </div>
</th>
</tr>
</thead>
<tbody>
<tr>
<td width="68"><span
datafld="timestamp"></span></td>
<td width="87"><span
datafld="description"></span></td>
</tr>
</tbody>
</table>
. .
</body>
</html>
You now have a complete HTML page with extracted log data.
Awk in your enterprise
In this article, you saw the synergy created by teaming awk with XML. You learned how to use XML to structure and integrate data into information appropriate for posting on enterprise-wide networks, and how to tap into the power inherent in HTML-based communications. You also learned how to make the information readily accessible to a wide audience of interested parties.
Given the increasing premium placed on timely and accurate information, providing solid information resources to decision makers is a valuable asset to any enterprise, public or private. You can adapt the basic methods discussed here as they are, or use them as a basis for similar processing innovation with awk and XML/HTML. At a minimum, this discussion should serve to reinforce the appreciation for awk's exceptional usefulness as a general data-extraction and preparation tool and to validate the role of XML as a universal storage medium.
About the author
Grace Walker, a partner in Walker Automated Services in Chicago, Illinois, is an IT consultant with a diverse background and broad experience. She has worked in IT as a manager, administrator, programmer, instructor, and a Web developer in various environments, including telecommunications, education, financial services, and software. You can reach her at gwalker@walkerautomated.com.