
Symptomatic event visualizer, Part 1: Challenges in data collection

How can a common event format and a symptom repository help address the complexity of business IT?


Level: Introductory

Abdi Salahshour (abdis@us.ibm.com), Problem Determination Architect, IBM 
Kalpana Doraisamy (kdoraisa@in.ibm.com), Staff Software Engineer, IBM
Ajay G Rengasayee, Software Engineer, Freelance writer

19 Jun 2007

This four-part series is a comprehensive usage guide that gives you an overview of the Log and Trace Analyzer for Java™ Desktop, walks you through the installation process, and teaches you to configure the tool correctly. The series includes performance-enhancing tips, integration and hands-on scenarios, and data on the IBM Tivoli Monitoring 6.1 Events Tool. Discover how your data can be more consumable from start to finish and learn how to reduce your problem determination and maintenance costs. In Part 1, identify the challenges in data collection and see how a common event format and a symptom repository help address those challenges.

This first part of the series discusses the current obstacles to effective data collection. In the subsequent articles, you will:

  • See an overview of the architecture and functions of the Log and Trace Analyzer - Java Desktop (LTA-JD) and view an installation guide.
  • Take a visual tour of the technology, get troubleshooting tips, and learn to get the best performance out of the LTA-JD.
  • Dive into the IBM Tivoli Monitoring Events Tool view of the LTA-JD.

The challenges of data collection

Problem determination is the detection and diagnosis of situations that affect the operational status or availability of business applications. One of the challenges of data collection is the time that problem determination takes. Products write their content in their own proprietary formats; applications, databases, and networks each log in a format of their own. When an application fails because a network failure prevents it from reaching the database, the user must understand data from the application, the database, and the network. The need to understand every component involved makes problem determination more complex. Complexity grows for two reasons: someone must manually correlate log records that arrive in different formats, and the more products an application interacts with, the more failures it can encounter.


The goal is to maximize business and IT system availability by minimizing the time it takes to recover from situations that affect system availability. This is accomplished by collecting the monitoring information and using tools to quickly detect meaningful conditions, diagnose the underlying problem, and apply available knowledge to restore normal business and IT system operations.

Often the combination of multiple observed events reveals complex problems that make human analysis difficult and time-consuming. Monitoring solutions may implement autonomic correlation and reaction to these problems, performing simple event triage or more complex root-cause analysis on sets of events. A much smaller set of root-cause events is then presented to human operators for review and reaction.

Using the Log and Trace Analyzer (for the purposes of this series, the Java Desktop version) as a simplified symptomatic event visualizer can help overcome three major hurdles to more effective data collection:

  • The complexity of e-business systems. Today's business systems are a collection of distributed and heterogeneous software and hardware components.
  • The variety of data and collectors/adapters. The variety of collectors and the sheer volume of collected data create several problems: how to consume and publish proprietary data formats; how to make differing designs and standards coexist; how to integrate ad hoc and product-specific code; how to integrate the different skill sets required to configure, maintain, and tune the various systems; and how to correlate data for enterprise-to-enterprise problem diagnostics.
  • Instrumentation differences. These involve standards compliance, customer inconvenience, and cost of ownership. In addition, when standardization is lacking, Management Tools (the consumers) need to be instrumented for every Managed Resource (the producers) with which they interact, and the reverse is also true. This is both costly and inefficient.

Addressing these challenges requires a defined set of tools.

Defining the tools

To address these problems, richer and normalized data is needed to enable cross-product analysis and correlation; it is, in fact, a prerequisite to effective root-cause analysis and automation. Standards are fundamental to this type of data. Without them, event data is of little value to autonomic management for problem determination and for acting in response.

One way to alleviate this problem is to structure event data in four categories:

  • The source is the component that is affected by or has experienced the situation. This is also known as the source component of a situation.
  • The reporter is the component that is reporting the situation.
  • The situation data consists of properties or attributes that describe the situation.
  • The context/correlation data consists of properties or attributes used to correlate the situation with others.
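As a rough illustration of the four categories, an event could be modeled as a small record type. This is a minimal sketch, not the Common Base Event schema: the class names, field names, and sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ComponentId:
    """Identifies a component. Field names are illustrative assumptions."""
    component: str
    location: str


@dataclass
class Event:
    """One event, split into the four categories described above."""
    source: ComponentId    # component affected by the situation
    reporter: ComponentId  # component reporting the situation
    situation: Dict[str, str] = field(default_factory=dict)  # describes what happened
    context: Dict[str, str] = field(default_factory=dict)    # keys used for correlation

    def is_self_reported(self) -> bool:
        """True when the affected component reported the situation itself."""
        return self.source == self.reporter


# Hypothetical example: the application reports that the database is unreachable.
database = ComponentId("orders-db", "db01.example.com")
application = ComponentId("order-app", "app01.example.com")
event = Event(
    source=database,
    reporter=application,
    situation={"category": "connect", "outcome": "failed"},
    context={"transaction_id": "tx-1001"},
)
print(event.is_self_reported())
```

Keeping the source and the reporter separate matters in this example: the database is the affected component, but the record comes from the application's log.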

How Common Base Event format/WSDM Event Format fits

This is where the Common Base Event format and the WSDM Event Format (WEF) fit into the picture. Common Base Event is an event definition that is IBM's initial implementation of the WEF. The Common Base Event format and the WEF provide a common structure in which logs can be represented, so that the user has to understand only one format for all product logs. Adapters convert the various log formats to the standard format. The Common Base Event format and the WEF standard have been designed to make problem determination simpler and faster. Various elements provide more detail on the event that occurred, and tools are available to view problem records in Common Base Event format, which makes the problem scenario easier to understand.
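The adapter idea can be sketched in a few lines. The example below is an assumption-laden illustration, not the Generic Log Adapter or any IBM API: the two log line layouts, the regular expressions, and the normalized field names (`time`, `source`, `severity`, `message`) are all invented for the sketch.

```python
import re
from datetime import datetime, timezone

# Hypothetical proprietary formats, one per product.
APP_PATTERN = re.compile(r"^\[(?P<ts>[^\]]+)\] (?P<level>\w+) (?P<msg>.*)$")
DB_PATTERN = re.compile(r"^(?P<ts>\d+)\|(?P<code>\w+)\|(?P<msg>.*)$")


def adapt_app_line(line):
    """Convert an application log line to the common dict, or None if it does not parse."""
    m = APP_PATTERN.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return {"time": ts, "source": "application",
            "severity": m["level"].lower(), "message": m["msg"]}


def adapt_db_line(line):
    """Convert a database log line (epoch seconds, code, text) to the common dict."""
    m = DB_PATTERN.match(line)
    if m is None:
        return None
    ts = datetime.fromtimestamp(int(m["ts"]), tz=timezone.utc)
    severity = "error" if m["code"].startswith("E") else "info"
    return {"time": ts, "source": "database",
            "severity": severity, "message": m["msg"]}


app_event = adapt_app_line("[2007-06-19 10:15:00] ERROR Cannot reach database")
db_event = adapt_db_line("1182248100|E1042|Connection refused")

# Both records now share one shape, so they can be compared directly.
print(app_event["time"] == db_event["time"], db_event["severity"])
```

Once every source passes through an adapter like this, downstream tooling only has to understand the one normalized shape.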

The Common Base Event format is a consistent, common format for representing an event produced during the operation of an IT system. It facilitates effective intercommunication among disparate components that support logging, management, problem determination, autonomic computing, and on-demand business functions in an enterprise.

Common Base Events provide:

  • A consistent specification for the definition of normalized event and log information for various domains (business, security, network, system, etc.)
  • An exchange format for events and logs
  • Situation descriptions about the external operational capabilities of the component
  • Data that captures execution information within a component
  • Context data

Defining the symptoms

A symptom is a form of knowledge that indicates a possible problem or diagnosed situation in the managed environment. The classic definition of a symptom is "a characteristic sign or indication of the existence of something else." A symptom is recognized when the monitored data (in the medical sense, a thermometer reading) matches the symptom definition.

The autonomic computing definition is a bit more involved: "a characteristic sign or indication of a possible problem or situation happening in the context of one or more manageable resources." This definition breaks down into three things:

  • It is a form of knowledge used to solve problems and situations automatically in an autonomic system.
  • It is a composite record of information, formed by combining raw or composite information into patterns.
  • It is a composition of other symptoms.

Connecting the definitions: Going from events to symptoms

You may be asking, "How do I get from events to symptoms?" Keep these definitions in mind: an event is an indication of something being monitored (for example, memory usage has exceeded a set limit), and a symptom is a characteristic sign or indication of a possible problem or situation happening in the context of one or more manageable resources. You link the two like this:

If event x (and y (and...) ) occur (under certain conditions), then report the occurrence and possible resolution actions

For example, suppose memory usage has exceeded a set limit three times in a 10-minute stretch. This suggests a pattern that could benefit from a response such as increasing your buffer sizes.
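The memory-usage example can be sketched as a sliding-window rule. This is a minimal illustration of the "if event x occurs under certain conditions" pattern, not how LTA-JD evaluates symptoms; the function name, the defaults, and the sample timestamps (in seconds) are assumptions.

```python
from collections import deque


def detect_symptom(event_times, threshold=3, window_seconds=600):
    """Return each time at which `threshold` or more events fall within the window.

    event_times: timestamps (in seconds) of "memory limit exceeded" events.
    """
    recent = deque()  # timestamps still inside the current window
    matches = []
    for t in sorted(event_times):
        recent.append(t)
        # Drop events that are older than the window relative to the newest one.
        while t - recent[0] > window_seconds:
            recent.popleft()
        if len(recent) >= threshold:
            matches.append(t)
    return matches


# Three events within 10 minutes, then two isolated ones much later.
times = [0, 200, 500, 2000, 2100]
for t in detect_symptom(times):
    print(f"symptom at t={t}s: possible action is to increase buffer sizes")
```

The rule reports both the occurrence and a possible resolution action, which is the link between raw events and a symptom described above.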

Using this information

The event visualization in LTA-JD uses autonomic computing concepts such as symptoms to represent, detect, evaluate, and resolve incidents and problems in IT infrastructure management and operations. In addition, symptom visualization and processing methods are suggested to enable efficient, proactive avoidance of these incidents and problems before they happen.

Now we come to the "value proposition." There are three ways that having the information proposed in this article can help:

  • It makes the management data more consumable to the end-user because
    • It gives you visualization of product symptoms from within problem determination tooling.
    • Symptoms (patterns) are more deterministic than individual events.
  • It helps reduce problem determination costs since
    • Administrators can use automated event correlation to recognize symptoms (and potentially, corrective actions).
    • Support personnel can access symptoms directly from the problem determination tools.
    • Cross-product symptom catalogs allow quick diagnosis for known errors.
  • It helps reduce maintenance costs since
    • Incremental improvements to symptom databases will reduce requests to Level 1, 2, and 3 support (L1, L2, and L3). L1 is the first line of support that answers when the customer calls. L2 gets involved when L1 cannot resolve the problem; it usually includes a more knowledgeable support engineer, such as the product's Subject Matter Expert. L3 is commonly the change team or development members who change the code and provide fixes.

Introducing the tool

The tool to help you achieve this is the Log and Trace Analyzer for Java Desktop, a simple standalone Java event viewer that merges, filters, sorts, analyzes, and displays the contents of event sources in a common event format (Common Base Event), supporting work from problem isolation and triage through problem analysis. The triage functionality, coupled with the visualization mechanisms offered by the LTA-JD, improves root-cause analysis, problem prediction, and resolution. Domain expertise and symptom rules can be captured using industry-standard XPath expressions for quick detection and visualization of symptomatic events. Figure 1 shows a matrix of the Log and Trace Analyzer family of products on two spectra (analysis capabilities and user skills).
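To give a feel for an XPath-based symptom rule, the sketch below selects high-severity events from a small XML document. It makes several assumptions: the XML is a simplified, namespace-free stand-in loosely modeled on Common Base Event rather than a valid instance of it, the `events` wrapper element and the severity threshold are invented, and the rule syntax LTA-JD accepts is not shown here. Python's `xml.etree.ElementTree` is used only because it ships with the standard library, and it supports just a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Simplified sample data; element and attribute names are assumptions.
SAMPLE = """
<events>
  <CommonBaseEvent creationTime="2007-06-19T10:15:00Z" severity="50" msg="Connection refused"/>
  <CommonBaseEvent creationTime="2007-06-19T10:15:02Z" severity="10" msg="Heartbeat ok"/>
  <CommonBaseEvent creationTime="2007-06-19T10:16:30Z" severity="50" msg="Connection refused"/>
</events>
""".strip()

# A single-symptom rule expressed as an XPath predicate on one attribute.
RULE = "./CommonBaseEvent[@severity='50']"

root = ET.fromstring(SAMPLE)
symptomatic = root.findall(RULE)

for ev in symptomatic:
    print(ev.get("creationTime"), ev.get("msg"))
```

Because the rule is plain data (a string), it can be stored and shared separately from the tool that evaluates it.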


Figure 1. The Log and Trace Analyzer family

Log and Trace Analyzer, Java Desktop sits at the starting corner, but don't sell it short. It enables end-to-end viewing of event sources across a heterogeneous environment, provides a customizable summary view, and lets you select and expand any row from the summary view to display the full Common Base Event attributes. With it, you can also filter and sort at multiple levels on any event property, apply custom highlighting to triage events (single-symptom definitions), and save and share configuration settings.
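The merge, filter, and multi-level sort operations can be pictured with a short sketch. This is not LTA-JD code: the event dictionaries, their field names, and the severity threshold of 30 are hypothetical, and each source is assumed to be already ordered by time.

```python
import heapq
from operator import itemgetter

# Two hypothetical event sources, each already sorted by time.
app_events = [
    {"time": 100, "source": "app", "severity": 50, "msg": "Cannot reach database"},
    {"time": 160, "source": "app", "severity": 10, "msg": "Retry scheduled"},
]
db_events = [
    {"time": 90, "source": "db", "severity": 30, "msg": "Connection pool low"},
    {"time": 101, "source": "db", "severity": 50, "msg": "Connection refused"},
]

# Merge: interleave the sources into one time-ordered view.
merged = list(heapq.merge(app_events, db_events, key=itemgetter("time")))

# Filter: keep only events at or above the assumed severity threshold.
filtered = [e for e in merged if e["severity"] >= 30]

# Multi-level sort: highest severity first, then oldest first within a severity.
triage = sorted(filtered, key=lambda e: (-e["severity"], e["time"]))

for e in triage:
    print(e["time"], e["source"], e["severity"], e["msg"])
```

In the merged view, the application error at time 100 and the database error at time 101 sit next to each other, which is the kind of cross-source correlation that separate log files make hard to see.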

The next article gives an overview of the architecture and functions of the Log and Trace Analyzer, Java Desktop, along with a guide to installing it.




About the authors

Abdi Salahshour is a Senior Software Engineer, problem determination architect, and Master Inventor at IBM's Autonomic Computing Technology and Development, and is currently an architect for the Plug and Manage architecture. He began working for IBM in 1982 and has served in many roles, from design and development of database diagnostic tools to system management and self-healing architecture and enablement in heterogeneous and distributed environments. He was a member of the IBM Problem Determination Council, is one of the authors of the IBM Common Base Event specification, is one of the principal designers and implementers of the Generic Log Adapter, and is the architect and designer of the Log and Trace Analyzer for Java Desktop.


Kalpana Doraisamy is a Staff Software Engineer at IBM, currently focusing on Lightweight Infrastructure for Systems Management. In her previous role, she worked with the Log and Trace Analyzer for Autonomic Computing for more than two years. She was one of the senior developers of the Log and Trace Analyzer for Java Desktop. She holds a bachelor's degree in Computer Science and Engineering from Government College of Technology, Coimbatore, India.


Ajay G Rengasayee was a System Software Engineer at IBM India Software Lab, Autonomic Computing. He was a developer for Log and Trace Analyzer for Autonomic Computing and related technology for two years. He was one of the developers of the Log and Trace Analyzer for Java Desktop.



