I'll begin by exploring issues surrounding problem determination and the often inadequate role that log messages play. The format and content of Common Base Events provide a solution. I'll explore the structure of Common Base Events in detail and identify major goals -- namely identifying components and classifying situations. You'll see that all events are pigeon-holed into twelve situation types. Finally, I'll briefly show how this works in practice: first, how log messages can be translated into Common Base Events; and second, how the events can be analyzed by an autonomic manager with the goal of healing a failing system.
How do problems manifest themselves and get diagnosed in a computer system? The answer is through the messages the application produces. Whether specialized technicians or general computer users, everyone has experience with these messages. Some messages are well written and assist with a rapid diagnosis and solution to a problem. Others are so terse or obscure that they do little more than infuriate their audience.
As a rule, good or bad, no two messages are ever the same, and there is enormous diversity in the style and format of messages between applications. Even within one application, you might find evidence of a range of message standards and authors. Not all messages are meant to communicate from application to user (or even application to expert). Many are produced by one application for use by another. Here more than anywhere else, there has to be a rigorously defined common understanding of those messages between applications.
All these messages function in a manner similar to the nervous system. Just as our nervous system uses nerve impulses to control heartbeat and body temperature without conscious thought, so autonomic computing systems rely on messages to ensure the well-being of applications without human intervention.
Message logs are invariably product-centric, adhering to standards and terminology that are unique to a particular vendor (or even to a particular application). Under these circumstances, how can a message guarantee consistency of interpretation? The answer is through a standard. Enter the Common Base Event model. This standard lends itself easily to several types of events -- especially logging, tracing, management, and business events.
The Common Base Event model tackles the two big issues of message diversity: format and content.
The Common Base Event model ensures consistency of format through an XML schema definition (.xsd), a natural choice for current times. Apart from imposing the necessary structure, XSD is a prerequisite for Web services. Ideally, you want autonomic control across diverse applications, even those from different vendors. Web services have become the common language of inter-application communication.
XSD also provides the necessary scope for ensuring consistent message content. As you shall see, this goes further than defining just the type of information a message should convey. Many places in the definition are prescriptive about the actual words that can be chosen to describe a situation or component.
Along with ensuring consistency of format and content, the Common Base Event model also encourages completeness. Messages are often created by developers under extreme time pressure; they have a lot of information to put into a limited space. Into this limited space, they must put details of the situation and the context in which it occurred (which component has broken, and where, for example). Small wonder, then, that log messages often fall short of this goal. The Common Base Event schema, however, has mandatory elements guaranteeing that this information will be supplied.
The Common Base Event model takes as its premise that messages are events that are indicative of an underlying situation. A given situation might give rise to a single event, but is more likely to cause a number of related events.
To structure event and situation data, the Common Base Event has three main parts at the top level. The definition documents for the Common Base Event call this the 3-tuple structure. These offer information about:
- The component reporting a particular situation
- The component affected by the situation
- The situation itself
The component affected by a situation is often also the component reporting the situation. Under these circumstances, the Common Base Event definition insists that only the affected component information should be included. After all, why clog up your network with duplicate data? (You'll see later that the type of information captured for affected and reporting components is identical.)
The third tuple -- information about the situation -- is mandatory. There is also core information associated with the Common Base Event itself, outside of the tuples. This information includes the key attributes that identify the event and denote its priority, for example. Several optional parts of the structure exist to support extra functionality and vendor-specific requirements; these are discussed later in the article.
XML schemas can be read, but are not the best vehicle for rapid overviews. The Common Base Event specification, however, contains Unified Modeling Language (UML) diagrams that summarize the data and relationship between different parts of the event. Figure 1 shows the Common Base Event and its three tuples at the top level:
Figure 1. Class diagram for Common Base Event schema -- top level
Here's a quick guide to the UML based on the diagram above:
- The three boxes show elements in the schema (ComponentIdentification, Situation, and CommonBaseEvent itself). The top rectangle within each box contains the name of the schema element.
- The rectangle below the name holds simple pieces of data for each element. A bright blue icon denotes each of these, and is followed by the name of the simple piece of data and its data type. The Situation element has one simple piece of data, categoryName, which happens to be a string of characters (with constraints on values not shown on the diagram, but which I'll explore later).
- Don't be fooled; the three boxes don't match up with the three tuples! This is where the lines (showing UML associations) come into play. The box at the top, CommonBaseEvent, is not one of the tuples, but is the root element that contains the three tuples. CommonBaseEvent has some simple data of its own within the box, but also has three adjacent diamonds, each with an emanating arrow joining to the other boxes. This means that CommonBaseEvent contains these other complex elements (the UML term is aggregation).
- You'll see two diamonds and arrows leading to ComponentIdentification. One is for the affected component, the other for the reporting component, shown by the line labels sourceComponentId and reporterComponentId respectively. The number at the arrowhead shows how many times an element should occur within the CommonBaseEvent. So for sourceComponentId, the number is 1 (and always 1 -- it's mandatory). For reporterComponentId, it's 0 or 1 (the reporterComponentId is only present when it's different from the sourceComponentId). There's no easy diagram convention to show this rule; the diagram only shows that there's a minimum of no reporterComponentId elements within the CommonBaseEvent, and a maximum of 1.
- The third tuple, Situation (data), is shown in the bottom left. As the arrowhead shows, there must be one of these elements present.
You might be wondering about the brevity of parts of the diagram. Surely, in Situation, a situation isn't identified by one simple piece of data called categoryName? The answer is no. This is an incomplete diagram. Situation itself contains (aggregates) other elements that you will see later.
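To make the top-level shape concrete, here is a minimal Python sketch of the root element and its tuples. It is an illustration only, not the schema itself: the class layout, the snake_case field names, and the make_event helper are assumptions of mine, and each class carries just enough data to show the one mandatory affected component, the optional reporter, and the mandatory situation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ComponentIdentification:
    # The same shape serves both the affected and the reporting component.
    component: str
    location: str


@dataclass
class Situation:
    category_name: str  # names one of the twelve situation types


@dataclass
class CommonBaseEvent:
    source_component_id: ComponentIdentification  # affected component: exactly one
    situation: Situation                          # situation data: exactly one
    reporter_component_id: Optional[ComponentIdentification] = None  # zero or one


def make_event(affected, situation, reporter=None):
    """Hypothetical helper: omit the reporter when it duplicates the affected component."""
    if reporter == affected:
        reporter = None
    return CommonBaseEvent(affected, situation, reporter)
```

Because dataclasses compare field by field, a reporter that carries the same data as the affected component is dropped, which mirrors the rule that only the affected component information is sent when the two coincide.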
Before leaving the discussion of the overview UML for the Common Base Event, I'll return to the simple data in the CommonBaseEvent box. This is meta-data about the event itself.
- extensionName is an optional property used to give a name to the event.
- globalInstanceId should contain a value that uniquely identifies this event (even if stored in a database with all the other Common Base Events ever produced, the identifier should be truly globally unique).
- localInstanceId, if provided, serves a similar identifying function, but is only guaranteed to be unique within the process that produces a particular event.
- creationTime is a date/time stamp for the event.
- severity is a number, in a range from 0 to 70, that grades the impact of the situation on the reporting component. There are seven predefined values in increments of 10 ranging from 0 to 60 (for example, 10-Information, 30-Warning and 60-Fatal). The severity value for a given event should be provided by a domain expert, based on a judgment of the human consequences of an event. During problem management, the most severe events should be the first that require attention.
- priority is a number in a range of 0 to 100, which denotes how this event should be treated by applications monitoring Common Base Events. The higher the number, the sooner those applications should deal with the event. There are three predefined values: 10-Low, 50-Medium, and 70-High.
- msg is the human-readable text that is normally an indispensable component of an application message. It is, in fact, optional (though still recommended, especially in the absence of other catalog information that identifies human-readable text). This might sound strange -- how could an event message lack message text? It could be that the MsgDataElement complex type is used instead of msg, to support the late binding of messages for National Language Support (I might want to configure my application to report in English, French, or Romanian). Another reason could be that there is no message text to accompany a particular business event, and under these circumstances there is no need to manufacture spurious text just to keep the Common Base Event definition legal. Recall also that a prime goal of the Common Base Event model is to pave the way for the autonomic management of systems. An autonomic manager is likely to base its responses on the more fixed parts of the Common Base Event definition. This is likely to be easier than parsing a fickle piece of text intended for human consumption.
- elapsedTime works together with a companion count property: the count records the number of identical events in a given time interval, while elapsedTime specifies that time interval.
- sequenceNumber acts a little like priority in that it specifies the order in which events should be processed (this is especially useful if the events are likely to arrive at the event-managing application in no particular order, but their processing sequence is critical).
- version refers to the Common Base Event specification version, so that programs consuming Common Base Events can deal with version compatibility issues.
- otherData is a catch-all for holding any pieces of data that are not present as named elements in the Common Base Event model because they are application-specific. Software that consumes Common Base Events might not be able to make use of this information, but should save and forward it nonetheless. Other application-aware software (or humans) in the event-forwarding chain might have a use for the information.
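The ranges and predefined values above lend themselves to a small validation and triage sketch. Everything here is illustrative: EventHeader, triage, and describe are hypothetical names, only the predefined values quoted in this article are listed, and the tie-breaking order (priority first, then severity, then sequence number) is my own assumption rather than anything the specification dictates.

```python
from dataclasses import dataclass

# Only the predefined values named in the article; the rest are omitted.
SEVERITY_NAMES = {10: "Information", 30: "Warning", 60: "Fatal"}
PRIORITY_NAMES = {10: "Low", 50: "Medium", 70: "High"}


@dataclass
class EventHeader:
    global_instance_id: str
    severity: int
    priority: int
    sequence_number: int = 0

    def __post_init__(self):
        # Reject values outside the ranges quoted for severity and priority.
        if not 0 <= self.severity <= 70:
            raise ValueError(f"severity out of range 0-70: {self.severity}")
        if not 0 <= self.priority <= 100:
            raise ValueError(f"priority out of range 0-100: {self.priority}")


def triage(events):
    """Order events: highest priority, then highest severity, then lowest sequence number."""
    return sorted(events, key=lambda e: (-e.priority, -e.severity, e.sequence_number))


def describe(event):
    """Show the predefined label where one is known, otherwise the raw number."""
    severity = SEVERITY_NAMES.get(event.severity, str(event.severity))
    priority = PRIORITY_NAMES.get(event.priority, str(event.priority))
    return f"{event.global_instance_id}: severity={severity}, priority={priority}"
```

A monitoring application could call triage on each batch of incoming events and work through the result from the front.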
Having looked at the core header information in the Common Base Event, I'll now move on to look at component identification in some detail.
The following UML diagram shows just the ComponentIdentification part of the Common Base Event, which is used both for affected and reporting components (the first two tuples):
Figure 2. Class diagram for component information in the Common Base Event schema
The clear aim of ComponentIdentification is to identify a component. You have just seen that events themselves can be identified with a globally unique identifier (GUID). In an ideal world, components would also be globally identifiable in the same way. The Common Base Event model team acknowledges, however, that there are many differing techniques for identifying components, and allows for this within the definition.
So at the top level, the ComponentIdentification element has a simple piece of data, location. This might be any sort of physical address that indicates the location of the component. It's the next attribute, locationType, that indicates whether location is an IP address, SNA number, hostname or something else entirely. The specification provides nearly twenty well-known location types; the message is to stick to these if you can. Indeed, "Unknown" should be the value when a well-known location type is unavailable.
The application property is typically the business name for a component. This is particularly helpful for components that do actually correspond to business applications, for example, Accounts Receivable Module v4r5.3.
Next, I'll examine the four properties with component in their name. component itself names the component (whether it is an application, product, or subsystem), and subComponent further distinguishes an element within the component. So to give a Java technology example, component might be "My Currency Exchange Kiosk Application", and subComponent the name of a Java class and method (com.mycompany.utilities.EuroConverter.toEuros()).
componentIdType attempts to identify the kind of component, and has a few well-known keywords (such as ProductName, DeviceName and ServiceName). The componentType property is used to hold the recognized type of components, which might be vendor-specific (such as IBMDB2UDB). The Common Base Event specification has a lengthy appendix that categorizes all IBM® products and many other vendor products, using a well-defined name. Examples include MicrosoftWindows_XP_Professional. The appendix recognizes component hierarchies, so an application server might host components such as J2EE_Application. The clear intent is to put these component type definitions into a namespace that can then easily be incorporated into later versions of the Common Base Event definition to validate the componentType property.
instanceId makes sense when the component under scrutiny occurs more than once. This is less likely to be true for, say, a database (though it's not unknown). However, it would almost invariably be true for an Enterprise Java Bean (EJB). It is helpful to hold a reference to the EJB instance in this property.
Three of the remaining properties, threadId among them, all deal with how an application or operating system would logically divide up work. These details should be supplied when appropriate and known.
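As a rough illustration of how these properties might travel together, the sketch below builds a component element with Python's standard xml.etree.ElementTree module. Treat the details as assumptions: component_element is a hypothetical helper, carrying the properties as XML attributes is my reading rather than something shown in this article, and the location and locationType values are invented sample data.

```python
import xml.etree.ElementTree as ET

COMPONENT_TAGS = ("sourceComponentId", "reporterComponentId")


def component_element(tag, **properties):
    """Build a component identification element, skipping properties that are unknown."""
    if tag not in COMPONENT_TAGS:
        raise ValueError(f"unexpected element name: {tag}")
    element = ET.Element(tag)
    for name, value in properties.items():
        if value is not None:  # supply details only "when appropriate and known"
            element.set(name, str(value))
    return element


source = component_element(
    "sourceComponentId",
    application="Accounts Receivable Module v4r5.3",
    component="My Currency Exchange Kiosk Application",
    subComponent="com.mycompany.utilities.EuroConverter.toEuros()",
    componentIdType="ProductName",
    location="kiosk01.example.com",  # invented sample value
    locationType="Hostname",         # assumed keyword for a hostname location
    threadId=None,                   # unknown, so it is left out
)
xml_text = ET.tostring(source, encoding="unicode")
```

The same helper serves the reporting component by passing "reporterComponentId" as the tag, since both tuples capture identical information.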
With all these properties available, you can see that the reporting and affected components are well identified. Now look at what the Common Base Event model has to say about the situation that gave rise to the event.
The most exciting work within the Common Base Event definition centers around the data for a situation, which is the third and most critical of the Common Base Event tuples. Here again is a UML diagram, this time showing the detail for situation data:
Figure 3. Class diagram for situation data in the Common Base Event schema
The major element is Situation, which contains one property of its own: categoryName. This designates which of twelve situation types this event concerns. The SituationType element itself sits under Situation. The twelve boxes sharing an arrow pointing to SituationType show the twelve situation types graphically.
So what are these situation types? The Common Base Event model team analyzed thousands of messages from hundreds of different log files from many IBM products as well as those of different vendors. Out of all the events described in these logs, they sought to extract a few essential officially recognized situations.
What was very apparent from their analysis is that there were many ways of saying essentially the same thing. A component might "begin work", "be started", "launch", "bootstrap", or "be initialized." Any legacy message containing such words is likely to belong to a StartSituation when translated to Common Base Event format.
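A deliberately naive sketch shows how such wording might be matched. The phrase list is adapted from the examples in the paragraph above; the classify function and its plain substring matching are assumptions for illustration, and a real adapter would need far more careful rules.

```python
from typing import Optional

# Phrases adapted from the examples above ("be started" is reduced to "started").
START_PHRASES = ("begin work", "started", "launch", "bootstrap", "initialized")


def classify(message: str) -> Optional[str]:
    """Return "StartSituation" when the text contains a start-like phrase, else None."""
    text = message.lower()
    if any(phrase in text for phrase in START_PHRASES):
        return "StartSituation"
    return None
```

Substring matching is crude (it would also flag "not started"), which is one reason the outcome of an event is recorded separately from its situation type.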
Common to all the situation type descriptive information is a reasoningScope, which denotes whether the scope of the event is INTERNAL to the affected component, or has a potential EXTERNAL impact.
From the class diagram, you can see that each of the twelve particular situation types contains its own additional simple data elements. In many cases, the data is the same; several situation types (StartSituation and RequestSituation among them) all contain a successDisposition simple data element, which has valid values of SUCCESSFUL or UNSUCCESSFUL (to indicate the outcome of the event).
The first situation type listed, StartSituation, is reserved for events dealing with the starting of a component. Apart from the successDisposition element (SUCCESSFUL/UNSUCCESSFUL), StartSituation has a situationQualifier. This has three self-explanatory values: START INITIATED, RESTART INITIATED and START COMPLETED. This structure is typical of the eleven remaining situation types, which are explained briefly below:
- StopSituation is the counterpart of StartSituation, for events to do with the shutdown process for a component.
- ConnectSituation is for events describing aspects of a connection between one component and another.
- RequestSituation is for a component to identify the success or status of a long-running or complex request.
- ConfigureSituation is for a component to state some aspect of its configuration, perhaps in response to a configuration change request.
- AvailableSituation is for a component to comment on its operational state or availability. This has a slightly different set of dispositions, which are explained fully in the Common Base Event specification.
- ReportSituation is for events that report on some aspect of utilization, such as CPU consumption, buffer size, or memory allocation. This is qualified only by a reportCategoryType, with valid values of PERFORMANCE, HEARTBEAT, SECURITY and STATUS.
- CreateSituation is for events that mark a component creating something, such as a file, or document, or Enterprise Java Bean (EJB).
- DestroySituation is the counterpart of CreateSituation, for events dealing with the destruction of something.
- FeatureSituation is for components to announce that some feature, such as a service, is available (or unavailable).
- DependencySituation is for components to indicate that they cannot find another component (or feature) that they need. The qualifier dependencyDispositionType indicates if the dependency is MET or NOT MET.
- OtherSituation is a "catch all" situation type for events that don't fit any other category.
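Because the vocabulary is closed, a consumer can check situation data mechanically. The following sketch lists the twelve type names and validates the StartSituation fields described above. The dictionary representation and the check_start_situation function are assumptions made for illustration; the actual schema expresses these constraints in XSD.

```python
SITUATION_TYPES = frozenset({
    "StartSituation", "StopSituation", "ConnectSituation", "RequestSituation",
    "ConfigureSituation", "AvailableSituation", "ReportSituation",
    "CreateSituation", "DestroySituation", "FeatureSituation",
    "DependencySituation", "OtherSituation",
})
REASONING_SCOPES = frozenset({"INTERNAL", "EXTERNAL"})
SUCCESS_DISPOSITIONS = frozenset({"SUCCESSFUL", "UNSUCCESSFUL"})
START_QUALIFIERS = frozenset({"START INITIATED", "RESTART INITIATED", "START COMPLETED"})


def check_start_situation(situation):
    """Return a list of problems found in a dict describing a StartSituation."""
    problems = []
    if situation.get("categoryName") != "StartSituation":
        problems.append("categoryName must be StartSituation")
    if situation.get("reasoningScope") not in REASONING_SCOPES:
        problems.append("reasoningScope must be INTERNAL or EXTERNAL")
    if situation.get("successDisposition") not in SUCCESS_DISPOSITIONS:
        problems.append("successDisposition must be SUCCESSFUL or UNSUCCESSFUL")
    if situation.get("situationQualifier") not in START_QUALIFIERS:
        problems.append("situationQualifier is not a recognized start qualifier")
    return problems
```

An empty list means the data uses only the prescribed words; an autonomic manager can then branch on exact values instead of parsing free text.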
There are still a few complex types associated with the Common Base Event model to mention, illustrated in the UML diagram below. These are considered to be part of the third tuple, the situation data.
Figure 4. Class diagram for remaining situation data in the Common Base Event schema
MsgDataElement holds metadata about the message itself. Several property names refer to a catalog, which is a generic name for any repository holding message text and expected parameters. This might be as simple as a properties file, or as sophisticated as a message database. A msgId is the traditional message identifier, usually between seven and nine characters, and often packed with abbreviations to denote severity or the part of the application going wrong. msgIdType acknowledges that there are some standards for message identifier composition; if you adhere to one, put it here. You might be puzzled at the optionality of all this good information. However, MsgDataElement is intended to support late-binding message parameters (in case, for example, you wanted to vary the national language when displaying the message). This is a feature of many product-specific event managers. An autonomic manager, though, should be able to manage situations without the information that MsgDataElement contains, precisely because the information hasn't been subject to the kinds of rigorous classification you saw going on in the other parts of the schema. That said, if the information is available, it should certainly be supplied to the Common Base Event.
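Late binding can be pictured as a lookup performed at display time rather than at logging time. The sketch below is hypothetical throughout: the in-memory CATALOG stands in for a properties file or message database, and the message identifier, templates, and render function are invented for illustration.

```python
# Stand-in for a message catalog, keyed by (locale, message identifier).
CATALOG = {
    ("en", "KSK0001E"): "Conversion to {0} failed for amount {1}",
    ("fr", "KSK0001E"): "Conversion en {0} impossible pour le montant {1}",
}


def render(msg_id, tokens, locale, fallback_locale="en"):
    """Bind message parameters to catalog text in the requested language."""
    template = CATALOG.get((locale, msg_id))
    if template is None:
        # No text in the requested language: fall back to the default locale.
        template = CATALOG.get((fallback_locale, msg_id))
    if template is None:
        return None  # identifier unknown to this catalog
    return template.format(*tokens)
```

The event itself then needs to carry only the identifier and the parameter values; the reader's language is chosen when the message is shown.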
ExtendedDataElement can appear any number of times, with each instance holding any number of ExtendedDataElements within it. As you can probably guess from the general-purpose names of the simple data therein, ExtendedDataElement is a flexible bucket for any kind of event data that can't be accommodated elsewhere.
ContextDataElement gives an opportunity to hook particular events to some kind of context. This context is arbitrary, and is defined by the product or application. Each Common Base Event belonging to the same context will share the same contextValue. One possible use might be for a significant business process, such as "month-end management reporting," that can't be held together by any of the component attributes. Note that the relationship between CommonBaseEvent and ContextDataElement allows the same Common Base Event to participate in many different contexts. Thus, ContextDataElement allows the correlation of messages across events, which might also include all the messages generated by executing a transaction or some other unit of work.
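Correlation by context amounts to grouping events on a shared value. In this sketch each event is a plain dictionary with a "contexts" list of contextValue strings; that representation and the group_by_context function are assumptions for illustration, not the schema's own structure.

```python
from collections import defaultdict


def group_by_context(events):
    """Map each contextValue to the identifiers of the events that share it."""
    groups = defaultdict(list)
    for event in events:
        # One event may participate in several contexts, or in none at all.
        for context_value in event.get("contexts", ()):
            groups[context_value].append(event["globalInstanceId"])
    return dict(groups)


events = [
    {"globalInstanceId": "e1", "contexts": ["month-end management reporting"]},
    {"globalInstanceId": "e2", "contexts": ["month-end management reporting", "txn-42"]},
    {"globalInstanceId": "e3"},
]
by_context = group_by_context(events)
```

Here e2 appears under both contexts, which is the many-to-many participation described above.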
Finally, there is AssociationEngine. The association engine is an application that establishes a relationship between events. One or many association engines might be invoked as part of the process that converts existing log messages to Common Base Events. Just as for contexts, the same event can be tied to as many association engines as required. The addition of associated event information might even be the job of an autonomic manager that processes Common Base Events (so the autonomic manager can act as an association engine).
Whatever creates association engine information, the type property can be used much as contextValue is, to describe the type of association that exists between events. The Common Base Event envelope references AssociationEngine in two different ways. An individual event can itself hold references to the AssociationEngines that used it. Alternatively, the envelope can list the AssociationEngine together with all the ids for the events that particular engine has resolved.
I'll conclude this detailed discussion of the Common Base Event model properties with the class diagram for the entire model.
Figure 5. Class diagram for the entire Common Base Event model
Whether or not you agree with the situation type classifications and the other definitions in the Common Base Event model, the point is that it's better to have a common taxonomy. Existing software suppliers have quickly seen the advantage such a system brings; Cisco, for example, has followed IBM's lead. Where multiple products subscribe to the Common Base Event model, there is real scope for autonomic control -- even across products from different vendors.
Clearly, there is a lot of work involved in making your product Common Base Event-compliant. No vendor is prepared to immediately rework all of their software so that all of the events produced are Common Base Events. The obligation is to have the capability of producing Common Base Events, and not necessarily to store your existing logged entries in the Common Base Event XML format. This is best achieved by having a piece of software called an adapter, which takes existing log messages and maps these on to a Common Base Event equivalent. There will still be a lot to do to define those mappings, but this can be an incremental process. IBM has produced the Generic Log Adapter, which is offered as a component of the Log and Trace Analyzer that is packaged with the Autonomic Computing Toolkit. Vendors can use this tool to immediately enable many products to take advantage of the common format and to bridge the gap for legacy software until the common format is generated directly by software.
The Common Base Event specification lists common words from existing logs that might indicate the situation type for an event. So, for example, the specification offers the following guidance for recognizing that a legacy log message belongs to a given situation type:
Listing 1. Message guidance
Existing situations include words like now available, currently available, and transport is listening on port 123, for example:
SRVE0171I: Transport HTTPS is listening on port 9443
MSGS0601I: WebSphere Embedded Messaging has not been installed
Figure 6 shows the translation process for messages generated from the WebSphere Application Server:
Figure 6. How a situation created for a typical log message is translated to Common Base Event
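The translation step can be approximated in a few lines. This is a toy adapter, not the Generic Log Adapter: the regular expression is modeled only on the two sample messages in Listing 1, the output dictionary is an informal stand-in for the XML format, and the adapt function with its classify callback are hypothetical names.

```python
import re

# Assumed identifier shape, inferred from the samples: 4 letters, 4 digits, 1 letter.
LOG_LINE = re.compile(r"^(?P<msg_id>[A-Z]{4}\d{4}[A-Z]):\s*(?P<text>.+)$")


def adapt(line, component, classify=lambda text: "OtherSituation"):
    """Map one legacy log line onto a dictionary shaped like a Common Base Event."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None  # not a line this toy adapter understands
    return {
        "msg": match["text"],
        "msgDataElement": {"msgId": match["msg_id"]},
        "sourceComponentId": {"component": component},
        "situation": {"categoryName": classify(match["text"])},
    }


event = adapt(
    "SRVE0171I: Transport HTTPS is listening on port 9443",
    component="WebSphere Application Server",
)
```

The classify callback is where the keyword guidance from the specification would plug in; by default every message falls through to OtherSituation, so the mappings can be refined incrementally.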
This article started by looking at the problems of human intervention in managing today's complex systems, and what a toll this took on IT resources. It went on to pinpoint the text of different log formats as being a serious underlying problem, especially when log messages are the nervous system of a computer application and are the foundation of automated management.
It went on to examine the advantages of the Common Base Event model and how this brings consistency of format and content to logging messages, as well as completeness of contextual information. Common Base Event messages were then put under the microscope: I examined the three parts, or tuples, of the definition in some detail. First, the article looked at the two identical tuples devoted to component information, and secondly at situation data -- in particular the situation type -- that rationalizes hundreds of situation descriptions to a mere dozen.
The article concluded by showing that the transition to the Common Base Event model can be done incrementally through the use of software adapters to translate from old log formats to a Common Base Event format. An autonomic system must go beyond one product to be effective, and preferably across products from different vendors. It should be possible to plug in different components without compromising the autonomic system. The Common Base Event model is a common and open standard that makes this possible, designed from the ground up to integrate with Web services. I expect to see many software vendors follow IBM's initiative in the near future and embrace the advantages it offers.
- For a look at how IBM and Cisco work toward a self-healing infrastructure, see "IBM and Cisco Collaborate on Autonomic Computing Solutions."
- The specifications on the Common Base Event model are provided in this Eclipse.org whitepaper, "Canonical Situation Data Format: The Common Base Event V1.0.1" (2004). (PDF format)
- You can link to a press release about the submission of the Common Base Event model to the Organization for the Advancement of Structured Information Standards (OASIS).
- More about OASIS and Web service ratification is available at OASIS.
- See the architectural blueprint for autonomic computing from IBM.
- IBM has an autonomic computing manifesto that shows IBM’s perspective on the state of information technology.
- For a vision of autonomic computing that gives a good layman's guide to the underlying theory, take a look at "The Vision of
- Learn more about other autonomic computing topics at the IBM library.
David Bridgewater, a Studio B author, has a long career in application development, and spent some of his happiest professional years helping a large UK retail company embrace Java technologies. Now he works as a contract Java/WebSphere® trainer for IBM, as well as producing his own training materials and tutorials. He is a regular contributor to technical journals, focusing on Web application development with Java and J2EE, and supporting IBM technologies (WebSphere software). He can be reached at: email@example.com.