Symptoms deep dive, Part 1: The autonomic computing symptoms format

Know thy symptoms, heal thyself

This article, the first in a series, dives deep into the underworld of the autonomic computing symptoms format. It introduces the autonomic computing symptoms architecture and format, and details how symptoms are represented, how to identify them, the advantages of using a standard symptom representation, and how to adopt symptoms as part of your systems management strategy.

Marcelo Perazolo (mperazol@us.ibm.com), Autonomic Computing Architecture, IBM

Marcelo Perazolo is a member of the IBM Autonomic Computing Architecture team, where he serves as an architect for symptoms and other knowledge formats and defines Management Integration Taxonomies related to autonomic computing. He has worked for IBM since 1990, with various assignments in network and systems management. Marcelo received an M.S. degree in Electrical Engineering in 1994. His interests include problem determination and prediction, process optimization techniques, security, correlation technologies, and knowledge representation.



18 October 2005

Introduction

This article discusses the new autonomic computing symptoms format, which is an evolution of the current symptoms format version 1.0, widely available in the autonomic computing toolkit under the Log and Trace Analyzer component. This article briefly addresses the following points:

  • How knowledge in the autonomic computing context is defined
  • How symptoms are a form of knowledge (see Resources)
  • The various components of a symptom
  • The roles of a symptom in the autonomic computing architecture

As the series progresses, you'll get specific examples of canonical symptoms and how they fit in a generic symptoms taxonomy that can be reused by many different environments.


What is a symptom?

A symptom is a form of knowledge that indicates a possible problem or situation in the managed environment. Basically, symptoms are indications of the existence of something else. In a medical analogy, a high-fever symptom might be defined as a temperature greater than 100 degrees Fahrenheit, and would be recognized when a patient takes his temperature and the thermometer reads 101 degrees Fahrenheit. In this example, the symptom is defined by the expression "temperature greater than 100 degrees Fahrenheit" and described as "high fever." Such a symptom is recognized when the monitored data (the thermometer reading) matches the symptom definition. When the symptom is recognized, a symptom occurrence is said to exist.
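The fever analogy maps directly onto the notions just introduced: a definition (the condition), a description (the label), and an occurrence (the match). The following minimal Python sketch models them. The class and function names (`Symptom`, `SymptomOccurrence`, `recognize`) are hypothetical illustrations rather than part of the symptoms specification, and the definition is assumed to be a simple predicate over a single numeric reading.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Symptom:
    """A symptom: a human-readable description plus a definition (the condition)."""
    description: str
    definition: Callable[[float], bool]


@dataclass(frozen=True)
class SymptomOccurrence:
    """Created only when monitored data matches the symptom definition."""
    symptom: Symptom
    observed_value: float


def recognize(symptom: Symptom, reading: float) -> Optional[SymptomOccurrence]:
    """Return an occurrence if the reading matches the definition, else None."""
    if symptom.definition(reading):
        return SymptomOccurrence(symptom, reading)
    return None


# The medical analogy: "high fever" is defined as a temperature above 100 F.
high_fever = Symptom(
    description="high fever",
    definition=lambda temperature_f: temperature_f > 100.0,
)

occurrence = recognize(high_fever, 101.0)  # the thermometer reads 101 F
print(occurrence.symptom.description)      # prints "high fever"
print(recognize(high_fever, 98.6))         # prints "None"
```

Keeping the definition separate from the description mirrors the article's distinction: the same description could be paired with a different recognition condition without changing what the symptom means.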

In autonomic computing, symptoms are recognized in the monitor component of the control loop, and used as a basis for an autonomic analysis of a problem or a goal. Symptoms are based on predefined elements (definitions and descriptions) provided to the autonomic manager, along with data, such as events, that the monitoring infrastructure collects from managed resources. The symptom definition expresses the conditions the monitor component uses to recognize the existence of a symptom, and the symptom description specifies the unique characteristics of a particular recognized symptom.

Most of the time, symptoms are closely connected to the self-healing discipline of autonomic computing, because their primary intent is to indicate a problem, which falls within the realm of self-healing. However, symptoms can also be used as triggers for problems in the other disciplines: self-protecting, self-optimizing, and self-configuring. Virtually any kind of problem or prediction may start with the occurrence of a symptom.


Symptom artifacts and relationships

The autonomic computing symptoms architecture defines several artifacts that need to work together so a symptom can be recognized and an occurrence can be created. Following are the main artifacts involved in this collaboration:

Symptom element
Contains all information necessary to create a new symptom occurrence. It is the atomic unit that defines a symptom at authoring time.
Symptom occurrence
Contains the run-time information associated with a specific instance of a symptom element. Each occurrence basically refers to the same symptom as it is defined in the symptom element, but the context to which it is applied may vary.
Correlation engine
Contains the logic used to create symptom occurrences. The correlation engine receives external stimuli as input and checks whether a symptom occurrence should be created in response.

When put together, these artifacts provide a way for an autonomic manager to process events and correlate them together to create symptom occurrences (which are created using a symptom element as a template).

A symptom element itself is a collection of sub-artifacts that will fulfill various aspects of a symptom. The following sub-artifacts are combined to form a symptom element.

Symptom metadata
The generic part of the information that composes a symptom. It is present in all kinds of knowledge, and is used when knowledge must be treated generically, even when that knowledge is a symptom element. This is the "what" part of a symptom.
Symptom schema
The specific part of the information that composes a symptom. It is the template that is used when a symptom occurrence is created. In a symptom, the specific schema defines attributes only present for symptoms, such as a symptom description, priority, an associated example, probability, and so on. Along with the symptom metadata, the symptom schema composes the "what" part of a symptom.
Symptom definition
A generic piece of logic that can be used to recognize a symptom. As expected, this logic should be compatible with the respective correlation engine that will be used to process the symptom. This is the "how" part of a symptom.

The symptom metadata and schema together hold the collection of information associated with symptom occurrences, and fully describe any such occurrence in terms of the data it carries. Each symptom element can also have one or more symptom definitions associated with it; these define how events must be arranged for a symptom to be recognized and for the symptom occurrence to be created. The section "Symptoms and their content" provides more information.
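To make the composition concrete, here is a minimal sketch of a symptom element that bundles metadata, schema, and one or more definitions, and uses the metadata and schema as a template when a definition matches. It assumes that definitions are plain Python predicates over a dictionary-shaped event and that an occurrence is created when any one of them matches; the specification does not mandate either choice, and all class names, identifiers, and event fields here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

Event = Dict[str, Any]  # hypothetical, simplified event shape


@dataclass
class SymptomMetadata:
    """Generic knowledge information shared by all knowledge types."""
    identifier: str
    name: str = ""


@dataclass
class SymptomSchema:
    """Symptom-specific attributes used as a template for occurrences."""
    description: str
    example: str = ""


@dataclass
class SymptomElement:
    metadata: SymptomMetadata
    schema: SymptomSchema
    # One or more definitions (the "how"); modeled here as event predicates.
    definitions: List[Callable[[Event], bool]] = field(default_factory=list)

    def create_occurrence(self, event: Event) -> Optional[Dict[str, Any]]:
        """Copy metadata and schema into an occurrence if any definition matches."""
        if any(definition(event) for definition in self.definitions):
            return {
                "symptom_id": self.metadata.identifier,
                "description": self.schema.description,
                "trigger": event,
            }
        return None


disk_full = SymptomElement(
    metadata=SymptomMetadata("sym-0001", "Disk full"),
    schema=SymptomSchema("File system has no free space"),
    definitions=[
        lambda e: e.get("msg", "").startswith("ENOSPC"),
        lambda e: e.get("free_bytes") == 0,
    ],
)

occ = disk_full.create_occurrence({"free_bytes": 0})
print(occ["symptom_id"])  # prints "sym-0001"
```

Whether several definitions should be combined with "any" or "all" semantics is a design decision left to the correlation engine; "any" is used here only because it keeps the sketch short.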

Figure 1 shows how the artifacts are arranged to form a symptom element and how they are combined to form a symptom occurrence.

Figure 1. Symptom artifacts

In addition to the three core symptom sub-artifacts described above, there is also the symptom effect element.

Symptom effect
The symptom model also defines an extra element that is responsible for describing what the reaction to a symptom occurrence should be. This element is extended by actions, recommendations, or other elements. However, the autonomic computing architecture defines a much more reliable model for requests, plans, and operations to be executed when analysis determines that a problem exists. This model depends on additional knowledge types, such as change requests and change plans. The symptom effect may be used as a shortcut to this more reliable model, or for very simple situations where further analysis is not necessary. Together with the symptom definition, the symptom effect composes the "how" part of the symptom.

Figure 2 shows a conceptual model of all symptom elements and how they are structured. The figure does not show content, but only the relationships between elements.

Figure 2. Symptom structural model

As shown, the symptom element is the main element that holds all definitions associated with a symptom. It is complemented by the symptom occurrence, which is the run-time element that holds extra information associated with the specific context of a symptom instance.

Other elements also play a role in the symptom's model, such as the correlation engine and the symptom repository (also known as the knowledge source or symptom container).

Other important relationships are:

SymptomElement.parent
Indicates that a symptom inherits information from another, parent symptom. If a symptom property or attribute is not defined in the symptom itself, the information is looked up in its parent.
SymptomElement.child
Indicates all other child symptoms that depend on information defined in this symptom. When a change is performed in the symptom information, it potentially also affects all child symptoms.
SymptomElement.correlatedOccurrence
Points to a collection of other knowledge artifacts (other symptoms, Common Base Events, or other specific and proprietary data) that together cause this symptom to be recognized. If the symptom definition is a rule or a pattern that combines, for example, three Common Base Event instances to trigger and recognize the symptom, then all three Common Base Event instances are linked to the symptom in this relationship.
SymptomElement.rootCause
Exists because a symptom is not always the root cause of a problem. Analysis could continue, and a better symptom could be recognized as a better description for a particular problem. When this happens, the more specific symptom is said to be the root cause of the more generic one. This relationship links generic symptoms to their more specific ones, forming a tree of linked elements, where the root of the tree is the ultimate root cause of our analysis. The set of all rootCause relationships from any given point to the root cause symptom is the correlation trail of a symptom. This relationship is the basis for the implementation of root cause analysis with symptoms.
Type relationships
There are multiple "type" relationships in the main symptom elements. Such relationships denote extension points where the model can be extended to include more information associated with a symptom. Normally, these extension points are described by XML schema documents separate from those defined for the core symptom model.
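The parent and rootCause relationships lend themselves to two small algorithms: property lookup that falls back along the parent chain, and a walk along rootCause links that yields the correlation trail. The sketch below assumes each symptom has at most one parent and at most one rootCause link, and that the links are acyclic (the trail walk raises an error if they are not). `SymptomNode`, its method names, and the symptom identifiers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SymptomNode:
    identifier: str
    properties: Dict[str, Any] = field(default_factory=dict)
    parent: Optional["SymptomNode"] = None
    children: List["SymptomNode"] = field(default_factory=list)
    # Links a generic symptom to a more specific one that better explains it.
    root_cause: Optional["SymptomNode"] = None

    def set_parent(self, parent: "SymptomNode") -> None:
        """Record the parent/child relationship in both directions."""
        self.parent = parent
        parent.children.append(self)

    def lookup(self, key: str) -> Any:
        """Return a property, falling back to ancestors if it is not defined here."""
        node: Optional[SymptomNode] = self
        while node is not None:
            if key in node.properties:
                return node.properties[key]
            node = node.parent
        raise KeyError(key)

    def correlation_trail(self) -> List[str]:
        """Follow rootCause links from this symptom to the ultimate root cause."""
        trail = [self.identifier]
        node = self.root_cause
        while node is not None:
            if node.identifier in trail:
                raise ValueError("rootCause links form a cycle")
            trail.append(node.identifier)
            node = node.root_cause
        return trail


base = SymptomNode("resource-unavailable", {"type": "availability", "priority": 3})
db_down = SymptomNode("database-unavailable", {"priority": 2})
db_down.set_parent(base)
pool = SymptomNode("connection-pool-exhausted")
db_down.root_cause = pool

print(db_down.lookup("type"))       # prints "availability" (inherited)
print(db_down.lookup("priority"))   # prints "2" (overridden locally)
print(db_down.correlation_trail())  # prints "['database-unavailable', 'connection-pool-exhausted']"
```

Because children read missing properties from their parent at lookup time, a change to the parent is immediately visible in every child, which is the propagation effect the SymptomElement.child relationship warns about.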

Symptoms inside an autonomic manager

An autonomic manager will load symptom elements authored by development tools and will maintain them in its internal knowledge container. Such information is then used by the monitor part of the autonomic computing control loop to create new symptom occurrences as a response to events that come into the autonomic manager from the sensor interface.

After symptoms reside in the autonomic manager knowledge container, they can be individually activated, which makes them available to process the inflow of events. This processing is performed by correlation engines implemented inside the autonomic manager. The correlation engines must be able to process symptom definitions as they are defined in the knowledge container, and must support the processing of events in the format in which the autonomic manager acquires them. After a new event is acquired, it is forwarded to the correlation engine, which in turn processes all active symptom definitions and tries to match them against the new event. This process is called event correlation.
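The event correlation loop just described can be sketched as follows. This is an illustration under stated assumptions, not an autonomic manager implementation: the `Definition` protocol, the `ThresholdCount` definition (which recognizes a symptom after a given number of events carrying the same message), and the dictionary event shape are all hypothetical. The list of events returned by a matching definition plays the role of the correlatedOccurrence relationship.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Protocol

Event = Dict[str, Any]  # hypothetical, simplified event shape


class Definition(Protocol):
    def match(self, event: Event) -> Optional[List[Event]]:
        """Return the correlated events if the symptom is recognized, else None."""
        ...


@dataclass
class ThresholdCount:
    """Hypothetical stateful definition: recognize after `count` events with `msg`."""
    msg: str
    count: int
    _seen: List[Event] = field(default_factory=list, init=False, repr=False)

    def match(self, event: Event) -> Optional[List[Event]]:
        if event.get("msg") != self.msg:
            return None
        self._seen.append(event)
        if len(self._seen) < self.count:
            return None
        correlated, self._seen = self._seen, []  # reset after recognition
        return correlated


@dataclass
class CorrelationEngine:
    # Maps a symptom identifier to its currently active definition.
    definitions: Dict[str, Definition] = field(default_factory=dict)

    def process(self, event: Event) -> List[Dict[str, Any]]:
        """Match one incoming event against every active definition."""
        occurrences = []
        for symptom_id, definition in self.definitions.items():
            correlated = definition.match(event)
            if correlated is not None:
                occurrences.append(
                    {"symptom_id": symptom_id, "correlated_occurrence": correlated}
                )
        return occurrences


engine = CorrelationEngine({"sym-login-failures": ThresholdCount("LOGIN_FAILED", 3)})
engine.process({"msg": "LOGIN_FAILED"})
engine.process({"msg": "LOGIN_FAILED"})
result = engine.process({"msg": "LOGIN_FAILED"})
print(len(result[0]["correlated_occurrence"]))  # prints "3"
```

Deactivating a symptom in this sketch simply means removing its entry from `definitions`; a real knowledge container would track activation state separately from the stored symptom.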

Each autonomic function implemented by an autonomic manager, whether self-healing, self-protecting, self-configuring, or self-optimizing, is defined as a collection of knowledge together with the way this knowledge is processed. Usually, the function starts in the monitor part of the autonomic control loop and relies on symptom recognition to trigger changes related to the function being implemented. This collection of concepts is the foundation of autonomic computing.

Figure 3 shows the internals of an autonomic manager, and how a correlation engine is used to match events to symptom definitions in order to produce new symptom occurrences.

Figure 3. Symptoms inside an autonomic manager

The figure shows an exploded view of the monitor part of the autonomic control loop, where a symptom is recognized by a correlation engine and a new symptom occurrence is created. At the same time, the symptom occurrence is shown as the information that flows from the monitor to the analysis part of the autonomic control loop.


Symptoms and their content

The following sections describe the content associated with each major element of the symptoms conceptual model. All of this information together fully describes a symptom at authoring time and must be carefully planned by symptom authors.

Symptom metadata

As noted, symptom metadata is common information present not only in all symptoms, but also in all other types of knowledge. It is the common information used by an autonomic manager or any other artifact that needs to manipulate knowledge. Table 1 shows the properties that comprise the symptom metadata.

Table 1. Symptom metadata
Property | Description
Identification | Identifies a symptom uniquely within the autonomic computing environment. Identification consists of a unique alphanumeric identifier and, optionally, a human-readable name that can be displayed to denote the symptom.
Versioning | Contains the change history associated with the symptom. Composed of an array of elements, each containing the author of the change, when the change was made, and a comment describing what was changed.
Annotation | Describes the symptom in human-readable form. An array of comments can also be added and maintained in the annotation property of the symptom; comments typically explain different characteristics of the symptom.
Location | Tells where the authoritative version of this symptom resides. It points to the original K-source (knowledge source) that contains the symptom, but it can also point to alternative mirrors where the same symptom is expected to be kept in sync with the master definition in the original K-source. Mirrors serve as a fallback if the master K-source cannot be reached for any reason.
Scope | A very important piece of information for a symptom. The scope, or symptom context, refers to the manageable resource type a symptom can be applied to. At run time, the scope property also contains the context associated with the symptom occurrence, for example, the instance of the manageable resource type that is the root cause of the problem or indication defined by the symptom.
Lifecycle | A run-time property containing the current state associated with a symptom occurrence. Each type of knowledge defines its own state machine, relevant to the form of knowledge it represents. For symptoms, the states are: created, building, analyzed, planning, executing, scheduled, completed, expired, and fault.
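Table 1 can be read as a record type. The sketch below models the metadata properties in Python and adds one behavior the Location property implies: try the master K-source first and fall back to the mirrors in order. The `fetch` callback, the `ksource://` URLs, the use of `ConnectionError` to signal an unreachable source, and the choice of `created` as the default initial lifecycle state are assumptions for illustration only; the article lists the states but does not define their transitions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Callable, List, Optional


class Lifecycle(Enum):
    CREATED = "created"
    BUILDING = "building"
    ANALYZED = "analyzed"
    PLANNING = "planning"
    EXECUTING = "executing"
    SCHEDULED = "scheduled"
    COMPLETED = "completed"
    EXPIRED = "expired"
    FAULT = "fault"


@dataclass
class VersionEntry:
    author: str
    when: datetime
    comment: str


@dataclass
class Location:
    master: str                                        # authoritative K-source
    mirrors: List[str] = field(default_factory=list)   # expected to stay in sync


@dataclass
class SymptomMetadata:
    identifier: str
    name: Optional[str] = None
    versioning: List[VersionEntry] = field(default_factory=list)
    annotation: List[str] = field(default_factory=list)
    location: Optional[Location] = None
    scope: str = ""                                    # manageable resource type
    lifecycle: Lifecycle = Lifecycle.CREATED           # assumed initial state


def fetch_authoritative(location: Location, fetch: Callable[[str], str]) -> str:
    """Try the master K-source first, then each mirror in order."""
    for source in [location.master, *location.mirrors]:
        try:
            return fetch(source)
        except ConnectionError:
            continue  # unreachable: fall back to the next source
    raise ConnectionError("no K-source reachable")


def fake_fetch(url: str) -> str:
    """Hypothetical fetcher that simulates an unreachable master K-source."""
    if "master" in url:
        raise ConnectionError(url)
    return f"<symptom from {url}>"


loc = Location("ksource://master/sym-0001", ["ksource://mirror1/sym-0001"])
print(fetch_authoritative(loc, fake_fetch))  # prints "<symptom from ksource://mirror1/sym-0001>"
```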

Figure 4 shows what properties are contained not only in the symptom metadata, but in each of the symptom artifacts.

Figure 4. The symptom content model

Although the symptom model has only one artifact explicitly called metadata, the whole set of properties associated with all artifacts composes the generic knowledge information associated with the symptom, and can be thought of as extended metadata content.

Symptom schema

The symptom schema is the specific information present only in symptoms and is not relevant to any other forms of knowledge. Table 2 shows the information that comprises the symptom schema.

Table 2. Symptom schema
Attribute | Description
Description | Explains in human-readable form what the symptom is about. Describes the kinds of problems or situations associated with the symptom when an autonomic manager recognizes a symptom occurrence.
Example | Shows in human-readable form an example of a problem or situation where the symptom is likely to occur.
Solution | Shows in human-readable form a possible solution for the problem or situation described by the example attribute.
Reference | Contains a URL associated with the symptom that lets a user get the latest information about that symptom from the Web. This is useful for maintaining symptom information in distributed autonomic managers, so they can make sure they always hold the latest definition of a given symptom before they display or act upon it.
Type | Contains the type associated with the symptom occurrence. It ultimately equates to a symptom category that enables you to organize multiple symptoms in a common taxonomy of symptoms. It is used for classification and may be displayed to the end user for information purposes.
Probability | Denotes the probability or certainty associated with the problem or situation indicated by a symptom occurrence. This value is assigned strictly at run time by the correlation engine that processes an event and creates a symptom occurrence as a result. Assigning probability to symptom occurrences is the responsibility of the correlation engine implementation.
Priority | Denotes the priority of a symptom occurrence relative to other symptom occurrences with the same scope. This value is assigned strictly at run time by the correlation engine that processes an event and creates a symptom occurrence as a result. Assigning priority to symptom occurrences is the responsibility of the correlation engine implementation.

This information is subject to change in future versions of the autonomic computing symptoms specification.
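The split between authored and run-time attributes in Table 2 can be illustrated by treating the schema as an immutable template that the correlation engine copies while filling in probability and priority. In this hypothetical sketch, `instantiate` stands in for whatever logic a real correlation engine uses, the Type attribute is renamed `symptom_type` to avoid shadowing a Python built-in, and probability is assumed to lie between 0 and 1, which the table itself does not specify. The example reference URL is a placeholder.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SymptomSchema:
    description: str
    example: str
    solution: str
    reference: str                        # URL with the latest symptom information
    symptom_type: str                     # category within a symptom taxonomy
    probability: Optional[float] = None   # assigned only at run time
    priority: Optional[int] = None        # assigned only at run time


def instantiate(template: SymptomSchema, probability: float, priority: int) -> SymptomSchema:
    """Correlation-engine step: copy the authored template, filling run-time values."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return replace(template, probability=probability, priority=priority)


template = SymptomSchema(
    description="Web server stopped responding to requests",
    example="HTTP health checks time out",
    solution="Restart the web server process",
    reference="http://example.com/symptoms/web-unresponsive",
    symptom_type="availability",
)
occurrence = instantiate(template, probability=0.8, priority=1)
print(template.probability, occurrence.probability)  # prints "None 0.8"
```

Making the template frozen guarantees that run-time values never leak back into the authored symptom, so every occurrence starts from the same definition.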

The symptom definition artifact

Symptom definition is an artifact used to recognize a symptom occurrence. Symptom definitions must be compatible with their respective correlation engines implemented by the autonomic manager that processes the symptom definition. In other words, if an autonomic manager loads symptoms containing symptom definitions of type A, it must implement a correlation engine that is capable of processing A as well. The autonomic computing architecture will define filters for batch loading of symptoms based on their symptom definition types.
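The compatibility requirement suggests a simple loading filter: keep only symptoms whose definition type matches an engine the autonomic manager actually implements. Because the architecture's batch-loading filters were not yet defined when this was written, the record layout and the type strings (`regex`, `xpath`, `tec-rule`) below are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class SymptomRecord:
    identifier: str
    definition_type: str   # hypothetical type tag, for example "regex" or "xpath"
    definition: str        # the definition body, opaque to the loader


def load_supported(symptoms: Iterable[SymptomRecord], supported: Set[str]) -> List[SymptomRecord]:
    """Keep only symptoms whose definition type a local correlation engine can process."""
    return [s for s in symptoms if s.definition_type in supported]


catalog = [
    SymptomRecord("sym-1", "regex", r"^ENOSPC"),
    SymptomRecord("sym-2", "tec-rule", "(TEC rule text)"),
]
loaded = load_supported(catalog, supported={"regex", "xpath"})
print([s.identifier for s in loaded])  # prints "['sym-1']"
```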

A symptom definition can take virtually any form that a compatible correlation engine can process. Some examples are:

  • XPath expression
  • Regular expression
  • Decision tree
  • Dependency graph
  • Prolog predicate
  • ACT pattern
  • TEC rule
  • Neural network

See Resources for more information on some of these forms.

It is important to note that all of these expressions, rules, patterns, and other constructs work in a similar way. All of them operate on a normalized form of events (either the IBM Common Base Event or the Web Services Distributed Management (WSDM) Event Format (WEF)). The end result, and intent, is the creation of new symptom occurrences when events match the specified expressions, rules, patterns, or other constructs.
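As an example of one definition form from the list above, the following sketch applies a regular expression to one field of a normalized event. The event is reduced to a plain dictionary whose `msg` and `severity` keys loosely echo Common Base Event attributes; a real Common Base Event or WEF event carries far more structure, so treat the field names and the `RegexDefinition` class as illustrative assumptions.

```python
import re
from typing import Any, Dict, Optional

# Simplified, hypothetical stand-in for a normalized event.
Event = Dict[str, Any]


class RegexDefinition:
    """A symptom definition expressed as a regular expression over one event field."""

    def __init__(self, symptom_id: str, field_name: str, pattern: str) -> None:
        self.symptom_id = symptom_id
        self.field_name = field_name
        self.pattern = re.compile(pattern)

    def match(self, event: Event) -> Optional[Dict[str, Any]]:
        """Return a minimal occurrence if the field exists and matches, else None."""
        value = event.get(self.field_name)
        if isinstance(value, str) and self.pattern.search(value):
            return {"symptom_id": self.symptom_id, "event": event}
        return None


out_of_memory = RegexDefinition("sym-oom", "msg", r"OutOfMemoryError")
event = {"msg": "java.lang.OutOfMemoryError: Java heap space", "severity": 50}
print(out_of_memory.match(event)["symptom_id"])      # prints "sym-oom"
print(out_of_memory.match({"msg": "heartbeat ok"}))  # prints "None"
```

Because every definition form consumes the same normalized event, an XPath, rule, or pattern definition could expose the same `match` interface and be swapped in without changing the code that feeds events to it.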

Symptom effect - the extra artifact

The symptom artifacts described so far provide a common and flexible way to recognize and create symptoms. What is still lacking is the capability to react to a symptom, that is, to know what the effect of a symptom occurrence will be after it is created. This is the role of the symptom effect artifact.

In the autonomic computing architecture, this problem is also addressed by other parts of the autonomic control loop, which ultimately define what kind of reaction is expected when a symptom occurrence is created. The analysis, planning, and execution parts of the control loop accomplish this, supported by change requests and change plans. But in some simple situations where no analysis or planning is performed, a symptom can also define the kind of reaction expected after it is recognized. This is particularly useful in autonomic managers where the Analyze and Plan parts of the autonomic loop are not present or, if they are, where the particular symptom that was recognized won't trigger any change request. It could also be used in an autonomic manager that implements an on-the-fly strategy for creating change requests, using the symptom effect as a starting point.

The symptom effect artifact can be extended to be several things. It could be, for example, an action to be performed in a manageable resource, a human readable recommendation, or something simple such as running a script or a piece of code. The current symptom specification defines only two forms of effect:

Recommendation
A textual representation of what an operator should do to fix the problem associated with a particular symptom. Primarily focused on the interaction between autonomic managers and manual managers.
Action
A piece of code that defines tasks and procedures used to fix the problem associated with a particular symptom. Primarily automated and focused on the manageable resource defined by the scope property of the symptom.

As mentioned, these forms of symptom effect could be augmented and extended, depending on the needs of a particular autonomic manager.
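The two effect forms can be sketched as a small dispatch routine that an autonomic manager without Analyze and Plan parts might run after recognizing a symptom. The `Recommendation` and `Action` classes, the `notify` callback, and the scope string format are hypothetical; a real Action would operate on the manageable resource rather than return a string.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass(frozen=True)
class Recommendation:
    """Textual guidance for a human operator."""
    text: str


@dataclass(frozen=True)
class Action:
    """Automated task run against the manageable resource in the symptom's scope."""
    run: Callable[[str], str]


Effect = Union[Recommendation, Action]


def apply_effects(scope: str, effects: List[Effect], notify: Callable[[str], None]) -> List[str]:
    """React to a symptom occurrence directly, without analysis or planning."""
    results = []
    for effect in effects:
        if isinstance(effect, Recommendation):
            notify(f"[{scope}] {effect.text}")  # hand off to a manual manager
            results.append("recommended")
        elif isinstance(effect, Action):
            results.append(effect.run(scope))   # act on the scoped resource
        else:
            raise TypeError(f"unsupported effect: {effect!r}")
    return results


messages: List[str] = []
effects = [
    Recommendation("Free disk space on the log volume"),
    Action(lambda resource: f"rotated logs on {resource}"),
]
print(apply_effects("server01:/var/log", effects, messages.append))
# prints "['recommended', 'rotated logs on server01:/var/log']"
print(messages)  # prints "['[server01:/var/log] Free disk space on the log volume']"
```

New effect forms could be added by extending the `Effect` union and the dispatch branches, which matches the article's point that effects can be augmented to suit a particular autonomic manager.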


Summary

The autonomic computing symptoms format provides a rich and flexible foundation that can be applied to several existing products and solutions, such as Tivoli® Monitoring and Tivoli Enterprise Console. The symptoms format lets you create a clear roadmap to turn products and solutions into a fully autonomic environment. A common symptoms format also provides consistency and enables the necessary interoperation among autonomic managers. Interoperation is essential for cooperation among autonomic managers and other management applications in solutions for problem determination, prediction, and other autonomic computing goals.

Symptom elements are used to configure autonomic managers. When events are received, autonomic managers use these elements to perform event correlation by processing symptom definitions. After a match is found, the symptom metadata and schema are used as templates to create symptom occurrences. An autonomic manager may then react to symptom occurrences by executing the symptom effects associated with them. Together, these mechanisms enable fully autonomic functions to be implemented on existing products and solutions.


Stay tuned

The next article in this series will plunge into real examples of symptoms extracted from existing products and solutions. It will show how the symptoms are identified, how they affect their environment, and which actions can be automatically invoked to deal with the symptoms. A small list of canonical symptoms will be presented, which will enable reuse of problem determination capabilities in different products and solutions.

Resources

Learn

Discuss

  • developerWorks blogs: Get involved in the developerWorks community.
  • Dave Bartlett, IBM VP, blogs each week on his thoughts about the state of autonomic computing in the industry.
