This article discusses the new autonomic computing symptoms format, an evolution of the current symptoms format, version 1.0, which is widely available in the Autonomic Computing Toolkit under the Log and Trace Analyzer component. The article briefly addresses the following points:
- How knowledge in the autonomic computing context is defined
- How symptoms are a form of knowledge (see Resources)
- The various components of a symptom
- The roles of a symptom in the autonomic computing architecture
As the series progresses, you'll get specific examples of canonical symptoms and how they fit in a generic symptoms taxonomy that can be reused by many different environments.
A symptom is a form of knowledge that indicates a possible problem or situation in the managed environment. Basically, symptoms are indications of the existence of something else. In a medical analogy, the symptom of a high fever might be defined as a temperature greater than 100 degrees Fahrenheit, and would be recognized when a patient takes his temperature and the thermometer reads 101 degrees Fahrenheit. In this example, the symptom is defined by the expression "temperature greater than 100 degrees Fahrenheit" and described as "high fever." Such a symptom is recognized when the monitored data (the thermometer reading) matches the symptom definition. When the symptom is recognized, a symptom occurrence is said to exist.
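The fever analogy can be sketched in a few lines of Python. This is only an illustration of the definition/description split; the `Symptom` class and `recognize` function are hypothetical names and are not part of the symptoms format.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Symptom:
    description: str                     # human-readable description
    definition: Callable[[float], bool]  # condition evaluated on monitored data


# Definition: temperature greater than 100 degrees Fahrenheit.
high_fever = Symptom("high fever", lambda temp_f: temp_f > 100.0)


def recognize(symptom: Symptom, reading: float) -> bool:
    """Return True when the monitored reading matches the symptom definition."""
    return symptom.definition(reading)
```

With this sketch, a thermometer reading of 101 matches the definition, so a symptom occurrence would exist; a reading of 98.6 does not.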
In autonomic computing, symptoms are recognized in the monitor component of the control loop, and used as a basis for an autonomic analysis of a problem or a goal. Symptoms are based on predefined elements (definitions and descriptions) provided to the autonomic manager, along with data that the monitoring infrastructure collects from managed resources, like events. The symptom definition expresses the conditions used by the monitor component to recognize the existence of a symptom, and the symptom description specifies the unique characteristics of a particular symptom that is recognized.
Most of the time, symptoms are closely connected to the self-healing discipline of autonomic computing because their primary intent is to indicate a problem, which is in the realm of that discipline. However, symptoms can also be used as triggers in the other disciplines: self-protecting, self-optimizing, and self-configuring. The handling of virtually any kind of problem or prediction may start with the occurrence of a symptom.
The autonomic computing symptoms architecture defines several artifacts that need to work together so a symptom can be recognized and an occurrence can be created. Following are the main artifacts involved in this collaboration:
- Symptom element
- Contains all information necessary to create a new symptom occurrence. It is the atomic unit that defines a symptom at authoring time.
- Symptom occurrence
- Contains the run-time information associated with a specific instance of a symptom element. Each occurrence basically refers to the same symptom as it is defined in the symptom element, but the context to which it is applied may vary.
- Correlation engine
- Contains the logic used to create symptom occurrences. As input, the correlation engine receives external stimuli and checks whether a symptom occurrence should be created in response.
When put together, these artifacts provide a way for an autonomic manager to process events and correlate them together to create symptom occurrences (which are created using a symptom element as a template).
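The collaboration among these three artifacts might look like the following minimal sketch. It assumes events are plain dictionaries with a `source` key, and it uses a callable in place of a real symptom definition; all class, field, and method names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]  # hypothetical event shape: a plain dictionary


@dataclass(frozen=True)
class SymptomElement:
    """Authoring-time unit: everything needed to create an occurrence."""
    symptom_id: str
    description: str
    matches: Callable[[Event], bool]  # stands in for the symptom definition


@dataclass(frozen=True)
class SymptomOccurrence:
    """Run-time instance of a symptom element, tied to a specific context."""
    element: SymptomElement
    context: str  # here: the resource the event came from


@dataclass
class CorrelationEngine:
    elements: List[SymptomElement] = field(default_factory=list)

    def process(self, event: Event) -> List[SymptomOccurrence]:
        """Check an incoming event and create occurrences for matching elements."""
        return [
            SymptomOccurrence(element=e, context=str(event.get("source", "unknown")))
            for e in self.elements
            if e.matches(event)
        ]
```

Each occurrence points back to the element that served as its template, while the context varies from one event to the next.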
A symptom element itself is a collection of sub-artifacts that will fulfill various aspects of a symptom. The following sub-artifacts are combined to form a symptom element.
- Symptom metadata
- The generic part of the information that composes a symptom. It is present on all kinds of knowledge, and is used when knowledge must be treated generically, even though it is a symptom element. This is the "what" part of a symptom.
- Symptom schema
- The specific part of the information that composes a symptom. It is the template that is used when a symptom occurrence is created. In a symptom, the specific schema defines attributes only present for symptoms, such as a symptom description, priority, an associated example, probability, and so on. Along with the symptom metadata, the symptom schema composes the "what" part of a symptom.
- Symptom definition
- A generic piece of logic that can be used to recognize a symptom. As expected, this logic should be compatible with the respective correlation engine that will be used to process the symptom. This is the "how" part of a symptom.
The symptom metadata and schema together hold the collection of information associated with symptom occurrences, and fully describe any such occurrence in terms of the data it carries. Each symptom element can also have one or more symptom definitions associated with it; these define how events must be arranged for a symptom to be recognized and for the symptom occurrence to be created. The section "Symptoms and their content" provides more information.
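One way to picture the composition of a symptom element is the sketch below, where metadata and schema form the "what" and a list of definitions forms the "how". The field names, the `engine_type` label, and the dictionary returned by `new_occurrence` are assumptions made for illustration and do not come from the specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class SymptomMetadata:      # generic "what": present on all knowledge types
    identifier: str
    name: str


@dataclass(frozen=True)
class SymptomSchema:        # symptom-specific "what"
    description: str
    example: str = ""


@dataclass(frozen=True)
class SymptomDefinition:    # "how": recognition logic for one engine type
    engine_type: str        # hypothetical label, e.g. "regex"
    expression: str


@dataclass(frozen=True)
class SymptomElement:
    metadata: SymptomMetadata
    schema: SymptomSchema
    definitions: List[SymptomDefinition] = field(default_factory=list)

    def new_occurrence(self, context: str) -> dict:
        """Use metadata and schema as the template for an occurrence."""
        return {
            "identifier": self.metadata.identifier,
            "name": self.metadata.name,
            "description": self.schema.description,
            "context": context,
        }
```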
Figure 1 shows how the artifacts are arranged to form a symptom element and how they are combined to form a symptom occurrence.
Figure 1. Symptom artifacts
In addition to the three core symptom sub-artifacts described above, there is also the symptom effect element.
- Symptom effect
- The symptom model also defines an extra element that is responsible for describing what should be the reaction to a symptom occurrence. This element is extended by actions, recommendations, or other elements. However, the autonomic computing architecture defines a much more reliable model for requests, plans, and operations to be executed when analysis determines that a problem exists. This model depends on additional knowledge types, such as change requests and change plans. The symptom effect may be used as a short-cut to this more reliable model, or for very simple situations where further analysis is not necessary. Together with the symptom definition, the symptom effect composes the "how" part of the symptom.
Figure 2 shows a conceptual model of all symptom elements and how they are structured. The figure does not show content, but only the relationships between elements.
Figure 2. Symptom structural model
As shown, the symptom element is the main element that holds all definitions associated with a symptom. It is complemented by the symptom occurrence, which is the run-time element that holds extra information associated with the specific context of a symptom instance.
Other elements also play a role in the symptom's model, such as the correlation engine and the symptom repository (also known as the knowledge source or symptom container).
Other important relationships are:
- Indicates that a symptom inherits information from another parent symptom. If a symptom property or attribute is not defined, the information is to be looked up at its parent.
- Indicates all other child symptoms that depend on information defined in this symptom. When a change is performed in the symptom information, it potentially also affects all child symptoms.
- Points to a collection of other knowledge artifacts, which could be other symptoms, common base events, or other specific and proprietary data, that together cause this symptom to be recognized. If the symptom definition is a rule or a pattern that combines, for example, three instances of common base events to trigger and recognize the symptom, then all three common base event instances are linked to the symptom in this relationship.
- Exists because a symptom is not always the root cause of a problem. Analysis could continue, and a better symptom could be recognized as a better description for a particular problem. When this happens, the more specific symptom is said to be the root cause of the more generic one. This relationship links generic symptoms to their more specific ones, forming a tree of linked elements, where the root of the tree is the ultimate root cause of our analysis. The set of all rootCause relationships from any given point to the root cause symptom is the correlation trail of a symptom. This relationship is the basis for the implementation of root cause analysis with symptoms.
- Type relationships
- There are multiple "type" relationships in the main symptom elements. Such relationships denote extension points where the model can be extended to include more information associated with a symptom. Normally, these extension points are described by XML schema documents other than those defined for the core symptom model.
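Two of the relationships above lend themselves to a short sketch: looking up an undefined property at the parent symptom, and following rootCause links to build a correlation trail. The attribute names `parent` and `root_cause` and the sample symptom names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Symptom:
    name: str
    properties: Dict[str, Any] = field(default_factory=dict)
    parent: Optional["Symptom"] = None      # inheritance link (hypothetical name)
    root_cause: Optional["Symptom"] = None  # link to a more specific symptom

    def lookup(self, key: str) -> Any:
        """Return a property, falling back to parent symptoms when undefined."""
        node: Optional["Symptom"] = self
        while node is not None:
            if key in node.properties:
                return node.properties[key]
            node = node.parent
        return None

    def correlation_trail(self) -> List[str]:
        """Follow rootCause links from this symptom to the ultimate root cause."""
        trail: List[str] = []
        seen = set()
        node: Optional["Symptom"] = self
        while node is not None and id(node) not in seen:  # guard against cycles
            seen.add(id(node))
            trail.append(node.name)
            node = node.root_cause
        return trail
```

The last name in the trail is the most specific symptom found so far, that is, the current root cause of the analysis.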
An autonomic manager will load symptom elements authored by development tools and will maintain them in its internal knowledge container. Such information is then used by the monitor part of the autonomic computing control loop to create new symptom occurrences as a response to events that come into the autonomic manager from the sensor interface.
After symptoms reside in the autonomic manager knowledge container, they can be individually activated and are then available to process the inflow of events. This happens with correlation engines that are implemented inside the autonomic manager. The correlation engines must be able to process symptom definitions as they are defined in the knowledge container, and must support the processing of events in the format in which the autonomic manager acquires them. After a new event is acquired, it is forwarded to the correlation engine, which in turn processes all active symptom definitions and tries to match them against the new event. This process is called event correlation.
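The load, activate, and correlate sequence described above could be sketched as follows. The `KnowledgeContainer` class and `correlate` function are hypothetical, events are assumed to be dictionaries, and each symptom definition is reduced to a simple callable.

```python
from typing import Any, Callable, Dict, List, Set, Tuple

Event = Dict[str, Any]
Definition = Callable[[Event], bool]


class KnowledgeContainer:
    """Holds loaded symptom definitions and tracks which ones are active."""

    def __init__(self) -> None:
        self._definitions: Dict[str, Definition] = {}
        self._active: Set[str] = set()

    def load(self, symptom_id: str, definition: Definition) -> None:
        self._definitions[symptom_id] = definition  # loaded, not yet active

    def activate(self, symptom_id: str) -> None:
        if symptom_id not in self._definitions:
            raise KeyError(symptom_id)
        self._active.add(symptom_id)

    def active_definitions(self) -> List[Tuple[str, Definition]]:
        return [(sid, d) for sid, d in self._definitions.items() if sid in self._active]


def correlate(container: KnowledgeContainer, event: Event) -> List[str]:
    """Match one event against every active definition; return matched symptom IDs."""
    return [sid for sid, matches in container.active_definitions() if matches(event)]
```

A symptom that is loaded but not activated is never evaluated, which mirrors the individual activation step in the text.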
Each autonomic function implemented by an autonomic manager, whether it is self-healing, self-protecting, self-configuring, or self-optimizing, is defined as a collection of knowledge and how this knowledge is processed. Usually, it starts in the monitor part of the autonomic control loop, and relies on symptom recognition to trigger changes related to the function being implemented. This collection of concepts is the foundation of autonomic computing.
Figure 3 shows the internals of an autonomic manager, and how a correlation engine is used to match events to symptom definitions in order to produce new symptom occurrences.
Figure 3. Symptoms inside an autonomic manager
The figure shows an exploded view of the monitor part of the autonomic control loop, where a symptom is recognized by a correlation engine and a new symptom occurrence is created. At the same time, the symptom occurrence is shown as the information that flows from the monitor to the analysis part of the autonomic control loop.
The following sections describe the content associated with each major element of the symptoms conceptual model. All of this information together fully describes a symptom at authoring time and must be carefully planned by symptom authors.
As noted, symptom metadata is common information present not only in all symptoms, but also in all other types of knowledge. It is the common information used by an autonomic manager or any other artifact that needs to manipulate knowledge. Table 1 shows the properties that comprise the symptom metadata.
Table 1. Symptom metadata
| Property | Description |
|---|---|
| Identification | Identifies a symptom uniquely within the autonomic computing environment. Identification is a unique alphanumeric identifier and, optionally, a human-readable name that can be displayed and denotes the symptom. |
| Versioning | Contains the change history associated with the symptom. Composed of an array of elements that contain the author of the change, when the change was made, and a comment describing what was changed. |
| Annotation | Describes the symptom in a human-readable form. An array of comments can also be added and maintained in the annotation property of the symptom. Comments will typically be used to explain different characteristics of the symptom. |
| Location | Tells where the authoritative version of this symptom resides. It points to the original K-source that contains the symptom, but it can also point to alternative mirrors where the same symptom is expected to be in sync with the master definition in the original K-source. The mirrors serve as a fallback if the master K-source cannot be reached for any reason. |
| Scope | A very important piece of information for a symptom. The scope refers to the manageable resource type to which a symptom can be applied. At run time, the scope property also contains the context associated with the symptom occurrence: for example, the instance of the manageable resource type that is the root cause of the problem or indication defined by the symptom. |
| Lifecycle | A run-time property containing the current state associated with a symptom occurrence. Each type of knowledge defines the state machine that is relevant to the form of knowledge it represents. In the case of symptoms, the states are: created, building, analyzed, planning, executing, scheduled, completed, expired, and fault. |
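Two of these properties can be illustrated briefly: the lifecycle states, and the fallback from the master K-source to its mirrors. The sketch below is a rough illustration only. The enumeration lists the states named in Table 1 without assuming any transitions between them, and the `Location` class with its `resolve` method and `reachable` callback is hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class Lifecycle(Enum):
    """Symptom occurrence states as listed in Table 1 (transitions not modeled)."""
    CREATED = "created"
    BUILDING = "building"
    ANALYZED = "analyzed"
    PLANNING = "planning"
    EXECUTING = "executing"
    SCHEDULED = "scheduled"
    COMPLETED = "completed"
    EXPIRED = "expired"
    FAULT = "fault"


@dataclass
class Location:
    master: str                                       # original K-source
    mirrors: List[str] = field(default_factory=list)  # fallback copies

    def resolve(self, reachable: Callable[[str], bool]) -> Optional[str]:
        """Prefer the master K-source; fall back to the first reachable mirror."""
        for source in [self.master, *self.mirrors]:
            if reachable(source):
                return source
        return None
```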
Figure 4 shows what properties are contained not only in the symptom metadata, but in each of the symptom artifacts.
Figure 4. The symptom content model
Although the symptom model has only one artifact explicitly called metadata, the whole set of properties associated with all artifacts composes the generic knowledge information associated with the symptom, and can be thought of as extended metadata content.
The symptom schema is the specific information present only in symptoms and is not relevant to any other forms of knowledge. Table 2 shows the information that comprises the symptom schema.
Table 2. Symptom schema
| Property | Description |
|---|---|
| Description | Explains in a human-readable form what the symptom is about. Describes the kinds of problems or situations associated with the symptom when a symptom occurrence is recognized by an autonomic manager. |
| Example | Shows in a human-readable form an example of a problem or situation where the symptom is likely to occur. |
| Solution | Shows in a human-readable form a possible solution for the problem or situation described by the example attribute. |
| Reference | Contains a URL associated with the symptom that lets a user get the latest information associated with that symptom from the Web. This is useful for the maintenance of the symptom information in distributed autonomic managers, so they can make sure they always hold the latest definition of a given symptom before they display or act upon it. |
| Type | Contains the type associated with the symptom occurrence. It ultimately equates to a symptom category that enables you to organize multiple symptoms in a common taxonomy of symptoms. This is used for classification and may be displayed to the end user for information purposes. |
| Probability | Denotes the probability or certainty associated with the problem or situation indicated by a symptom occurrence. This value is assigned strictly at run time by the correlation engine that processes an event and creates a symptom occurrence as a result. Assignment of probability to symptom occurrences is a responsibility of the correlation engine implementation. |
| Priority | Denotes the priority of a symptom occurrence in relation to other symptom occurrences with the same scope. This value is assigned strictly at run time by the correlation engine that processes an event and creates a symptom occurrence as a result. Assignment of priority to symptom occurrences is a responsibility of the correlation engine implementation. |
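Because priority is meaningful only among occurrences that share a scope, a consumer of symptom occurrences might group them by scope before ordering them. The sketch below assumes that a larger priority value means more urgent and that probability is a number between 0 and 1; neither convention is stated in the specification, and the `Occurrence` class is hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Occurrence:
    symptom_type: str
    scope: str          # resource instance the occurrence applies to
    probability: float  # assumed range: 0.0 to 1.0
    priority: int       # assumed: larger value means more urgent


def rank_by_scope(occurrences: List[Occurrence]) -> Dict[str, List[Occurrence]]:
    """Group occurrences by scope, then order each group by priority and probability."""
    grouped: Dict[str, List[Occurrence]] = defaultdict(list)
    for occ in occurrences:
        grouped[occ.scope].append(occ)
    return {
        scope: sorted(group, key=lambda o: (-o.priority, -o.probability))
        for scope, group in grouped.items()
    }
```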
This information is subject to change in future versions of the autonomic computing symptoms specification.
Symptom definition is an artifact used to recognize a symptom occurrence. Symptom definitions must be compatible with their respective correlation engines implemented by the autonomic manager that processes the symptom definition. In other words, if an autonomic manager loads symptoms containing symptom definitions of type A, it must implement a correlation engine that is capable of processing A as well. The autonomic computing architecture will define filters for batch loading of symptoms based on their symptom definition types.
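The compatibility requirement can be pictured as a filter applied at load time. The following is a sketch under stated assumptions, not the filter mechanism the architecture will define: symptoms are represented as hypothetical `(symptom_id, definition_type)` pairs, and the type labels are arbitrary strings.

```python
from typing import Iterable, List, Set, Tuple


def filter_loadable(
    symptoms: Iterable[Tuple[str, str]],
    supported_types: Set[str],
) -> Tuple[List[str], List[str]]:
    """Split (symptom_id, definition_type) pairs into loadable and skipped IDs."""
    loadable: List[str] = []
    skipped: List[str] = []
    for symptom_id, definition_type in symptoms:
        # Load only symptoms whose definition type has a matching correlation engine.
        target = loadable if definition_type in supported_types else skipped
        target.append(symptom_id)
    return loadable, skipped
```

An autonomic manager that implements only regular-expression and XPath engines would pass `{"regex", "xpath"}` as `supported_types` and skip everything else.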
A symptom definition can be anything. Some examples are:
- XPath expression
- Regular expression
- Decision tree
- Dependency graph
- Prolog predicate
- ACT pattern
- TEC rule
- Neural network
See Resources for more information on some of these forms.
It is important to note that all of these expressions, rules, patterns, and other constructs work in a similar way. All of them use a normalized form of events: either the IBM Common Base Event or the Web Services Distributed Management (WSDM) Event Format (WEF). The end result, and intent, is the creation of new symptoms when events match the specified expressions, rules, patterns, or other constructs.
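As an illustration of two of the forms listed above, the sketch below evaluates a regular expression and an XPath expression against the same event. The XML event is a deliberately simplified, hypothetical stand-in and is not the Common Base Event or WEF schema; the XPath support is the limited subset offered by Python's standard `xml.etree.ElementTree` module.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified event; real normalized events carry far more detail.
EVENT_XML = """
<event severity="50">
  <source component="db-server"/>
  <msg>Connection refused: pool exhausted</msg>
</event>
"""


def matches_regex(pattern: str, event_xml: str) -> bool:
    """Regular-expression definition: search the event's message text."""
    root = ET.fromstring(event_xml.strip())
    return re.search(pattern, root.findtext("msg", default="")) is not None


def matches_xpath(path: str, event_xml: str) -> bool:
    """XPath definition: the symptom is recognized if the path selects an element."""
    root = ET.fromstring(event_xml.strip())
    return root.find(path) is not None
```

Both functions answer the same question, whether the event matches the definition, which is why different definition types can feed the same symptom occurrence creation step.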
Up to now, the previously defined symptom artifacts provided a common and flexible way to recognize and create symptoms. What is still lacking is the capability to react to a symptom, so you know what the effect will be of a symptom occurrence after it's created. This is the role of the symptom effect artifact.
In the autonomic computing architecture, the problem is also addressed by other parts of the autonomic control loop that ultimately define what kind of reaction is expected when a symptom occurrence is created. The analysis, planning, and execution parts of the control loop accomplish this, and change requests and change plans support it. But in some simple situations where no analysis or planning is performed, a symptom can also be used to define the kind of reaction expected after it is recognized. This is particularly useful in autonomic managers where the Analyze and Plan parts of the autonomic loop are not present or, if they are, the particular symptom that was recognized won't trigger any change request. It could also be used in an autonomic manager that implements an on-the-fly strategy for creating change requests, which can be created using the symptom effect as a starting point.
The symptom effect artifact can be extended to be several things. It could be, for example, an action to be performed in a manageable resource, a human readable recommendation, or something simple such as running a script or a piece of code. The current symptom specification defines only two forms of effect:
- A textual representation of what an operator should do to fix the problem associated with a particular symptom. Primarily focused on the interaction between autonomic managers and manual managers.
- A piece of code that defines tasks and procedures used to fix the problem associated with a particular symptom. Primarily automated and focused on the manageable resource defined by the scope property of the symptom.
As mentioned, these forms of symptom effect could be augmented and extended, depending on the needs of a particular autonomic manager.
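The two forms of effect could be modeled as in the sketch below: a textual recommendation for an operator and an automated action run against the resource named by the symptom's scope. The class names `Recommendation` and `Action`, the `apply_effect` function, and the message format are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Recommendation:
    text: str  # what an operator should do


@dataclass(frozen=True)
class Action:
    run: Callable[[str], str]  # receives the resource named by the scope


Effect = Union[Recommendation, Action]


def apply_effect(effect: Effect, scope: str) -> str:
    """Surface a recommendation to an operator, or run an action on the resource."""
    if isinstance(effect, Recommendation):
        return f"Operator advice for {scope}: {effect.text}"
    return effect.run(scope)
```

An extension would add further effect classes and a matching branch in `apply_effect`, which corresponds to the augmentation the text describes.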
The autonomic computing symptoms format provides a rich and flexible foundation that can be applied to several existing products and solutions, such as Tivoli® Monitoring and Tivoli Enterprise Console. The symptoms format lets you create a clear roadmap to turn products and solutions into a fully autonomic environment. A common symptoms format will also provide consistency and enable necessary interoperation among autonomic managers. Interoperation is greatly needed for cooperation purposes among autonomic managers and other management applications in solutions associated with problem determination, prediction, and other autonomic computing goals.
Symptom elements are used to configure autonomic managers. Then, when events are received, autonomic managers use these elements to perform event correlation by processing symptom definitions. After a match is found, the symptom metadata and schema are processed as templates to create symptom occurrences. An autonomic manager may react to symptom occurrences by executing the symptom effects associated with them. Together, these artifacts enable fully autonomic functions to be implemented on existing products and solutions.
The next article in this series will plunge into real examples of symptoms extracted from existing products and solutions. It will show how the symptoms are identified, how they affect their environment, and which actions can be automatically invoked to deal with the symptoms. A small list of canonical symptoms will be presented, which will enable reuse of problem determination capabilities in different products and solutions.
- "An architectural blueprint for autonomic computing": Read this IBM white paper for an overview of the autonomic computing architecture, including more information about knowledge and other architectural building blocks.
- "The autonomic computing edge: The role of knowledge in autonomic systems" (developerWorks, 2005): Browse this series on autonomic computing to learn more about hot topics.
- For more references on the WSDM Event Format (WEF), refer to the Management Using Web Services (MUWS), Part 1 and Part 2 standard documents.
- XML Path Language, version 1.0: Learn how to build XPath expressions from this online reference.
- "The autonomic computing edge: Can you CHOP up autonomic computing" (developerWorks, June 2005) includes a section on symptoms under "Integrated self-CHOP scenarios."
- For more information on the current symptoms 1.0 format and the Log and Trace Analyzer component, refer to the Autonomic Computing Toolkit.
- developerWorks blogs: Get involved in the developerWorks community.
- Dave Bartlett, IBM VP, blogs each week on his thoughts about the state of autonomic computing in the industry.
Marcelo Perazolo is a member of the IBM Autonomic Computing Architecture team, where he serves as an architect for symptoms and other knowledge formats and defines Management Integration Taxonomies related to autonomic computing. He has worked for IBM since 1990, with various assignments in network and systems management. Marcelo received an M.S. degree in Electrical Engineering in 1994. His interests include problem determination and prediction, process optimization techniques, security, correlation technologies, and knowledge representation.