A standard symptom taxonomy is a good starting tool for identifying and categorizing symptoms. It offers symptom authors a common framework within which they can expand their individual symptoms and promote their reuse in a more standardized way. In this third article in the series, I revisit the autonomic computing symptoms architecture and examine in more detail the parts of that architecture that promote the classification of symptoms. I also introduce a method for identifying symptom categories, and I present a set of standard categories that were identified when I applied this method to sample problem determination data from multiple symptom sources. (For references on symptoms, their format and content, please see Resources.)
In the overall autonomic computing architecture, symptoms are a form of knowledge (see Resources) -- as such they convey information necessary for the analysis, identification, and resolution of situations handled by an autonomic manager. Figure 1 illustrates an overview of the autonomic computing symptoms reference architecture.
Figure 1. The autonomic computing symptoms reference architecture
This architecture is composed of the following main elements:
- Symptom metadata (the one I'll concentrate on in this article)
- Symptom schema
- Symptom rule
- Symptom effect
- Symptom definition
- Symptom instance
- Correlation engine
- Symptom catalog
Before I move on to the topic of this article, let's look at each element in a little more detail.
- Symptom metadata
- Symptom metadata is common information present in all kinds of management knowledge; because a symptom is one of the types of knowledge supported by an autonomic manager, it must contain knowledge metadata. This includes things like a type, a category, an identifier, and so on.
- Symptom schema
- Symptom schema is information specific to the symptom only, not present in other knowledge types. For example, a symptom defines a hierarchy and must support that in its schema with attributes like a root cause parent symptom, as well as child symptoms. Other data includes things like a symptom probability, a priority, a description, and so on.
- Symptom rule
- Symptom rule defines how a symptom is recognized. It may be anything, depending on the correlation technology used to process the data and events that give rise to the symptom, but usually it can be represented by normalized patterns or rules.
- Symptom effect
- The symptom effect defines the reaction to be performed by the autonomic manager after a symptom is recognized. An effect could be something immediate like "restart a router" or a further analysis of application dependency to be performed by the analysis layer. It can also be a textual recommendation intended for human consumption.
- Symptom definition
- The symptom definition is a collection of the four symptom elements -- metadata, schema, rule, and effect -- used just for grouping and cataloging purposes.
- Symptom instance
- After a symptom is identified, an instance of the information defined by the symptom schema is created. The symptom instance links with the symptom definition that was used for its creation to access metadata, rules, and effect elements.
- Correlation engine
- The correlation engine is the logical entity responsible for processing symptom definitions, extracting rules, and creating symptom instances. Symptom instances are the result of the processing of symptom rules by the correlation engine.
- Symptom catalog
- The catalog is the distributed component used to store, consume, and reuse symptom definitions. This is a multivendor, multisolution repository of information and contains downloadable symptoms that can be consumed by management tools.
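To make the relationships among these elements concrete, here is a minimal Python sketch. It is not the symptoms format itself: every class and field name below (`SymptomMetadata`, `knowledge_type`, `CorrelationEngine.process`, and so on) is an assumption chosen for illustration, the rule is reduced to a simple predicate over one event, and the catalog is just an in-memory list.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(frozen=True)
class SymptomMetadata:
    # Knowledge metadata shared by all management knowledge types.
    identifier: str
    knowledge_type: str
    category: str  # classification attribute, e.g. "availability:network"


@dataclass(frozen=True)
class SymptomSchema:
    # Information specific to symptoms only.
    description: str
    priority: int
    probability: float
    parent_id: Optional[str] = None  # root cause parent symptom, if any


@dataclass(frozen=True)
class SymptomDefinition:
    # Groups the four elements: metadata, schema, rule, and effect.
    metadata: SymptomMetadata
    schema: SymptomSchema
    rule: Callable[[dict], bool]  # hypothetical: a predicate over one event
    effect: str                   # action or textual recommendation


@dataclass
class SymptomInstance:
    # Links back to the definition that produced it.
    definition: SymptomDefinition
    evidence: List[dict] = field(default_factory=list)


class CorrelationEngine:
    """Evaluates the rules of cataloged definitions against incoming events."""

    def __init__(self, catalog):
        self.catalog = list(catalog)

    def process(self, event):
        # One instance per definition whose rule recognizes the event.
        return [SymptomInstance(d, [event]) for d in self.catalog if d.rule(event)]


router_down = SymptomDefinition(
    metadata=SymptomMetadata("sym-001", "symptom", "availability:network"),
    schema=SymptomSchema("Router is unreachable", priority=1, probability=0.9),
    rule=lambda event: event.get("resource") == "router" and event.get("status") == "down",
    effect="restart a router",
)

engine = CorrelationEngine([router_down])
instances = engine.process({"resource": "router", "status": "down"})
```

Each resulting instance reaches its metadata, rule, and effect through `instance.definition`, which mirrors the link between symptom instance and symptom definition described above.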
For the purposes of this article, I'll concentrate on the metadata element; in particular, on the attribute used for symptom classification, which is part of the metadata. This attribute is important for efficient run-time processing of symptom instances and it should always be of utmost importance for symptom authors as well. A solid classification generally assures smoother processing when symptoms are identified in an autonomic manager. It also assures better composition of symptoms when further analysis is necessary for the creation of incidents, problems, and impact records.
In the course of identifying canonical categories that may be applied to symptoms, there are multiple considerations. As you know, there are many ways of categorizing management information, and a symptom is a form of management information.
Symptoms are positioned as being composite events; in other words, special events that are derived by the composition of other forms of management information emitted by manageable resources. Such forms of information may be (but are not restricted to):
- events (normalized or not)
- log and trace records
- static application data
- metric records
As such, one valid way to categorize symptoms would be with respect to the sources of composed information that are part of the symptom, but because the symptom instance already carries the correlation trail of a symptom (pointers to the components of the symptom), this categorization alternative is unnecessary.
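A small sketch can show why a source-based category adds nothing new. Assuming a correlation trail is available as a list of pointers, each tagged with the kind of information it refers to (the `trail` structure and the `kind` and `ref` keys are hypothetical, not part of the symptoms format), the source breakdown can be computed on demand:

```python
from collections import Counter

# Hypothetical correlation trail: each entry points at one piece of
# composed information that contributed to the symptom.
trail = [
    {"kind": "event", "ref": "evt-1041"},
    {"kind": "log", "ref": "server.log#8812"},
    {"kind": "metric", "ref": "cpu.busy@12:00"},
    {"kind": "event", "ref": "evt-1042"},
]


def source_kinds(trail):
    """Count how many trail entries come from each kind of source."""
    return Counter(entry["kind"] for entry in trail)


kinds = source_kinds(trail)
```

Because this summary can always be derived from the trail, storing it again as a category would only duplicate information.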
Other valid ways to categorize symptoms would be to take into account where a resource is sensed, what a resource will affect, or even which resources produced the information used to compose the symptoms in the first place. This would be an example of a scope-centric categorization, but because this information is also already present in the symptom metadata, it does not add much value. Applications can infer the types of scope-centric categorization by looking at the symptom scope.
On the other hand, functional categorization is a useful form of categorization that can also be applied to symptoms. A functional categorization indicates to applications which of the various IT processes and services a particular symptom applies to. For this reason, the standard form of categorization adopted by the symptoms reference architecture is a functional categorization. The predefined main categories for symptoms are:
- Security
- Operations
- Availability
- QoS (Quality of Service)
As you can see, it's a short list, but there are many secondary categories you can derive from these main categories. Don't forget that this is a starting set and as such, can and should be expanded.
Now let's explore each of these main categories in more detail.
Symptoms in this category describe security problems; Table 1 shows the existing security subcategories.
Table 1. Security symptoms sub-categories
| Sub-category | Description |
|---|---|
| Prevention | Problems related to the prevention of security problems: |
| Authentication | Problems related to authentication of users and messages in a system: |
| Authorization | Problems related to the authorization of actions or user access: |
Listing 1 demonstrates the XML schema for the security category.
Listing 1. XML schema for the security category
<simpleType name="Security">
  <restriction base="QName">
    <enumeration value="security:prevention"/>
    <enumeration value="security:authentication"/>
    <enumeration value="security:authorization"/>
  </restriction>
</simpleType>
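A tool that consumes this listing could read the allowed values straight from the schema fragment. The following Python sketch uses the standard `xml.etree.ElementTree` module. The helper names `allowed_values` and `is_valid_category` are assumptions, and the fragment is parsed exactly as printed in Listing 1, that is, without an XML Schema namespace prefix; a complete schema document would normally qualify these elements with a namespace, and the lookup would need to account for that.

```python
import xml.etree.ElementTree as ET

# The fragment from Listing 1, as printed (no namespace declarations).
SECURITY_SCHEMA = """
<simpleType name="Security">
  <restriction base="QName">
    <enumeration value="security:prevention"/>
    <enumeration value="security:authentication"/>
    <enumeration value="security:authorization"/>
  </restriction>
</simpleType>
"""


def allowed_values(schema_fragment):
    """Return the enumeration values of a simpleType fragment, in document order."""
    root = ET.fromstring(schema_fragment.strip())
    return [element.get("value") for element in root.iter("enumeration")]


def is_valid_category(value, schema_fragment):
    """Check a category string against the enumeration in the fragment."""
    return value in allowed_values(schema_fragment)


security_values = allowed_values(SECURITY_SCHEMA)
```

The same helper works unchanged for the other category listings, because they share the same shape.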
For good examples of canonical symptoms in this category (security symptoms), please see the second article in the Symptoms deep dive series (see Resources for a link to the article). They include:
- Authentication Failure
- Authorization Failure
- Prevention Deployment Failure
Symptoms in this category describe operation problems; Table 2 highlights existing operations sub-categories:
Table 2. Operations symptoms sub-categories
| Sub-category | Description |
|---|---|
| Execution | Problems related to the operation of the system: |
| Logic | Problems related to the business logic of a system: |
| Configuration | Problems related to the system configuration: |
Listing 2 demonstrates the XML schema for the operations category.
Listing 2. XML schema for the operations category
<simpleType name="Operation">
  <restriction base="QName">
    <enumeration value="operation:execution"/>
    <enumeration value="operation:logic"/>
    <enumeration value="operation:configuration"/>
  </restriction>
</simpleType>
For good examples of canonical symptoms in this category (service support symptoms), please see the second article in the series. They include:
- Configuration Unavailable
- Configuration Invalid
- Dependency Unavailable
- Dependency Mismatch
Symptoms in this category describe availability problems; Table 3 defines existing availability sub-categories:
Table 3. Availability symptoms sub-categories
| Sub-category | Description |
|---|---|
| Storage | Problems related to the availability of storage resources: |
| I/O | Problems related to the availability of I/O resources: |
| Network | Problems related to the availability of network resources: |
| Communication | Problems related to the availability of communication resources: |
| Hardware | Problems related to the availability of hardware resources: |
| Software | Problems related to the availability of software resources: |
| Data | Problems related to the availability of data: |
Listing 3 shows the XML schema for the availability category.
Listing 3. XML schema for the availability category
<simpleType name="Availability">
  <restriction base="QName">
    <enumeration value="availability:storage"/>
    <enumeration value="availability:io"/>
    <enumeration value="availability:network"/>
    <enumeration value="availability:communication"/>
    <enumeration value="availability:hardware"/>
    <enumeration value="availability:software"/>
    <enumeration value="availability:data"/>
  </restriction>
</simpleType>
For good examples of canonical symptoms in this category (service availability symptoms), please see the second article in this series. They include:
- Resource Capacity Met
- Resource Unavailable
- Resource Degraded
- Resource Unreachable
- Repeated Availability Problem
Symptoms in this category describe quality of service problems; Table 4 shows these existing sub-categories:
Table 4. QoS symptoms sub-categories
| Sub-category | Description |
|---|---|
| Metrics | Problems detected by the analysis of existing metrics associated with a system: |
| Performance | Problems detected by the performance analysis of a system: |
Listing 4 gives you the XML schema for the QoS category.
Listing 4. XML schema for the QoS category
<simpleType name="QoS">
  <restriction base="QName">
    <enumeration value="qos:metrics"/>
    <enumeration value="qos:performance"/>
  </restriction>
</simpleType>
Good examples of QoS symptoms can be derived from the analysis of QoS agreements and monitoring data in networks and applications. Typically, these are threshold-oriented symptoms in which a threshold that is met or surpassed usually means that a QoS parameter was violated. Symptoms reflect these QoS parameter violations, and the symptom processors then carry out the associated resolutions.
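A threshold-oriented symptom of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `QosThreshold` class, the `check_threshold` function, the metric name, and the limit are all hypothetical, and the sketch assumes a metric for which higher values are worse, so that meeting or surpassing the limit counts as a violation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QosThreshold:
    metric: str
    limit: float
    category: str = "qos:performance"  # value taken from Listing 4


def check_threshold(threshold, sample):
    """Return a symptom record when the sample meets or surpasses the limit, else None."""
    if sample >= threshold.limit:
        return {
            "category": threshold.category,
            "metric": threshold.metric,
            "observed": sample,
            "limit": threshold.limit,
        }
    return None


# Hypothetical agreement: response time must stay below 500 ms.
response_time = QosThreshold(metric="response_time_ms", limit=500.0)

violation = check_threshold(response_time, 620.0)  # surpassed: symptom record
met = check_threshold(response_time, 500.0)        # met exactly: symptom record
healthy = check_threshold(response_time, 120.0)    # below the limit: None
```

A metric for which lower values are worse (for example, throughput) would need the comparison reversed.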
Symptoms are generally stored in symptom catalogs; as such, they provide a common medium for distribution of symptom information. As well, symptom catalogs may choose to publish their defined symptom categories for reuse purposes.
It is a best practice for a symptoms author to consult and reuse these categories whenever possible. The same also applies to whole symptom definitions -- when possible, reuse should be encouraged.
The following method generally applies for the classification of symptoms in the standard taxonomy or for the expansion of the taxonomy:
- Look at the existing main symptom categories. If the symptom fits in any of the existing main categories, then proceed. Otherwise, create a new main category.
  - When you create a new main category, it is a best practice to align it with standard ITIL-oriented functional processes. For example, a symptom that signals a continuity of service problem would be related to a new category of Continuity of Service Symptoms.
- Search the symptom secondary categories for a fit. If the symptom fits the description of any of the existing secondary categories, then proceed. Otherwise, create a new secondary category.
  - Secondary categories are also functional, but they may denote organizational aspects of the process. One such example would be the types of QoS metrics that may be evaluated (for example, performance metrics, availability metrics, and so on). When a metric type does not exist in the QoS main category, you can create a new one.
- If the symptom fits in an existing secondary category, look for similar symptoms. If many similar symptoms exist in a secondary category, the symptom author may choose (as an option) to create tertiary (or even deeper) categories and group the new symptom, along with other existing similar symptoms, in such derived categories. In this case, a reorganization of existing symptoms may be necessary. (In this article, I have not provided analysis of the inherent difficulties associated with the reorganization of symptom categories; consider it an important task, not to be undertaken lightly.)
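The first two steps of this method can be sketched as a lookup that expands the taxonomy when nothing fits. The sketch below is a simplification under stated assumptions: deciding whether a symptom "fits" a category is a judgment the symptom author makes, so the function takes the chosen main and secondary names as input; the `TAXONOMY` dictionary is built from the enumeration values in Listings 1 through 4; the `classify` function and the `continuity:recovery` category are hypothetical; and the optional tertiary step is not modeled.

```python
# Starting taxonomy, taken from the enumeration values in Listings 1-4.
TAXONOMY = {
    "security": {"prevention", "authentication", "authorization"},
    "operation": {"execution", "logic", "configuration"},
    "availability": {"storage", "io", "network", "communication",
                     "hardware", "software", "data"},
    "qos": {"metrics", "performance"},
}


def classify(taxonomy, main, secondary):
    """Place a symptom in the taxonomy, creating categories when none exists.

    Returns the category string and the list of categories that were created.
    """
    created = []
    # Step 1: reuse an existing main category, or create a new one.
    if main not in taxonomy:
        taxonomy[main] = set()
        created.append(main)
    # Step 2: reuse an existing secondary category, or create a new one.
    if secondary not in taxonomy[main]:
        taxonomy[main].add(secondary)
        created.append(f"{main}:{secondary}")
    return f"{main}:{secondary}", created


# An existing category is reused as-is.
reused = classify(TAXONOMY, "availability", "network")

# A continuity of service symptom needs a new main and secondary category.
expanded = classify(TAXONOMY, "continuity", "recovery")
```

Returning the list of created categories makes every expansion of the taxonomy explicit, so it can be reviewed before the new categories are published to a symptom catalog.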
After symptoms are created and correctly classified, they may be imported into symptom catalogs and become part of the analysis, detection, and resolution process that makes use of symptom definitions in an autonomic manager.
Autonomic computing symptoms provide good value for identification and resolution of situations in an autonomic computing environment, but in order to be more effective, symptoms should be correctly classified so they can be applied to the specific context to which they are related in the overall analysis of IT processes. There are many ways to classify symptoms; in this article, I've laid out a methodology and associated best practices based on functional decomposition of symptoms and the resources they affect.
Whenever possible, reuse of canonical symptoms and their respective taxonomy should be encouraged. A standard starting set of symptom categories exists along with a methodology for their expansion. New categories may and will be added as more symptoms are authored in production or pre-production environments. It is important that the philosophy associated with the taxonomy and the subsequent classification of symptoms be followed, because only this will guarantee a smooth and efficient processing strategy for realizing the power of symptoms in an autonomic manager.
"Symptoms Deep Dive, Part 1: Know thy symptoms, heal thyself" (developerWorks, October 2005) provides an overview of the autonomic computing symptom format.
"Symptoms Deep Dive, Part 2: Cool things you can do with symptoms" (developerWorks, December 2005) lists a set of canonical symptoms and describes scenarios where these symptoms are used.
The four-part "Automate data collection for problem determination" series (developerWorks, May-November 2005; updated March 2006) delivers a step-by-step guide to using the Automated Problem Determination tool and demonstrates the IT Infrastructure Library (ITIL) service flow for problem management.
"The autonomic computing edge: The role of knowledge in autonomic systems" (developerWorks, September 2005) provides an introduction to symptoms.
In "Meet the experts: Lennart Frantzell" (developerWorks, July 2005), this Senior Technical Consultant at the IBM Innovation Center discusses symptoms databases.
"An Architectural Blueprint for Autonomic Computing" (October 2004) provides an overview of the autonomic computing architecture, including more information about knowledge and other architectural building blocks.
Learn more about autonomic computing hot topics in The autonomic computing edge column on developerWorks.
Learn more about ITIL processes and how IBM can help you by reading "Making ITIL Actionable in an IT service management environment."
For more information on the current symptoms 1.0 format and the Log and Trace Analyzer component, refer to the Autonomic Computing Toolkit.
- Dave Bartlett blogs each week on his thoughts about the state of autonomic computing in the industry.
Marcelo Perazolo is a member of the IBM Autonomic Computing Architecture team, where he serves as an architect for symptoms and other knowledge formats and defines Management Integration Taxonomies related to autonomic computing. He has worked for IBM since 1990, with various assignments in network and systems management. Marcelo received an M.S. degree in Electrical Engineering in 1994. His interests include problem determination and prediction, process optimization techniques, security, correlation technologies, and knowledge representation.