Autonomic computing architecture is fast becoming an area of focus for systems professionals. Many see the creation of standards for systems that are self-configuring, self-healing, self-optimizing, and self-protecting as a critical factor in achieving success.
It is worth noting that an autonomic computing architecture does not promote the idea that systems will never require human intervention. The model anticipates there will always be times when unique challenges are posed requiring the analysis and talents of highly qualified experts.
The goal of an autonomic computing architecture is to limit hands-on intervention to extraordinary situations. Most administrative functions should be carried out according to pre-defined policies. The autonomic computing architecture is not a technological wonderland, but a continuum on which different technologies, organizations, and practitioners find themselves at different times.
The road map for the autonomic computing architecture describes the following five levels of maturity, illustrating how businesses are constantly evolving their IT environment:
Basic --> Managed --> Predictive --> Adaptive --> Autonomic
These terms can be defined as:
- Basic: The product and environment expertise resides in human minds, requiring consultation on even mundane procedures.
- Managed: Scripting and logging tools automate routine execution and reporting. Individual specialists review information gathered by the tools to make plans and decisions.
- Predictive: Early warning flags are raised as preset thresholds are tripped. The knowledge base recommends appropriate actions. Proposed resolutions draw on a centralized store of common occurrences and experience.
- Adaptive: Building on the predictive capabilities, the adaptive system takes action itself based on the situation.
- Autonomic: Policy drives system activities such as allocation of resources within a prioritization framework.
While examples of predictive and even adaptive systems exist today, the general state of the industry remains at the basic and managed levels. This slows IT reaction times and leads to considerable overhead, duplication of effort, and missed opportunities. At the same time, organizations are looking for extraordinary increases in productivity and contribution from IT. The autonomic computing architecture is poised to solve these problems now and in the future.
Adapting to an evolutionary process
The key to understanding autonomic maturity levels is the recognition that this is an evolutionary process. There is no instantaneous approach to making systems self-optimizing, self-protecting, self-configuring and self-healing. At the basic and managed maturity levels, applications are not aware of the state of their environment. Problem prevention is all too rare.
To allow system components, whether infrastructure or application software, to predict when they are on the verge of violating a threshold, sensors must be defined and rolled out. IBM has been building this capability into many of its offerings in the past few years. With sensors and instrumentation to collect that data, the next level of maturity is to build on the information. As autonomic computing systems take shape, you will see systems and devices not only being monitored, but automatically reporting that a defined threshold has been approached, reached, or exceeded.
For example, today the Self-Monitoring, Analysis and Reporting Technology (SMART) built into most hard drives can be used to predict a failure event for a particular disk. Notification of a failure indicator allows appropriate human intervention. A problem can be predicted and forestalled rather than requiring a reaction or after-the-fact fix.
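The threshold reporting described above (approached, reached, or exceeded) can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's sensor API: the `Threshold` class, the 90 percent warning margin, and the metric name are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """A limit on a metric plus a warning margin (both hypothetical)."""
    limit: float
    warn_fraction: float = 0.9  # assumed: 90% of the limit counts as "approached"

    def classify(self, reading: float) -> str:
        if reading > self.limit:
            return "exceeded"
        if reading == self.limit:
            return "reached"
        if reading >= self.limit * self.warn_fraction:
            return "approached"
        return "normal"


def report(name, threshold, readings):
    """Return one message each time the sensor enters a new non-normal state."""
    events = []
    previous = "normal"
    for value in readings:
        state = threshold.classify(value)
        if state != previous and state != "normal":
            events.append(f"{name}: threshold {state} at {value}")
        previous = state
    return events


# Hypothetical disk temperature readings in degrees Celsius.
disk_temp = Threshold(limit=60.0)
events = report("disk0.temperature_c", disk_temp,
                [41.0, 55.0, 57.0, 60.0, 63.0, 50.0])
for line in events:
    print(line)
```

Reporting only on state changes keeps the sensor from flooding an operator with a message for every reading, which matters once many devices report on their own.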
The adaptive maturity level of autonomic computing systems calls for a process that monitors resources, defines a systematic reaction to a set of indicators, and automates the procedures that alleviate the underlying problem. For example, RAID arrays can be configured to automatically rebuild the contents of a failed drive onto a hot spare.
The automatic aspect of the response is one guideline for determining whether a given system has matured to the predictive level or gone beyond it to become adaptive.
The threshold levels defined as part of predictive systems management are not discarded as the environment matures. These metrics are captured and analyzed, with the resulting action expressed as recommendations and options. As the organization gains confidence in the ability of the system to monitor and flag events, response levels are defined, and first line of defense activities are enacted by the system. In this way, the services can be considered to be maturing to the adaptive level of an autonomic computing system.
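The difference between recommending an action and taking it can be shown with a small dispatcher. The sketch below is illustrative only: the knowledge base entries, the indicator names, and the `EventHandler` class are hypothetical, and the set of automated indicators stands in for the response levels an organization has agreed to delegate to the system.

```python
from typing import Callable, Dict, List

# Hypothetical knowledge base: indicator name -> recommended action.
KNOWLEDGE_BASE = {
    "disk_failure_predicted": "rebuild onto spare drive",
    "memory_pressure": "restart leaking service",
}


class EventHandler:
    def __init__(self, automated: Dict[str, Callable[[], None]]):
        # Indicators the organization trusts the system to handle by itself.
        self.automated = automated
        self.recommendations: List[str] = []
        self.actions_taken: List[str] = []

    def handle(self, indicator: str) -> str:
        action = KNOWLEDGE_BASE.get(indicator)
        if action is None:
            # Unknown situation: human expertise is still required.
            self.recommendations.append(f"{indicator}: escalate to an expert")
            return "escalated"
        if indicator in self.automated:
            # Adaptive behavior: the system enacts the response itself.
            self.automated[indicator]()
            self.actions_taken.append(action)
            return "adaptive"
        # Predictive behavior: flag the event and recommend, but do not act.
        self.recommendations.append(f"{indicator}: recommend '{action}'")
        return "predictive"


log = []
handler = EventHandler({"disk_failure_predicted": lambda: log.append("rebuilt")})
results = [handler.handle(name) for name in
           ["disk_failure_predicted", "memory_pressure", "fan_noise"]]
print(results)
```

Moving an indicator from the recommendation path to the automated path is a one-line change here, which mirrors how confidence in the monitoring lets first line of defense activities shift to the system over time.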
Aligning systems to business requirements -- autonomically
Adaptive and autonomic levels of systems maturity require IT infrastructure to be widely available and integrated, to allow smart components to recognize when an impending failure is going to affect them, or where an increase in demand puts their service level agreement at risk. Adaptive components can take advantage of alternate infrastructure to ensure uninterrupted operation or uncompromised performance.
Adaptive tools are just now being defined. One example is the Adaptive Replacement Cache (ARC). This embedded adaptive capability allows a system to tune itself in response to the workload it observes. Storage systems, databases, processors, file systems, middleware, and Web servers all use caching. IBM researchers at the Almaden Research Center developed a dynamically adapting caching strategy that optimizes a server's performance without having to pre-tune it for a specific workload.
Their strategy involves balancing recency, as in least recently used (LRU) replacement, against frequency, as in least frequently used (LFU) replacement, when deciding which pages to keep in a cache. In testing, it significantly outperformed reliance on either strategy alone. This is just one example of how adaptive computing, like all autonomic computing concepts, is evolving.
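To make the balancing idea concrete, here is a compact Python sketch modeled on the published ARC policy. It is a simplified illustration, not IBM's code: the class and method names are hypothetical, and the `load` callback stands in for whatever fetches a page on a miss. Two lists hold resident pages (`t1` for pages seen once recently, `t2` for pages seen at least twice), two "ghost" lists (`b1`, `b2`) remember recently evicted keys, and the target size `p` shifts toward whichever list the ghost hits say was evicted from too eagerly.

```python
from collections import OrderedDict


class ARCCache:
    """Simplified Adaptive Replacement Cache sketch (hypothetical names)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.c = capacity
        self.p = 0.0              # adaptive target size for t1
        self.t1 = OrderedDict()   # resident, seen once recently (recency)
        self.t2 = OrderedDict()   # resident, seen at least twice (frequency)
        self.b1 = OrderedDict()   # ghost keys evicted from t1
        self.b2 = OrderedDict()   # ghost keys evicted from t2

    def _replace(self, in_b2):
        # Evict from t1 when it is over its target, otherwise from t2.
        if self.t1 and (len(self.t1) > self.p or
                        (in_b2 and len(self.t1) == self.p)):
            key, _ = self.t1.popitem(last=False)
            self.b1[key] = None
        else:
            key, _ = self.t2.popitem(last=False)
            self.b2[key] = None

    def get(self, key, load):
        """Return the cached value, calling load(key) on a miss."""
        if key in self.t1:                    # second touch: promote to t2
            value = self.t1.pop(key)
            self.t2[key] = value
            return value
        if key in self.t2:                    # repeated hit: refresh position
            self.t2.move_to_end(key)
            return self.t2[key]
        value = load(key)
        if key in self.b1:                    # ghost hit: favor recency
            self.p = min(self.c, self.p + max(len(self.b2) / len(self.b1), 1))
            self._replace(in_b2=False)
            del self.b1[key]
            self.t2[key] = value
        elif key in self.b2:                  # ghost hit: favor frequency
            self.p = max(0, self.p - max(len(self.b1) / len(self.b2), 1))
            self._replace(in_b2=True)
            del self.b2[key]
            self.t2[key] = value
        else:                                 # brand-new key
            l1 = len(self.t1) + len(self.b1)
            total = l1 + len(self.t2) + len(self.b2)
            if l1 == self.c:
                if len(self.t1) < self.c:
                    self.b1.popitem(last=False)
                    self._replace(in_b2=False)
                else:
                    self.t1.popitem(last=False)
            elif total >= self.c:
                if total == 2 * self.c:
                    self.b2.popitem(last=False)
                self._replace(in_b2=False)
            self.t1[key] = value
        return value


loads = []

def load(key):
    """Stand-in for an expensive fetch; records every miss."""
    loads.append(key)
    return key.upper()

cache = ARCCache(capacity=2)
values = [cache.get(k, load) for k in ["a", "a", "b", "c", "a", "b"]]
print(values, loads)
```

In this short trace the final request for "b" is a ghost hit: the key was evicted earlier, so the cache reloads it and raises `p` to give recently seen pages more room. No workload-specific tuning parameter appears anywhere, which is the property the researchers were after.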
With the predictive and adaptive tools in place, it becomes possible to set policy for dynamically responding to situations as they arise. For example, where a key line-of-business server requires more resources, it might prove better to temporarily pull resources from a lower priority task than to impinge on the service levels defined for that server.
It is the capability to define policies, such as when to take resources and when to give them up, that enables the automatic allocation of resources according to their relative priorities. The employee relations intranet server might be deemed slightly less important than the online ordering system used by the company's biggest customer, for example. In case of a spike in usage, processor, memory, and bandwidth might be diverted from the intranet server to the transaction system until demand slows.
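A priority policy of this kind can be expressed very simply. The sketch below is a minimal illustration under stated assumptions: the server names, the priority numbers, and the idea of counting capacity in abstract "units" are all invented for the example, and real allocation would involve several resource types at once.

```python
def allocate(capacity, demands, priorities):
    """Grant capacity in priority order (a higher number is more important)."""
    grants = {}
    remaining = capacity
    for name in sorted(demands, key=lambda n: priorities[n], reverse=True):
        grants[name] = min(demands[name], remaining)
        remaining -= grants[name]
    return grants


# Hypothetical policy: the ordering system outranks the intranet server.
priorities = {"online_ordering": 10, "employee_intranet": 5}

# Normal load: both servers get everything they ask for.
normal = allocate(16, {"online_ordering": 8, "employee_intranet": 6}, priorities)

# Usage spike: capacity is diverted from the intranet until demand slows.
spike = allocate(16, {"online_ordering": 14, "employee_intranet": 6}, priorities)

print(normal)
print(spike)
```

Because the allocation is recomputed from current demand each time, the intranet server gets its capacity back automatically once the spike ends; nobody has to remember to undo the change.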
This loop of sensor readings and changes effected in keeping with pre-defined policies sets the stage for systems to learn. This aspect of autonomic maturity -- infrastructure that can learn as well as predict and adapt -- might seem futuristic. But the groundwork for this level of unattended systems operation is being laid with autonomic computing systems.
Clearly, the autonomic computing architecture offers a great deal of potential. As always with a visionary approach to technology, it can be difficult to picture the tangible forms these products might take. From a software standpoint, it is most instructive to compare the problems you currently face with potential solutions built under the autonomic computing model. To help you understand where in the model any given resource falls, take a look at some practical examples of how the levels can be defined in the following operational areas:
- Problem management
- System availability
- Security
- User administration
- Solution deployment
- Performance and capacity management
Currently, there are autonomic tools and techniques being developed that can affect administration and coding practices in each of these operational areas. The following is an evaluation of how the various autonomic maturity levels might affect them.
Problem management
Problem management is one of the promising areas for immediate application of autonomic computing tools. Everyone has experienced the basic challenge of technical support for some aspect of computing and can relate to the cost of lost productivity resulting from a systems failure. Managing this process has yielded solid technical knowledge bases, as well as creating levels of technical support staffed by people with appropriate expertise.
As instrumentation is incorporated into more systems, predictive algorithms can be used to identify when pieces of technology are at risk of failure. These sensors provide the basis for creating software that is much better positioned to identify chronic symptoms that can lead to problems. Predictive Failure Analysis is built into IBM products such as DB2® and TotalStorage® solutions.
Adaptive and autonomic computing systems address problems before they arise, allocating resources as needed, and balancing their requirements within the negotiated service level agreements of other components.
System availability
Seven by twenty-four support is a business reality like never before. Nonstop is no longer just a brand name; it is an expectation of IT as the engine that drives business communication and information. Problems that jeopardize mission-critical systems can lead to catastrophic consequences such as unemployment.
As I have touched on elsewhere in this article, as computing infrastructure matures along autonomic computing lines, the ability to predict and adapt can ensure that the most important systems are up and running. Policies allow autonomically managed computing elements to negotiate with each other, discover their relative priorities, and divide up the available resources accordingly.
Security and user administration
User administration and application security are related issues when it comes to autonomic computing maturity levels. Password rotation and individual permission profiles can be a time-consuming and frequently frustrating experience for corporate and personal users of shared computing services.
Single sign-on (SSO) is one approach to managing users with a host of IDs and passwords. Directory and naming services such as LDAP, Active Directory, and NIS+ provide tools for certain management goals, but clearly application and system security is not at a level where it can be taken as a given.
New tools, such as the Integrated Solutions Console, are being developed by IBM and released as part of the support for the autonomic computing architecture. The Integrated Solutions Console enables highly granular access permissions within common administration tools for different applications, servers, and devices within an organization. This is just one benefit that user administration gains from the push to autonomic maturity.
Basic systems security, such as closing unused ports and changing default passwords, has given way to the introduction of automated tools for evaluating access patterns and automating system responses. One example of this is the Tivoli® Risk Manager, which can detect unauthorized access attempts and shut down services in keeping with predefined security policies.
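The general pattern of evaluating access attempts against a predefined policy can be sketched as follows. This is not the Tivoli Risk Manager API; the `AccessMonitor` class, the failure limit, and the time window are hypothetical values chosen only to illustrate a policy-driven response.

```python
from collections import defaultdict, deque


class AccessMonitor:
    """Shut down a service after too many failed attempts in a time window."""

    def __init__(self, max_failures=3, window_seconds=60):
        self.max_failures = max_failures   # assumed policy value
        self.window = window_seconds       # assumed policy value
        self.failures = defaultdict(deque)
        self.shut_down = set()

    def record_failure(self, service, timestamp):
        """Record a failed access; return True if the service is shut down."""
        recent = self.failures[service]
        recent.append(timestamp)
        # Forget failures that fall outside the sliding window.
        while recent and timestamp - recent[0] > self.window:
            recent.popleft()
        if len(recent) >= self.max_failures:
            self.shut_down.add(service)
        return service in self.shut_down


monitor = AccessMonitor(max_failures=3, window_seconds=60)
# Timestamps are seconds; two early failures age out before the burst at 100-120.
outcomes = [monitor.record_failure("ftp", t) for t in [0, 30, 100, 110, 120]]
print(outcomes)
```

Keeping the limits in constructor arguments rather than in the detection logic is what makes the response a matter of policy: tightening security means changing two numbers, not rewriting the monitor.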
As the value of information and the technology to manage it grow, techniques for maintaining secure systems have kept pace. The autonomic maturity levels can be used as a yardstick to determine where any given system falls in its capabilities to manage, predict, and adapt to security threats autonomically.
Solution deployment
When installing software today, you are forced to inventory and evaluate not only the infrastructure on which you intend to execute the application, but also other packages, to identify conflicts that might occur. The cost of not understanding these factors in advance is having to roll back the installation in a tedious trial-and-error approach. This basic approach to software installation varies from one application to the next, each with its own set of scripts and prerequisites, documentation, and media options.
On the maturity scale, a more managed approach encompasses installation executables that probe for the existence of prerequisites and log the lack of their availability before failing. Too often, this notification occurs towards the end of a laborious install process, with other applications brought down, and availability limited for users in the event that a system restart is required. A predictive software deployment checks the environment before running, and extends that check to identify potential conflicts between other applications.
An adaptive approach to software deployment takes all of that into account, and also performs the installation functions if permitted. Instead of expecting to have dependencies installed prior to executing, the adaptive software deployment process identifies the requirements and fulfills them. Part of the vision for this level of maturity involves acquiring needed licenses and automating the purchasing function. This leads to a definition of truly autonomic software deployment.
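The step from predictive to adaptive deployment can be illustrated with a small dependency planner. This is a sketch under stated assumptions, not the Solution Installation and Deployment Technology: the catalog format, the package names, and both function names are hypothetical, and the dependency graph is assumed to contain no cycles.

```python
def preflight(package, catalog, installed):
    """Predictive check: report missing prerequisites and conflicts
    before anything is changed on the system."""
    spec = catalog[package]
    missing = [dep for dep in spec["requires"] if dep not in installed]
    conflicts = [name for name in spec["conflicts"] if name in installed]
    return missing, conflicts


def plan_install(package, catalog, installed, _plan=None):
    """Adaptive step: return an install order that fulfills dependencies
    first. Assumes the dependency graph has no cycles."""
    plan = [] if _plan is None else _plan
    if package in installed or package in plan:
        return plan
    missing, conflicts = preflight(package, catalog, installed)
    if conflicts:
        raise RuntimeError(f"{package} conflicts with {', '.join(conflicts)}")
    for dep in missing:
        plan_install(dep, catalog, installed, plan)
    plan.append(package)
    return plan


# Hypothetical catalog of packages with prerequisites and known conflicts.
catalog = {
    "order_app": {"requires": ["app_server", "db_client"],
                  "conflicts": ["legacy_order_app"]},
    "app_server": {"requires": ["java_runtime"], "conflicts": []},
    "db_client": {"requires": [], "conflicts": []},
    "java_runtime": {"requires": [], "conflicts": []},
}
installed = {"db_client"}

missing, conflicts = preflight("order_app", catalog, installed)
plan = plan_install("order_app", catalog, installed)
print(missing, conflicts, plan)
```

Because the conflict check runs before the plan is built, a conflicting package stops the process at the start rather than toward the end of a laborious install.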
Performance and capacity management
Performance and capacity management has a profile similar to that of problem management and of the procedures used to administer changes within large data centers. Much of it is done on the fly or as needed, and some organizations have a more structured set of procedures for dealing with these issues than others. However, it is in this area that you can see how the IBM on demand model (and grid computing, in particular) intersects neatly with autonomic computing systems.
For systems to be self-configuring and self-optimizing, they require self-knowledge about their requirements, about the infrastructure available to them, and awareness about their priority relative to other consumers of those resources. Monitoring utilization and predicting when thresholds will be exceeded is a level of performance and capacity management that benefits any number of customers today.
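One simple way to predict when a threshold will be exceeded is to fit a straight line through recent utilization samples and extrapolate. The sketch below assumes equally spaced samples and a roughly linear trend; the function name and the sample values are hypothetical, and production capacity planning would normally account for seasonality and noise.

```python
def periods_until_threshold(samples, threshold):
    """Fit a least-squares line through equally spaced samples and return
    how many periods after the last sample the line crosses the threshold.
    Returns None when the trend is flat or falling."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing = (threshold - intercept) / slope
    # A crossing in the past means the threshold is already exceeded.
    return max(0.0, crossing - (n - 1))


# Hypothetical weekly disk utilization, in percent.
weekly_utilization = [50, 55, 60, 65, 70]
weeks_left = periods_until_threshold(weekly_utilization, threshold=90)
print(weeks_left)
```

With utilization rising five points a week from 70 percent, the fitted line reaches 90 percent four weeks after the last sample, which gives an administrator or an adaptive system a lead time to act on.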
Beyond this, however, lies the adaptive nature of an autonomic computing system. As processors are installed in servers, for instance, the on demand business model allows customers to activate them when and as needed. With grid computing, the resources of distributed computers and systems within an organization (or across organizations) are used and viewed as if they were one large, virtual computing system. The challenge to this model is predicting the requirement, then adapting quickly to maintain the defined levels of service for a given application.
As autonomic computing technology matures, this is a key area where programmers and administrators can collaborate. Components can be written to execute within certain performance parameters, and as their state changes, requests for more resources, or relinquishing resources in favor of others, become part of the application. It is through these kinds of mechanisms that autonomic computing systems will be implemented.
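A component that requests and relinquishes resources as its state changes might look something like the following. This is a hedged sketch rather than an Autonomic Computing Toolkit interface: `ResourcePool`, `ManagedComponent`, the response-time target, and the rule of releasing a worker at half the target are all assumptions made for illustration.

```python
class ResourcePool:
    """Shared pool of interchangeable workers (hypothetical)."""

    def __init__(self, available):
        self.available = available

    def request(self):
        if self.available > 0:
            self.available -= 1
            return True
        return False

    def release(self):
        self.available += 1


class ManagedComponent:
    """Component that tries to stay within a response-time target."""

    def __init__(self, name, target_ms, workers=2, min_workers=1):
        self.name = name
        self.target_ms = target_ms
        self.workers = workers
        self.min_workers = min_workers

    def adjust(self, observed_ms, pool):
        if observed_ms > self.target_ms:
            # Too slow: ask the pool for another worker.
            if pool.request():
                self.workers += 1
                return "acquired"
            return "denied"
        if observed_ms < 0.5 * self.target_ms and self.workers > self.min_workers:
            # Comfortably fast: give a worker back for others to use.
            self.workers -= 1
            pool.release()
            return "released"
        return "steady"


pool = ResourcePool(available=1)
ordering = ManagedComponent("ordering", target_ms=200)
history = [ordering.adjust(ms, pool) for ms in [350, 350, 80, 150]]
print(history, ordering.workers, pool.available)
```

The programmer supplies the performance parameters and the adjustment logic, while the administrator controls the pool; that division of labor is one way the collaboration described above could play out.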
Like any form of maturation, autonomic computing technology has levels that will take time to reach. As the model is better understood, any given system can be evaluated according to the characteristics that define each level from basic to truly autonomic.
The overarching theme for autonomic computing technology is to get better value for customers investing in IT resources. Clearly, this means doing more routine tasks automatically, and the way to accomplish that is to get the systems doing more work for us. Autonomic maturity is about starting at whatever maturity level you are and incorporating tools to move to the next level.
There has long been concern expressed in science fiction about the rise of thinking machines. Some of the more ambitious visions for autonomic computing, like software agents that learn and adapt independently, might be reminiscent of those stories. The reality of today's business world is that the level of resource utilization, coordination between systems, and return on computing investment is not sufficient.
By applying autonomic computing tools and techniques at every level of your computing infrastructure, you can deliver better value. The greater risk is one of disparate, unintegrated systems, expensive to maintain, and hard to change. Autonomic computing technology represents a way to alleviate practical problems today while laying the foundation for the integrated systems architecture for tomorrow.
- Download the Autonomic Computing Toolkit, which includes Solution Installation and Deployment Technology, Integrated Solutions Console, Generic Log Adapter and Log and Trace Analyzer, Autonomic Management Engine, and Resource Model Builder tools.
- Download the Business Workload Manager Prototype, which includes the Application Response Measurement API.
- Download the Agent Building and Learning Environment 2.0 for building intelligent agents.
- Download the Emerging Technologies Toolkit for a look at upcoming tools and technologies.
- Check out more emerging tools on alphaWorks.

Daniel Worden, a Studio B author, has 20 years of experience in systems, database administration, and operational management. He is the author of five technology books, and his work has been translated into several languages. As IBM SanFrancisco partner number 46, his firm received the FastStart award in 1998 for its Java application built with the SF framework. Over the past two years he has led the design and development of a Food Trading and Tracking utility with WebSphere, DB2, and Domino. Daniel is currently working on Storage Networks -- From the Ground UP for Apress. He can be reached at dworden@worden.net.
