Complexity and high installation and maintenance costs are a reality of today's computer systems. This is mostly because they are built to meet the specific needs of their owners and users. These systems are composed of several different components (hardware, middleware such as Tomcat or WebSphere Application Server, and applications), and their management costs are higher than the costs of the components themselves.
These issues are addressed by the autonomic computing paradigm. It proposes systems that are resilient to changes in environment and workload, that heal themselves after failures and errors, and that proactively defend against attacks, thereby bringing overall management costs down significantly.
To achieve these goals, autonomic systems can be broken down into four different attributes, as depicted in Figure 1.
Figure 1: Autonomic computing attributes
The envisioned environment used to deploy an autonomic solution would have a feedback control system that is capable of monitoring and collecting data, analyzing the collected data, planning the requisite changes, and effecting the changes on the underlying components of the system.
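The feedback control system described above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the demo's implementation: the class name ControlLoop, its four callables, and the toy component dictionary are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ControlLoop:
    """One pass = monitor -> analyze -> plan -> execute (hypothetical names)."""
    monitor: Callable[[], dict]               # collect data from the managed component
    analyze: Callable[[dict], Optional[str]]  # return a symptom name, or None if healthy
    plan: Callable[[str], list]               # turn a symptom into a list of actions
    execute: Callable[[str], None]            # apply one action to the component

    def run_once(self) -> list:
        data = self.monitor()
        symptom = self.analyze(data)
        if symptom is None:
            return []  # nothing to change on this pass
        actions = self.plan(symptom)
        for action in actions:
            self.execute(action)
        return actions


# Toy managed component: a dict holding the state of one application.
component = {"app0": "stopped"}

loop = ControlLoop(
    monitor=lambda: dict(component),
    analyze=lambda data: "app0_down" if data["app0"] != "running" else None,
    plan=lambda symptom: ["restart app0"],
    execute=lambda action: component.update(app0="running"),
)

first = loop.run_once()   # detects the stopped application and restarts it
second = loop.run_once()  # the component is healthy, so no actions are planned
```

Keeping the four stages as separate callables mirrors the separation between monitoring, analysis, planning, and effecting changes, so each stage can be replaced independently.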
To make such a system feasible, a common representation of the data flowing inside the system must be used. That is where Common Base Event (CBE) comes into play (see Resources). CBE provides a consistent way of transferring common information among disparate enterprise applications that support logging, management, problem determination, autonomic computing, and e-business functions. Its design offers the following benefits: data presented in a common and understandable format, a common set of messages, and a simple standard interface. For a more thorough discussion of this subject, see the Resources section.
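To illustrate why a shared event format helps, here is a minimal sketch of a common event record that any component could serialize and any other could parse. The Event class and its field names are simplified assumptions made for this example; they are not the actual CBE schema.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone


@dataclass
class Event:
    """Simplified common event record (illustrative fields, not the CBE schema)."""
    source: str         # component the event is about
    situation: str      # what happened, for example "StateChange"
    message: str        # human-readable description
    severity: int = 10  # hypothetical numeric severity scale
    created: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        # Every producer emits the same keys, so consumers need only one parser.
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(text: str) -> "Event":
        return Event(**json.loads(text))


sent = Event(source="AppServer", situation="StateChange", message="appId 0 stopped")
received = Event.from_json(sent.to_json())
```

Because the sender and the receiver agree on one record shape, the round trip reproduces the original event without any component-specific translation.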
Grabbing two pieces of the pie
The featured demo implements two aspects of autonomic computing shown in Figure 1: self-healing and self-optimizing. Figure 2 shows the first aspect.
Figure 2: Self-healing View
The following list describes the elements of Figure 2, starting with the App server component at the bottom and moving clockwise. The numbers close to the events indicate their order.
- App server -- This element hosts all the running applications. It is also responsible for starting and stopping them (this particular feature can be controlled externally).
- Log file -- This stores events generated by the App server every time an application either starts or stops. The format used for these entries is the one defined by the WebSphere showlog file.
- Sensor -- This monitors the log file and publishes a StateChange topic with the broker every time a new state change entry is added to the log file. It performs its duties using a component called acadapter (see Resources).
- Broker -- This provides a topic-based pub/sub notification engine. Topics are published with the broker; in this scenario the Sensor and the M.A.P.E. loop are publishers (they publish StateChange and Restart topics, respectively). A component that wishes to receive a notification every time a particular topic is published subscribes with the broker. The broker is responsible for keeping the list of subscribers and making sure they receive the appropriate notification every time a topic is published.
- M.A.P.E. loop -- This is the brain of the system. The acronym stands for Monitor, Analyze, Plan, and Execute. It subscribes with the broker to receive StateChange notifications, analyzes the data received, and decides whether or not to publish a Restart topic. It is based upon the ABLE framework (see Resources).
- Effector -- This element interacts with the App server to restore the faulty application to its running state. It subscribes to receive Restart notifications and, for each one received, sends a Restart message to the App server.
This design supports the autonomic computing self-healing capability. Every time a stopped application is detected, the system takes the necessary steps to get it running again.
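The self-healing flow can be sketched with an in-memory topic broker. This is only an illustration under assumptions: the Broker and AppServer classes, the sensor_poll function, and the tuple-based log are hypothetical stand-ins, and the real demo relies on acadapter, the ABLE framework, and a showlog-format file instead.

```python
from collections import defaultdict


class Broker:
    """Topic-based pub/sub: keeps subscribers per topic and notifies them on publish."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        for callback in list(self._subscribers[topic]):
            callback(payload)


class AppServer:
    """Stand-in for the App server: tracks states and logs every start or stop."""

    def __init__(self, log):
        self.states = {0: "running"}
        self.log = log

    def stop(self, app_id):
        self.states[app_id] = "stopped"
        self.log.append((app_id, "stopped"))

    def restart(self, app_id):
        self.states[app_id] = "running"
        self.log.append((app_id, "running"))


def sensor_poll(log, offset, broker):
    """Publish a StateChange topic for each entry added since `offset`."""
    end = len(log)  # snapshot: entries appended during this poll wait for the next one
    for app_id, state in log[offset:end]:
        broker.publish("StateChange", {"app_id": app_id, "state": state})
    return end


log = []
broker = Broker()
server = AppServer(log)
restarts = []  # Restart topics published by the M.A.P.E. loop stand-in


def mape_on_state_change(event):
    # Analyze and plan: only a stopped application warrants a Restart topic.
    if event["state"] == "stopped":
        restarts.append(event["app_id"])
        broker.publish("Restart", {"app_id": event["app_id"]})


broker.subscribe("StateChange", mape_on_state_change)
# Effector: forwards each Restart notification to the App server.
broker.subscribe("Restart", lambda event: server.restart(event["app_id"]))

server.stop(0)
offset = sensor_poll(log, 0, broker)       # sees "stopped" and triggers the restart
offset = sensor_poll(log, offset, broker)  # sees "running"; no further action
```

The publishers and subscribers never call each other directly; they only share topic names, which is what lets the same broker serve a different set of topics in the self-optimizing scenario.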
Figure 3 depicts the self-optimizing aspect.
Figure 3: Self-optimizing view
In the Figure 3 diagram the AppNotBusy flow is depicted in green and the AppTooSlow one in red. As in the previous diagram the numbers indicate the order of the events. The requests created by the stress generator can be present at any time.
The components described above keep their main behavior, with the following differences:
- App server -- responsible for removing and adding applications according to the received commands. It also monitors how busy the hosted applications are and issues an AppNotBusy notification based on the number of requests sent by the stress generator element.
- Broker -- in this scenario the App server, the EPP, and the M.A.P.E. loop are publishers. The App server publishes AppNotBusy topics, the EPP publishes AppTooSlow topics, and the M.A.P.E. loop publishes Add App and Remove App topics.
- M.A.P.E. loop -- subscribes with the broker to receive AppTooSlow and AppNotBusy notifications. It analyzes the data received and decides whether or not to publish an Add App or a Remove App topic, respectively.
- Effector -- interacts with the App server to add or remove applications. It subscribes to receive Add App and Remove App notifications and, for each one received, sends either an Add App or a Remove App message to the App server.
- Stress Generator -- simulates load for the applications hosted by the App server.
- End-to-End Problem Platform (EPP) -- measures the response time of a request that it sends to the App server. It generates AppTooSlow notifications when the response time is above a preset threshold (1 second).
This design supports the autonomic computing self-optimizing capability: response time is continuously measured (while EPP is running), and applications are created or destroyed to keep response time within pre-established parameters.
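The decision logic of this scenario can be sketched in a few lines. The function names epp_check and plan are hypothetical, and the direct topic mapping is a simplification: the real M.A.P.E. loop analyzes the data before deciding whether to publish anything.

```python
from typing import Optional

RESPONSE_TIME_THRESHOLD_S = 1.0  # preset threshold stated in the text (1 second)


def epp_check(response_time_s: float) -> Optional[str]:
    """Return the topic the EPP would publish, or None if the time is acceptable."""
    return "AppTooSlow" if response_time_s > RESPONSE_TIME_THRESHOLD_S else None


def plan(topic: str) -> Optional[str]:
    """Map a received notification to the topic the M.A.P.E. loop would publish."""
    return {"AppTooSlow": "Add App", "AppNotBusy": "Remove App"}.get(topic)


slow = epp_check(1.4)       # above the threshold
on_limit = epp_check(1.0)   # exactly 1 second is still acceptable
```

Passing the result of epp_check(1.4) to plan yields the Add App topic, while a topic the loop does not handle, such as StateChange in this scenario, maps to None.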
This demo is available as part of the IBM Emerging Technologies Toolkit (ETTK). (See Resources.) Once ETTK is installed and configured, the following steps get this application up and running:
- Start the chosen App server.
- Point your Web browser to http://host:port/actk/appHO/client/demo.html, replacing host and port with the values used when configuring the toolkit.
The demo is composed of a Web page with two frames, each one holding a Java Applet. The left Applet has a panel split in two pieces. The upper area has the Application Table that contains all the applications hosted by the App server. Each row represents an application and contains a stop button to allow for direct user interaction. The lower part presents a text box used to display the main events generated by the system, in a simple textual format.
Two more controls are available: the Clear history button clears the text box, and the Randomly stop applications check box puts the demo in a loop mode in which one randomly selected running application is stopped every 10 seconds. The remaining controls are addressed in the self-optimizing discussion below.
The right Applet is called Event Viewer. It has two panels: Image Panel and Text Panel. The Image Panel shows the elements displayed in the diagram of Figure 2 as well as the flow of events. The Text Panel contents are discussed below. In the lower area, a set of buttons is available: Clear Display, Set Delay ..., Record ..., Playback ... and Delete .... Three of them (Record, Playback and Delete) enable the creation, execution, and deletion of a file containing the captured events. Clear Display removes the displayed event flow as well as all the information contained in the Text Panel. Set Delay sets the display refresh interval (its default value is 2 seconds).
Exploring self-healing behavior
When the demo starts, the Application Table is populated with appId 0 in the running state (green light) and idle: the requests slot indicator, the little icon beside the light, is empty. Each application handles up to three simultaneous requests in this simulation.
If the corresponding stop button is then pressed, a StateChange notification is generated and a Restart action is invoked to get the application back to the running state, as discussed previously. Similar behavior is triggered when the Start loop button is selected. The only difference is that the application is then stopped automatically every 10 seconds. Figure 4 shows the described flow of events.
Figure 4: StateChange detection and Restart action
Exploring self-optimizing behavior
The controls related to this behavior are the ones on the lower half of the left Applet: the Enable EPP Performance monitor check box, the Number of Clients slider, and the Average client response time indicator. EPP measures the response time of a request that it sends to the App server. If moving the Number of Clients slider to the right pushes this time over 1 second (displayed in red), an App Too Slow notification is generated and a corresponding Add App action is sent to the App server to request the creation of additional applications to handle the load. Figure 5 shows this scenario.
Figure 5: App Too Slow detection and Add Application action
When the load decreases, the applications that were added become idle. This is visible in the requests slot indicator and in the Average client response time indicator, which changes back to black to indicate values less than or equal to 1 second. The App server senses this scenario and sends an AppNotBusy notification to the broker. As a consequence, a Remove App action is sent back to the App server to remove the idle application. App id 0 is never removed, even if no load is present. This mechanism is illustrated by Figure 6.
Figure 6: App Not Busy detection and Remove action
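The add and remove behavior can be sketched as a small application pool. The AppPool class and its has_idle_app rule are assumptions made for illustration; the text does not specify how the App server decides that an application is idle. The two facts taken from the text are that each application handles up to three simultaneous requests and that app id 0 is never removed.

```python
import math

REQUESTS_PER_APP = 3  # each application handles up to three simultaneous requests


class AppPool:
    """Hypothetical stand-in for the set of applications hosted by the App server."""

    def __init__(self):
        self.app_ids = [0]  # the demo starts with app id 0 only

    def add_app(self):
        new_id = max(self.app_ids) + 1
        self.app_ids.append(new_id)
        return new_id

    def remove_app(self):
        """Remove the most recently added application; app id 0 is never removed."""
        if len(self.app_ids) == 1:
            return None
        return self.app_ids.pop()

    def has_idle_app(self, active_requests):
        """Assumed rule: True when the load would fit in one fewer application."""
        needed = max(1, math.ceil(active_requests / REQUESTS_PER_APP))
        return len(self.app_ids) > needed


pool = AppPool()
pool.add_app()
pool.add_app()                        # pool now holds app ids 0, 1, 2

busy = pool.has_idle_app(7)           # 7 requests need all three applications
idle = pool.has_idle_app(2)           # 2 requests fit in a single application

removed = [pool.remove_app(), pool.remove_app(), pool.remove_app()]
```

The third removal returns None because only app id 0 is left, which matches the rule that the first application stays in place even when no load is present.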
At any time, if the Event Viewer - Text Panel tab is selected, a list of all the exchanged CBE messages is displayed in an expandable tree format, as shown in Figure 7.
Figure 7: CBEs - Event viewer text panel
It is certain that the complexity and high costs associated with computer systems will only increase with time. Autonomic computing seems to be a feasible solution to these problems. It addresses them at their very core, transferring the responsibility of handling configuration, optimization and protection to the computer system, where it belongs, not to mention the long desired capability of having self-aware systems.
The featured demo explores two key aspects of autonomic computing, self-healing and self-optimizing, in a visual and practical way. It demonstrates that all the pieces needed to implement an autonomic computing solution are already available and that this technology can be a reality today.
- Read about IBM Research's activities in Autonomic Computing.
- A tool to help build autonomic systems, Agent Building and Learning Environment, is also available from the IBM alphaWorks site.




