Autonomic computing is a key focus area in the IBM® On Demand business model. In IBM DB2® Universal Database™ (DB2) V8.1, a rich set of tools and monitors was introduced so that the database truly began to monitor its own health, independent of the database administrator (DBA). DB2 V8.2 enriches and extends this autonomic functionality with a redesigned toolset, better monitoring, new interfaces for retrieving health data, and more.
We have written this article with several groups of users in mind: those who are familiar with the DB2 health monitor and Health Center and would benefit from a preview of the new features associated with them in DB2 V8.2; those who have not yet had a chance to try it out and may be looking for compelling reasons to migrate to DB2 V8.1 as the DB2 V7.2 end-of-service date nears; and those who are interested in learning how the autonomic capabilities IBM has been developing have been implemented in a real product. Whatever your interest, this article will serve its purpose if it gives you a better feel for what is in DB2 today, what is coming, and our general philosophy with respect to autonomic computing.
DB2 and the Autonomic Computing Initiative
It isn’t the goal of this article to cover all the details of autonomic computing; rather, we aim to cover the new autonomic features in the DB2 V8.2 health monitor, along with some features that already exist. However, in this section we’ll spend some time on autonomic computing in general to give you a quick background or refresher, because understanding what autonomic computing is about is key to understanding the usefulness of both new and existing features. For example, we are not trying to replace the database administrator (DBA). (Aside from being a true statement, we hope that will put some DBAs at ease so they will start to experiment with these new features.)
The easiest way to conceptualize autonomic computing is to think of the human body’s nervous system and how it monitors and regulates its own environment. We like to exercise a lot, and it just so happens that the human body and exercise illustrate the concepts of autonomic computing very well.
When we go for a run, we usually start with a brisk walk. At this point our bodies sense a more strenuous motion and request that more oxygen be delivered to our muscles; of course, this happens through an increase in blood flow. As we move into a light jog, our bodies start to sweat to counter the rise in body temperature from the jog; this is our bodies sensing an even more strenuous environment and adjusting their parameters (rate of breathing, core body temperature, and so on) in response. Finally, as we run flat out, our lungs take in oxygen as fast as they can to feed this whole process. After a while, one of us quits running; however, this has nothing to do with autonomics (perhaps common sense, but not autonomic computing).
The message in the previous example is that neither of us told our bodies (or each other, for that matter) to start breathing harder or to start sweating; it just happened in response to a dynamic environment. Now that we know our bodies are autonomic (and we’re happy about that), imagine how happy you would be if these characteristics were part of a database. Want to know more about autonomic computing in general? Check out the articles listed in the Resources section below.
For example, if there were so much sorting that the sort heap spilled to disk, performance would suffer. Now imagine if the database were smart enough to sense that a large percentage of sorts were spilling to disk and paged the DBA to report the problem. Better yet, if the spill rate kept climbing, it could launch a script (set up by the DBA) to dynamically change the amount of memory allocated to the sort heap, or temporarily borrow memory from another, underused heap (a new feature in DB2 V8.2). That’s autonomic computing!
With the autonomic computing initiative, IBM’s goal is to provide technology that increases the effectiveness of human intervention in computer systems by reducing the number and complexity of those interventions through the use of automation, intelligent advice, and learning. This results in an overall reduction in the total cost of ownership (TCO) and increased business productivity.
Figure 1 shows the evolution of autonomic computing:
Figure 1. The evolution of autonomic computing
Any computer resource, whether it is an individual software application, a network, or an entire system, moves through a progression from manual control to autonomic management. To learn more about the autonomic maturity levels and the traits that characterize each level, see the Resources section.
Today, DB2 autonomics could be placed in the Predictive (Level 3) stage of the epochs of automation. The DB2 Health Monitor monitors the system and recommends fixes. When an exception is discovered, DB2 won’t automatically do anything unless you script it (which would make DB2 behave like a Level 4 autonomic system).
The goal for DB2 is to let DBAs specify constraints and goals for the system through a set of defined policies that reflect the business needs and the IT infrastructure, ultimately creating a DB2 system that senses and reacts on its own.
One important point to understand is that autonomic computing is not an initiative to replace the DBA. Quite simply, that isn’t going to happen. What we are trying to do with this technology is change the methodology from one in which the DBA continually hunts for information to one in which DB2 monitors itself and lets you know when an issue arises. Quite frankly, DBAs already have too much on their plates, and with the ever-increasing amount of data being stored, analyzed, and measured, there is no way they will be able to keep up with the data persistence requirements of the future. DBAs need to focus on enablement, design, modeling, and so on; with DB2, we’re enabling them to do this.
Back to our running example: we didn’t keep checking our body temperature or our feet for blisters while we were out. If either of these health thresholds had been breached, our bodies would have let us know one way or another.
So, in essence, DB2 reverses the system health diagnosis model. In the old model, a DBA hunts for potential and existing problems by running different monitors at different times and then analyzing huge amounts of data for signs of unhealthiness, with results that vary with the DBA’s skill level and intuition. In the new model, DB2 monitors its own health and notifies selected personnel only when it encounters potential or existing unhealthy conditions.
The DB2 management-by-exception model is implemented by the health monitor in DB2. It relieves DBAs of the burden of setting up their own monitoring environment, deciding what to monitor, and working out what the data means, how the data is related, and so on. The health monitor knows what to monitor, when to monitor, how to evaluate the results, and what actions can be taken to solve a problem. When an exception occurs, the DBA is notified (by email or pager through an SMTP server, by entries in the administration notification log, by a NET SEND message, and so on) and advised on how to proceed. The DBA only needs to install DB2 UDB V8 and can move on to other important activities until DB2 UDB sends a notification. All of this helps to reduce TCO.
Health indicators in DB2
Health indicators are the building blocks of the health monitor. Each indicator maps to a health aspect of the database for a particular object (for example, an instance, database, table space, and so on).
In DB2 V8.2, monitoring has been extended to four object levels (shown in Figure 2) through three types of health indicators.
Figure 2. Object levels monitored by DB2
The three types of health indicators in DB2 V8.2 are:
- State-based indicators: These indicators represent two or more states. One is normal, and all others are considered non-normal. These indicators have always been a part of DB2 V8.
- Collection state-based indicators: These indicators are new to DB2 V8.2 and generate attention alerts, which are informational conditions indicating a non-normal state. They represent the aggregate state of a set of database objects; one state is normal and all others are considered non-normal. Examples include notifications that particular tables need to be reorganized, that statistics are out of date, or that federated nicknames are invalid.
Instead of generating alerts for all affected objects, DB2 will just give you one alert and store a separate list for the objects under that alert (thereby reducing memory consumption).
- Threshold-based indicators: These indicators are based on a formula that evaluates some statistic of a particular object. Warning and alarm threshold values set boundaries on the results of these formulas. A warning alert indicates a non-critical condition that might indicate a non-optimal system, while an alarm alert indicates a critical condition requiring immediate action. You can define these values yourself or use the defaults that come with a DB2 installation.
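To make the collection behavior described above concrete, here is a minimal Python sketch of how a collection state-based indicator could collapse many per-object states into a single attention alert that carries a separate list of affected objects. It is a conceptual model only, not DB2's implementation; the `CollectionAlert` class, the `evaluate_collection` function, the indicator name, and the state strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Conceptual model only: names and state strings are hypothetical,
# not DB2 internals.
NORMAL = "normal"
ATTENTION = "attention"


@dataclass
class CollectionAlert:
    indicator: str                # which collection indicator raised the alert
    state: str                    # aggregate state for the whole collection
    objects: List[str] = field(default_factory=list)  # affected objects, stored once


def evaluate_collection(indicator: str,
                        object_states: Dict[str, str]) -> Optional[CollectionAlert]:
    """Collapse per-object states into at most one alert.

    Every object whose state is not NORMAL goes into the alert's object
    list; if every object is NORMAL, no alert is raised at all.
    """
    flagged = sorted(name for name, state in object_states.items() if state != NORMAL)
    if not flagged:
        return None
    return CollectionAlert(indicator, ATTENTION, flagged)


# One alert covers both tables that need reorganization.
alert = evaluate_collection(
    "tables_need_reorg",          # hypothetical indicator name
    {"STAFF": NORMAL, "ORG": "reorg_required", "SALES": "reorg_required"},
)
print(alert.objects)  # ['ORG', 'SALES']
```

In this model, storing the affected objects once under a single alert, rather than raising one alert per object, is what keeps the number of alert records (and the memory they use) small.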
Most indicators in DB2 V8.2 are threshold-based and are associated with upper and lower bounds that relate to the health of a system as shown in Figure 3:
Figure 3. Threshold-based indicators
For example, log utilization is an upper-bounded threshold indicator: as its value increases, it moves from a normal to a warning to an alarm state. Lower-bounded indicators, such as the cache hit ratio, become unhealthy as their values decrease.
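The difference between upper- and lower-bounded indicators comes down to which direction counts as unhealthy. The following minimal Python sketch assumes that reaching a threshold counts as breaching it and that the alarm threshold is the more extreme of the two; `evaluate_threshold` and the example threshold values are hypothetical, not DB2 defaults.

```python
NORMAL, WARNING, ALARM = "normal", "warning", "alarm"


def evaluate_threshold(value: float, warning: float, alarm: float,
                       upper_bounded: bool = True) -> str:
    """Map one evaluated indicator value to an alert state.

    Upper-bounded indicators (such as log utilization) get worse as the
    value rises; lower-bounded ones (such as a cache hit ratio) get worse
    as it falls. The alarm check runs first, so a value that breaches
    both thresholds yields only ALARM.
    """
    def breaches(limit: float) -> bool:
        # Upper-bounded: at or above the limit is bad; lower-bounded: at or below.
        return value >= limit if upper_bounded else value <= limit

    if breaches(alarm):
        return ALARM
    if breaches(warning):
        return WARNING
    return NORMAL


# Hypothetical thresholds: log utilization (%) and a cache hit ratio (%).
print(evaluate_threshold(90, warning=75, alarm=85))                       # alarm
print(evaluate_threshold(78, warning=80, alarm=70, upper_bounded=False))  # warning
```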
DB2 V8.2 adds some new health indicator categories (marked in the list below). In all, DB2 V8.2 has 18 threshold-based, 6 state-based, and 4 collection state-based indicators, grouped into 11 categories. (All of these health indicators are fully documented in the DB2 System Monitor Guide & Reference.)
- Application concurrency (for example, deadlock rate)
- DBMS (for example, instance operational state)
- Database (for example, database operational state)
- Logging (for example, log utilization)
- Memory (for example, monitor heap utilization)
- Package and catalog caches, and workspaces (for example, package cache hit ratio)
- Sorting (for example, percentage of sorts that overflowed)
- Table space storage (for example, table space utilization)
- Database maintenance (for example, identification that a database backup is required; this category is new to DB2 V8.2)
- High Availability Disaster Recovery (for example, HADR log delay; this category is new to DB2 V8.2)
- Federated (for example, the nickname status; this category is new to DB2 V8.2).
You can configure object-level defaults or the health indicator settings for specific objects through the Health Center, the CLP, or a C-based API. Using any of these methods, you can configure:
- Thresholds (warning and alarm levels)
- Sensitivity (the minimum time a value must remain in an "alertable" state before a warning, alarm, or attention alert is generated; with this setting, temporary spikes in usage do not cause an alert. This parameter is new in DB2 V8.2.)
- Whether the indicator should be evaluated or ignored
- Actions to take upon a warning, alarm, or attention alert. These actions can include tasks and DB2 or operating system-based scripts.
- Whether a configured script or task should actually be run when a warning, alarm, or attention alert is generated.
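The sensitivity setting can be pictured as a timer that starts when a value first enters an alertable state and resets whenever the value drops back out. This minimal Python sketch models that idea; the `SensitivityFilter` class and its use of plain seconds as timestamps are hypothetical illustrations, not how DB2 tracks sensitivity internally.

```python
from typing import Optional


class SensitivityFilter:
    """Suppress an alert until a breach has persisted for `sensitivity` seconds."""

    def __init__(self, sensitivity: float) -> None:
        self.sensitivity = sensitivity
        self.breach_started: Optional[float] = None  # when the current breach began

    def should_alert(self, breached: bool, now: float) -> bool:
        if not breached:
            # The value left the alertable state: treat it as a spike and reset.
            self.breach_started = None
            return False
        if self.breach_started is None:
            self.breach_started = now
        return now - self.breach_started >= self.sensitivity


f = SensitivityFilter(sensitivity=300)   # hypothetical 5-minute sensitivity
print(f.should_alert(True, now=0))       # False: breach just started
print(f.should_alert(False, now=60))     # False: spike ended, timer reset
print(f.should_alert(True, now=120))     # False: new breach begins
print(f.should_alert(True, now=420))     # True: breached for 300 seconds
```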
In the development of DB2 V8.2, the Health Center configuration interface has been tested for usability, and has been revamped to better group related health objects so you can more easily work with their settings (see Figure 4):
Figure 4. Health Center launchpad
DB2 V8.2 DBAs can use the Health Indicator configuration launchpad to drill down and view or change any health indicator settings. For example:
Figure 5. Health indicator settings
You can also configure notifications through the Health Center, the CLP, or a C-based API. DB2 V8.2 adds a notification troubleshooter to help you make sure notifications are set up and working correctly before an alert is generated.
How DB2 collects health-related information
DB2 collects health-related information at a predefined interval that you cannot configure (although you can configure how often the Health Center refreshes the information it displays). Each time the interval elapses, the health monitor evaluates the health of the system using basic snapshot monitors (you don’t have to turn anything on), native operating system APIs for file-system-related indicators, statistics stored in tables, and so on. Once this data is retrieved, each collected metric is compared to the health configuration settings.
When a state-based indicator is analyzed and the result is not normal, an attention alert is generated and any configured action is run. When a threshold-based indicator breaches a threshold, the appropriate warning or alarm alerts and actions are fired.
After each evaluation, the health monitor goes to "sleep" and wakes up when the next interval is reached. If an alert identified in the previous interval still exists at the next interval, DB2 won’t alert you again or rerun any scripted actions for it. If an indicator moves from a warning to an alarm state, DB2 runs the actions in the alarm configuration. Likewise, if an alert drops from an alarm back to a warning, the warning actions are run. Finally, if an interval returns a value that breaches both the warning and alarm thresholds, only the alarm actions are triggered.
Figure 6 shows the new DB2 V8.2 interface for setting up the actions that result from a breached threshold. Note the new option that lets you define a set of actions to be triggered by a breach of either a warning or an alarm threshold.
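The interval rules above amount to firing actions only when an indicator's state changes. Here is a minimal Python sketch under stated assumptions: the state strings, action names, and the `actions_to_run` helper are hypothetical, and because the text does not say what happens when an alert clears, this sketch simply runs nothing on a return to normal. The rule that only alarm actions run when both thresholds are breached is assumed to be handled upstream, by the evaluator reporting a single alarm state.

```python
from typing import Dict, List

NORMAL = "normal"


def actions_to_run(previous_state: str, current_state: str,
                   actions: Dict[str, List[str]]) -> List[str]:
    """Return the actions to fire for one evaluation interval.

    - Same state as the last interval: nothing is re-fired.
    - A change into a non-normal state (worse or better) runs that
      state's configured actions.
    - A return to NORMAL runs nothing in this sketch (an assumption).
    """
    if current_state == previous_state or current_state == NORMAL:
        return []
    return list(actions.get(current_state, []))


# Hypothetical action names keyed by alert state.
configured = {"warning": ["email_dba"], "alarm": ["page_dba", "run_cleanup_script"]}
history = ["normal", "warning", "warning", "alarm", "warning", "normal"]
for prev, cur in zip(history, history[1:]):
    print(f"{prev} -> {cur}: {actions_to_run(prev, cur, configured)}")
```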
Figure 6. Set up actions that result from a breached threshold
A set of health-related data for all indicators is recorded by DB2. The typical health information set includes:
- The recorded value
- An evaluation timestamp
- The alert state
- The formula used to determine the recorded value and ultimately the health of the monitored object
- Additional information about the indicator (what DB2 considers healthy, and so on)
- Historical records for any previously generated alerts.
An example of the indicators for the db2.mon_heap_util indicator is shown in Figure 7:
Figure 7. Indicators for monitor heap utilization
To support the new collection state-based indicators, a new health data set was added to DB2 V8.2 that includes collection-specific data like the object name, its state, a timestamp for the alert, and details.
DB2 V8.2 is also smart about how it groups alerts. Instance, database, and table space alerts are displayed using a "highest severity rollup" algorithm, which shows the most severe existing alert among an object’s own health indicators and those of its child objects.
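The highest severity rollup is essentially a recursive maximum over the object tree. This minimal Python sketch assumes the severity order alarm > warning > attention > normal; the `HealthObject` class, the `rollup` function, and the object names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed ordering, from least to most severe.
SEVERITY = {"normal": 0, "attention": 1, "warning": 2, "alarm": 3}


@dataclass
class HealthObject:
    name: str
    indicator_states: List[str] = field(default_factory=list)  # this object's own indicators
    children: List["HealthObject"] = field(default_factory=list)


def rollup(obj: HealthObject) -> str:
    """Most severe state among obj's own indicators and all of its descendants."""
    states = list(obj.indicator_states) + [rollup(child) for child in obj.children]
    return max(states, key=SEVERITY.__getitem__, default="normal")


tablespace = HealthObject("USERSPACE1", ["normal", "warning"])
database = HealthObject("SAMPLE", ["attention"], [tablespace])
instance = HealthObject("DB2", ["normal"], [database])
print(rollup(instance))  # warning
```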
Accessing health data
There are multiple interfaces that can be used to view and respond to alert or health information. Specifically, you can use the:
- Health Center
- Web Health Center
- Command Line Processor
- SQL-based functions
- C APIs
Let's look at each of these.
The Health Center
The Health Center (shown in Figure 8) is a graphical interface that provides a front end to collected DB2 health information. The Health Center shows the overall state of the database environment and all of its current alerts. The Health Center is a management-by-exception tool, so only health indicators in an alert state are viewable (it doesn’t make a lot of sense to grab the valuable attention of a DBA for something that isn’t a problem).
Figure 8. The Health Center
In DB2 V8.1.3, the Health Center was enhanced so that it no longer attaches to all monitored instances when the tool starts; instead, it attaches to an instance when you drill down into that instance’s health information in the tree view in the left-hand pane. Also note the red, yellow, and green filters that you can use to hide objects that have no issues (these were new in DB2 V8.1.2).
In DB2 V8.2, this center has undergone extensive usability testing and was rewritten with a new underlying interface and smarter technology to guide you through the most appropriate resolution (more on these new features in a bit).
One key enhancement is the introduction of the Recommendation Advisor. Before DB2 V8.2, recommendations for corrective actions had no "weighting" (no course of action was recommended as better than another) and were simply part of the details for a specific alert, as shown in Figure 9:
Figure 9. Recommendations for corrective actions before V8.2
You can see the dilemma a new DBA could face when responding to a health alert in this situation. Should the DBA increase the sort heap or tune the workload? Which would yield better results?
In DB2 V8.2 the new Recommendation Advisor takes a DBA through a series of questions and preferences so that it can recommend one course of action over another. The introduction panel summarizes the details of the generated alert. You can see in Figure 10 below that the details and information for the attention, alarm, or warning are presented to the DBA, with an option to view historical exceptions.
Figure 10. Recommendation Advisor
The requirements panel asks the DBA a series of questions related to the generated alert. (We edited Figure 11 below so you could see all the questions related to a spilled sort alert.)
Figure 11. Example of a spilled sort alert
Taking into account the DBA's answers, the Recommendation Advisor suggests a course of action to help resolve the issue.
As you can see in Figure 12, in DB2 V8.2 DBAs get a ranked list of the courses of action that will resolve the exception, taking into account their preferences regarding characteristics such as availability, past tuning operations (like running the Design Advisor), and so on.
Figure 12. Ranked recommendations
The DBA now selects a course of action from the recommendations panel and works to resolve the problem. For example, Figure 13 shows the Health Center recommending a new SORTHEAP value (note that details about the recommendation are available as well).
Figure 13. Example Health Center recommendation
Selecting the option to tune the workload would result in a different course of action; in this case, it would launch the Design Advisor. (This is actually one of the recommendations that would be revealed if the DBA hit the Show All Recommendations button. It wasn’t shown in our example since we’ve indicated in Question 1 that we had already used this tool.)
Figure 14. Launch the Design Advisor
For those administrators that still believe the command line is as much GUI as they’re willing to work with (you know who you are), all the features of the Recommendation Advisor are exposed via the DB2 Command Line Processor (DB2 CLP) as shown below:
Figure 15. Viewing recommendations from the DB2 CLP
The information retrieved by the CLP is the same as the information exposed in the Health Center. In fact, recommendations are built on the server via a recommendation engine in DB2 V8.2, and that information is stored in an XML document. When accessing this information from the Health Center, DB2 is simply parsing the XML document (the same document used by the CLP) and displaying it graphically.
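Because the Health Center and the CLP both render the same server-side XML document, the client side is essentially "parse once, present differently." The minimal Python sketch below illustrates that split with the standard `xml.etree.ElementTree` module. The XML layout (the `recommendations` and `recommendation` elements and their `rank` and `title` attributes) and the helper functions are invented purely for illustration; DB2's actual recommendation document uses its own schema.

```python
import xml.etree.ElementTree as ET
from typing import List

# Invented layout for illustration only; not DB2's real recommendation schema.
SAMPLE_DOC = """<recommendations indicator="db.spilled_sorts">
  <recommendation rank="2" title="Tune the workload"/>
  <recommendation rank="1" title="Increase SORTHEAP"/>
</recommendations>"""


def ranked_titles(xml_text: str) -> List[str]:
    """Parse the document and return recommendation titles, best rank first."""
    root = ET.fromstring(xml_text)
    recs = sorted(root.findall("recommendation"), key=lambda r: int(r.get("rank")))
    return [r.get("title") for r in recs]


def render_for_cli(xml_text: str) -> str:
    """One possible plain-text rendering of the same parsed data, CLP-style."""
    return "\n".join(f"{i}. {t}" for i, t in enumerate(ranked_titles(xml_text), start=1))


print(render_for_cli(SAMPLE_DOC))
```

A graphical client would call the same parsing step and differ only in presentation, which is why both interfaces show identical recommendations.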
The Web Health Center
There is also a version of the Health Center provided as part of the DB2 Web tools. You can think of it as a "lite" version of the Health Center, intended for users who would normally use the full-featured Health Center but are away from their usual point of access. It is available on any device or medium that supports an Internet browser, including personal digital assistants (PDAs), cellular phones, and so on. The Web Health Center lets you view alerts for an instance and its database objects, view details and recommendations for alerts, and follow links to the Web Command Editor, where a DBA can remotely enter CLP commands.
Figure 16. Web Health Center
The Command Line Processor
Health information can also be retrieved through the DB2 CLP using the GET HEALTH SNAPSHOT command. GET HEALTH SNAPSHOT is similar to the GET SNAPSHOT command and can be issued globally or for specific database partitions. The syntax for this command is shown in Figure 17:
Figure 17. Syntax for GET HEALTH SNAPSHOT
Many experienced DBAs use this method to retrieve health-related information because it is fast and easy. It gives you the same data as the Health Center, but you have to know the syntax to get it.
DB2 V8.2 adds a new keyword to this command. The new WITH FULL COLLECTION option returns information on all the objects that DB2 is monitoring for a collection health indicator, regardless of their health state. (Without this option, DB2 would only return collection objects in attention or automate failed states.) This option is only useful for a database level health snapshot since that’s the only level at which collection health indicators are defined.
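The effect of WITH FULL COLLECTION on which collection objects are reported can be modeled as a simple filter. In this minimal Python sketch, `collection_snapshot`, the table names, and the state strings are hypothetical stand-ins; the real command returns formatted snapshot output, not Python tuples.

```python
from typing import List, Tuple

# States reported by default, per the description of the option above.
DEFAULT_REPORTED = {"attention", "automate failed"}


def collection_snapshot(objects: List[Tuple[str, str]],
                        with_full_collection: bool = False) -> List[Tuple[str, str]]:
    """Return the (object name, state) pairs a database-level snapshot would list."""
    if with_full_collection:
        return list(objects)  # every monitored object, whatever its state
    return [(name, state) for name, state in objects if state in DEFAULT_REPORTED]


tables = [("STAFF", "normal"), ("ORG", "attention"), ("SALES", "automate failed")]
print(collection_snapshot(tables))
print(collection_snapshot(tables, with_full_collection=True))
```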
A sample of the output from this command is shown in Figure 18:
Figure 18. Sample output from GET HEALTH SNAPSHOT using the WITH FULL COLLECTION option
SQL table functions
You can also use SQL table functions to access health-based DB2 information. SQL table functions were first introduced in DB2 V8.1 and were extended for health monitoring in DB2 V8.1.2. Basically these functions allow you to retrieve snapshot information using SQL, and display the results in a table format. This is beneficial in that DBAs don’t have to retrieve this information programmatically using a C-based API, or a JNI wrapper.
The provided table functions fall into three groups, and each group can return health-related information for all four monitored object types:
- HEALTH_<object type>_INFO: header information for health snapshot at the specified object type; includes highest severity alert state
- HEALTH_<object type>_HI: all the health indicators, including formula
- HEALTH_<object type>_HI_HIS: all the history records for all health indicators listed in table function above
where <object type> can be one of DBM (database manager), DB (database), TBS (table spaces), or CONT (containers).
The following example shows how to select health indicator information for db.spilled_sorts for the SAMPLE database as a global snapshot:
SELECT * FROM TABLE ( HEALTH_DB_HI ( 'SAMPLE', -2 ) ) AS HEALTH_DB_HI WHERE HI_ID = 1003
DB2 V8.2 adds the following new functions:
- HEALTH_DB_HIC, HEALTH_DB_HIC_HIS: list collection objects and collection object histories
- HEALTH_HI_REC: get recommendations as a BLOB
HEALTH_HI_REC(IN SCHEMA_VERSION, IN INDICATOR_ID, IN DBNAME, IN OBJTYPE, IN OBJNAME, IN DBPARTITIONNUM, IN CLIENT_LOCALE, OUT REC_DOCUMENT)
The following example shows how to get the recommendations for db.spilled_sorts in English for the SAMPLE database on all partitions:
CALL HEALTH_HI_REC (8010600, 1003, 'SAMPLE', 2, CAST (NULL AS VARCHAR), -2, 'En_US', ?)
Refer to the DB2 V8.2 documentation for more information.
C-based Application Programming Interfaces
Finally, you can use the db2GetSnapshot C API with the SQLM_CLASS_HEALTH or SQLM_CLASS_HEALTH_WITH_DETAIL classes to get this information programmatically.
Just let it run
We couldn’t possibly do justice to the autonomic capabilities of DB2 in this article. We hope that we were able to illustrate some of the powerful features that you can use in DB2 today, and give you just a taste of some of the new features in DB2 V8.2.
One question we get asked a lot, with all this monitoring going on, is "What about my performance?" Our design goal was that monitoring should never affect your workload by more than 1%, and we’ve largely stuck to it. In fact, key autonomic features that would exceed that target are not enabled by default, and when you turn such a feature on, the DBA is notified about the potential performance degradation (one example is the DB2 Activity Monitor in DB2 V8.2, but that is a topic for a different article). If trading up to 1% of performance for health notifications isn’t worthwhile for you, you can turn the monitoring off.
The truth is that no matter how experienced you are as a DBA, and no matter how big or small your shop is, you can benefit from the autonomic capabilities in DB2.
Resources
- The "Roadmap to autonomic computing" (developerWorks, February 2004) helps you begin integrating autonomic computing concepts into your products.
- The tutorial "Take a quick tour of autonomic computing" (developerWorks, April 2004) explains the concepts behind autonomic computing and looks at the tools at your disposal for making it happen.
- The article "Understand autonomic maturity levels" (developerWorks, February 2004) details the levels of maturity of the autonomic computing model.
- "A look at the new functions in DB2 Universal Database V8.2"
- The DB2 System Monitor Guide & Reference is a resource for understanding how to monitor DB2 UDB.
- Check out developerWorks Autonomic computing for more information and resources on autonomic computing.