Problem Determination Log/Trace Scenario Guide

Autonomic Computing Toolkit
Problem Determination Log/Trace Scenario Guide

Document Number SC30-4080-02

Note

Before using this information and the product it supports, read the general information in Appendix D, Notices.

Second Edition (August 2004)

This edition applies to Release 2.0 of the Autonomic Computing Toolkit and to all subsequent releases and modifications until otherwise indicated in new editions.

(C) Copyright International Business Machines Corporation 2004. All rights reserved.
U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.


Contents

Tables

Figures

About this guide

  • Who should read this guide
  • Related publications
  • Accessibility
  • Web sites
  • How to send your comments

Introduction

  • Introduction to autonomic computing
  • Introduction to the Autonomic Computing Toolkit
  • Introduction to the Problem Determination scenario
  • Assumptions
  • Limitations

Scenario design and components

  • Problem Determination scenario design
  • Problem Determination scenario components
    • AME
    • Managed resource components
    • User interface

Installation and setup

  • Prerequisites and dependencies
  • Installing the Problem Determination scenario
  • Uninstalling the Problem Determination scenario

Running the scenario

  • Activating the scenario
    • Microsoft Windows start and stop server application shortcuts
  • Starting the scenario
  • Inducing and fixing an error condition
  • Stopping the scenario
  • Resetting the scenario

Appendix A. Custom code and scripts

Appendix B. CanonicalSituationMonitor resource model

  • Problem Determination scenario resource model
  • Using an existing resource model
  • Creating a new resource model
  • Generating the resource model file
  • MOF file
  • Explanation of JavaScript in the CanonicalSituationMonitor resource model

Appendix C. Getting help, service, and information

Appendix D. Notices

  • Trademarks
  • Important notes

Index


    Tables

    1. Link status based on context

    Figures

    1. Problem Determination scenario design
    2. Problem Determination scenario architecture broken down
    3. Integrated Solutions Console navigation panel
    4. Control page portlet
    5. Start/Stop Scenario interactions
    6. Component interactions
    7. Inducing and correcting an error condition

    About this guide

    This guide provides information on the Problem Determination Log/Trace Scenario for the Autonomic Computing Toolkit.


    Who should read this guide

    This guide is for Autonomic Computing Toolkit users who want to run the Problem Determination scenario and understand how it functions.


    Related publications

    The latest softcopy versions of documentation are available on the autonomic computing developerWorks Web site at:

    www.ibm.com/developerworks/autonomic

    The autonomic computing library consists of the following documents:


    Accessibility

    The HTML versions of this guide and other related publications are accessibility-enabled for use with the IBM Home Page Reader.


    Web sites

    For the latest news and tips on general autonomic computing topics, go to the autonomic computing Web site at:

    www.ibm.com/autonomic

    You can also download the Autonomic Computing Toolkit and its documentation, and access additional information, from the developerWorks Web site at:

    www.ibm.com/developerworks/autonomic/

    Here you will find information about general autonomic computing concepts, an overview of the Autonomic Computing Toolkit, and, most importantly, articles and tutorials that show you how to apply the tools from the Autonomic Computing Toolkit in real-life situations. After you decide which pieces of the Autonomic Computing Toolkit you need, you can download the code directly from this Web site.


    How to send your comments

    Your feedback is important to help us provide the highest quality information. If you have any comments about this guide, you can submit them on the IBM autonomic computing Web site at:

    www.ibm.com/developerworks/autonomic


    Introduction

    This chapter provides an introduction to autonomic computing and the Autonomic Computing Toolkit.


    Introduction to autonomic computing

    To satisfy the needs of users, computing systems have become more complex and more costly to install and maintain. Today, system administrators must deal with hundreds of subsystems, and with thousands of parameters, just to keep systems running, and administrative costs far outstrip hardware and software acquisition costs.

    Autonomic computing is about satisfying those needs with less human administration than is currently required. Autonomic systems adapt to changing conditions and workloads, recover from errors and failures within themselves, and proactively defend against attack. These systems manage themselves and remove complexity from the lives of administrators and users.


    Introduction to the Autonomic Computing Toolkit

    The Autonomic Computing Toolkit is a collection of technologies, tools, scenarios, and documentation that is designed for users wanting to learn, adapt, and develop autonomic behavior in their products and systems. Scenarios are provided to show how the technologies can be used in realistic situations.


    Introduction to the Problem Determination scenario

    The Problem Determination scenario represents a simple self-managing system that uses an intelligent control loop to collect system information, analyze it, plan appropriate responses, and then make necessary adjustments to resolve problems. This scenario exposes the specific technologies that make up a realistic self-healing system, and demonstrates how they work together to achieve an adaptive level (level 4 of the autonomic maturity levels) of self-management. For an explanation of autonomic maturity levels, see the Autonomic Computing Toolkit User's Guide. The scenario serves as an educational vehicle for introducing the self-healing attribute of an autonomic environment, as well as providing a demonstration of the technologies involved that actually make it happen. Examining this scenario will establish the fundamentals to allow you to begin creating your own self-healing systems.

    This scenario demonstrates the interaction between an autonomic manager and a managed resource, as defined by the autonomic computing reference architecture. For more information on the autonomic computing reference architecture, see the Autonomic Computing Toolkit User's Guide. The managed resource produces Common Base Event data from multiple product log files by using the Generic Log Adapter (GLA) included in the Autonomic Computing Toolkit. Common Base Event-formatted data is used to communicate the log file information to the autonomic manager in a consistent and common format. The scenario shows how log file information from the managed resource flows to the autonomic manager through use of a sensor, while the autonomic manager analyzes the data and issues corrective action through use of an effector.

    The autonomic manager implementation used in this scenario is the Autonomic Management Engine (AME). A dedicated AME resource model has been constructed specifically for this scenario. AME, with the use of the resource model, detects an error by analyzing log file data for specific conditions.

    The scenario also contains an administration console, which has been developed using the Integrated Solutions Console included in the Autonomic Computing Toolkit. This serves as an example of how a control console can be built to administer multiple components of an autonomic system.

    This scenario uses standard product log file information to detect, analyze, and correct a common problem involving multiple interacting products. The scenario demonstrates debugging and resolving a situation across a system rather than within an individual product, which is more useful in complex heterogeneous IT environments. You can then apply the same concepts and techniques to begin creating your own self-healing solution using your own products.


    Assumptions

    The scenario requires a certain level of familiarity with autonomic computing concepts and terminology. An introductory level of understanding of the core elements of the IBM autonomic computing reference architecture, including autonomic managers and managed resources, is necessary to understand the concepts presented in this scenario. See the Autonomic Computing Toolkit User's Guide for the required background information.

    Details about how this sample solution was developed and how to customize its components to build similar solutions can be found in the Autonomic Computing Toolkit Developer's Guide.


    Limitations

    This scenario is restricted to running on a single machine.


    Scenario design and components

    The Problem Determination scenario demonstrates an autonomic control loop operating on two sample IBM products, IBM WebSphere(R) Application Server - Express and the Cloudscape(TM) database management system, each of which writes product information to a log file. The database is populated with sample data. A Web application is included that runs on WebSphere Application Server and is set up to continuously read the sample data from the Cloudscape database. As the Web application queries for database information, the products write event information to their logs. These log files are the basis for creating the self-healing system in this scenario. When a product such as IBM WebSphere Application Server - Express sends log file information to an autonomic manager for analysis and corrective action, the product becomes a managed resource. Autonomic managers control managed resources.

    For the autonomic manager to be able to analyze the log file data, the data must be received in the Common Base Event data format. Because the products used in this scenario do not output log file data in that format, the scenario uses the Generic Log Adapter (GLA) technology to provide the conversion. GLA was chosen to illustrate how products can participate in self-healing without having to be modified. The purpose of the GLA in this scenario is to dynamically consume the product log files and convert them in real time into the Common Base Event data format. Once the data is in Common Base Event format, the GLA is configured to pass it to the autonomic manager for analysis using a standard sensor interaction style (for information on interaction styles, see the Autonomic Computing Toolkit Developer's Guide).

    The autonomic manager consumes the Common Base Event data and analyzes the log data using a dedicated resource model, looking for specific events. After an event has been detected, the autonomic manager issues corrective action on the appropriate managed resource.

    Administration and control of the scenario are handled by a custom plug-in component to the Integrated Solutions Console that was created for the Problem Determination scenario.


    Problem Determination scenario design

    The Problem Determination scenario design can be divided into the following four blocks as shown in Figure 1.

    Figure 1. Problem Determination scenario design


    Autonomic manager
    AME is the implementation used in this scenario to represent an autonomic manager. AME communicates with the managed resource using a set of sensor and effector interfaces.

    Managed resource
    WebSphere Application Server - Express is the managed resource in this scenario. A simple Web application, referred to as the Cloudscape WebApp, runs in WebSphere Application Server and interacts with the Cloudscape database. The Generic Log Adapter is used to convert WebSphere Application Server log messages to the Common Base Event data format, thus providing a managed resource touchpoint implementation.

    Cloudscape database
    The managed resource interacts with Cloudscape. The Cloudscape database is used to induce a detectable error in this scenario.
    Note:
    Cloudscape is not a managed resource. For the sake of simplicity, Cloudscape is grouped with WebSphere Application Server in the figures.

    User interface
    The user interface for this scenario is the Integrated Solutions Console. A custom plug-in developed specifically for this scenario allows you to view and control the scenario demonstration.

    These components can be further broken down as shown in Figure 2.

    Figure 2. Problem Determination scenario architecture broken down



    Problem Determination scenario components

    This section describes the various components of this scenario.

    AME

    AME uses a resource model to monitor the status of the managed resource by monitoring the Common Base Event message file. The resource model reads the Common Base Event message file and parses the message to detect an error condition.

    A resource model created specifically for the scenario looks for an error condition on the Cloudscape product and determines the appropriate corrective action.

    A Java program starts AME and performs the following actions:

    1. If running the scenario for the first time:
      1. Installs the resource model (named CanonicalSituationMonitor).
      2. Creates an instance of the resource model.
      3. Starts the resource model instance.
    2. If the scenario has been run at least once, the Java program starts the resource model instance that is already installed.
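    The start-up decision above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the scenario's actual program: ResourceModelHost and its methods are hypothetical names standing in for the AME calls, which this guide does not document, and InMemoryHost exists only so that the logic can be tried without AME.

```java
import java.util.HashSet;
import java.util.Set;

// HYPOTHETICAL interface standing in for the autonomic manager.
// These method names are illustrative only; they are NOT the AME API.
interface ResourceModelHost {
    boolean isInstalled(String modelName);
    void install(String modelName);
    void createInstance(String modelName);
    void startInstance(String modelName);
}

class ScenarioBootstrap {
    static final String MODEL = "CanonicalSituationMonitor";

    /** Applies the first-run / later-run logic; returns true on a first run. */
    static boolean start(ResourceModelHost host) {
        boolean firstRun = !host.isInstalled(MODEL);
        if (firstRun) {
            host.install(MODEL);        // 1.1 install the resource model
            host.createInstance(MODEL); // 1.2 create an instance of it
        }
        host.startInstance(MODEL);      // 1.3 on a first run, 2 on later runs
        return firstRun;
    }
}

// In-memory stand-in, useful only for trying out the logic without AME.
class InMemoryHost implements ResourceModelHost {
    final Set<String> installed = new HashSet<>();
    final Set<String> instances = new HashSet<>();
    final Set<String> running = new HashSet<>();

    public boolean isInstalled(String m) { return installed.contains(m); }
    public void install(String m) { installed.add(m); }
    public void createInstance(String m) { instances.add(m); }
    public void startInstance(String m) {
        if (!instances.contains(m)) {
            throw new IllegalStateException("no instance of " + m);
        }
        running.add(m);
    }
}
```

    The design point is that a later run skips installation and instance creation and only starts the existing instance, so repeated runs do not create duplicate instances.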

    Once the resource model instance is started, the resource model takes care of detecting and fixing a problem. For details on how the CanonicalSituationMonitor resource model functions, refer to Appendix B, CanonicalSituationMonitor resource model. The AME Java program periodically checks for the existence of a file named STOP_AME in the <PDScenario>\bin\rm directory, where <PDScenario> is the PD Scenario installation directory; the program exits if it finds the file.
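    A minimal sketch of the STOP_AME check follows, assuming the program simply polls for the file with java.nio.file.Files.exists. The class name StopFileWatcher and the polling interval are illustrative assumptions, not taken from the scenario code.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.TimeUnit;

// Sketch of the STOP_AME shutdown check; names and interval are assumptions.
class StopFileWatcher {
    static final String STOP_FILE = "STOP_AME";

    /** True if the stop file exists in the given directory (for example <PDScenario>\bin\rm). */
    static boolean stopRequested(Path rmDir) {
        return Files.exists(rmDir.resolve(STOP_FILE));
    }

    /** Blocks, checking every pollMillis milliseconds, until the stop file appears. */
    static void awaitStop(Path rmDir, long pollMillis) throws InterruptedException {
        while (!stopRequested(rmDir)) {
            TimeUnit.MILLISECONDS.sleep(pollMillis);
        }
    }
}
```

    For example, the main thread could call awaitStop with the <PDScenario>\bin\rm directory and shut the manager down when the call returns; a script then only has to create the STOP_AME file to stop AME.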

    Managed resource components

    Cloudscape WebApp is a Web application similar to a traditional application that can be manipulated to demonstrate the closed loop. It can be divided into two parts:

    Cloudscape database

    The Cloudscape database can be started in two modes:

    Embedded mode
    When an application starts up an instance of Cloudscape within its Java Virtual Machine (JVM), the application is said to run in an embedded environment. In this environment, only a single application can access a database at one time and no network access occurs. Loading the embedded driver starts Cloudscape.

    Client-server mode (network mode)
    When multiple applications connect to Cloudscape over the network, they are said to run in a client/server environment. Cloudscape runs embedded in a server framework that allows multiple network connections.

    The Autonomic Computing Toolkit uses Cloudscape in client-server mode. Scenario batch files are used to start and stop the network server. By default, the server listens on port 1528; you can change this port number. A scenario script is used to create and populate the database used by the Web application.
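    In client-server mode, clients reach Cloudscape over TCP, so one quick way to tell whether the network server is up is to attempt a plain socket connection to its port. The sketch below uses only the standard java.net API; NetworkServerProbe is a hypothetical helper, and a successful connection shows only that something is listening on the port, not that the database itself is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Sketch: is anything listening on the Cloudscape network server port?
class NetworkServerProbe {
    static final int DEFAULT_PORT = 1528; // scenario default according to this guide; configurable

    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;  // TCP handshake completed
        } catch (IOException e) {
            return false; // refused, unreachable, or timed out
        }
    }
}
```

    For example, NetworkServerProbe.isListening("localhost", NetworkServerProbe.DEFAULT_PORT, 1000) returns false after the network server has been shut down.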

    Cloudscape Web application

    The Web application is deployed on WebSphere Application Server - Express.

    The servlet connects to the database in its init() method, using a data source that is defined in the application server.

    In the doView()/doPost() method, a large number of SELECT queries are executed on the database.

    All the messages displayed by the servlet are defined in properties files.

    The following scripts are used to install and administer resources on WebSphere Application Server:

    Parsing log messages with the GLA

    The GLA run time parses the WebSphere Application Server log file (activity.log) and converts the log messages into Common Base Event format so that they can be passed to the autonomic manager using a standardized interaction style.

    The GLA run time requires one configuration file, PDScenarioContext.adaptor. This file tells the run time where to find the WebSphere Application Server log file (activity.log) and where to output the Common Base Event messages, and it defines the rules for converting log messages to Common Base Event format.

    The important sections of the adaptor from the Problem Determination scenario perspective are described below.

    The executableClass (com.ibm.etools.logging.adaptor.outputters.ReceiveNotificationOutputter) uses the management API. This class sends the Common Base Event objects to a manager (com.ibm.autonomic.scenario.pd.manager.PDAutonomicManagerTouchpointSupport) using RMI instead of writing the Common Base Event directly to an output file. In the scenario, the manager writes the Common Base Event object to the output file.

    <hga:Component description="CBE File Outputter" 
    executableClass="com.ibm.etools.logging.adaptor.outputters.ReceiveNotificationOutputter" 
    implementationCreationDate="2003-10-07T12:00:00" implementationVersion="1.0.0" 
    implementationVersionDescription="A simple CBE outputter that takes a CBE and writes it to a file" 
    loggingLevel="60" name="CBEFileOutputter" role="outputter" roleCreationDate="2003-10-07T12:00:00" 
    roleVersion="1.0.0" roleVersionDescription="initial release" uniqueID="AdaptorCBEOutputterID3"/>
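    The division of labor described above (the outputter forwards each event over RMI, and the manager writes it to the output file) can be sketched as follows. CbeReceiver and FileWritingManager are hypothetical names; the real interfaces of ReceiveNotificationOutputter and PDAutonomicManagerTouchpointSupport are not documented in this guide, and events are represented here as plain XML strings for simplicity.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.rmi.Remote;
import java.rmi.RemoteException;

// HYPOTHETICAL remote interface the outputter would call.
interface CbeReceiver extends Remote {
    void receive(String cbeXml) throws RemoteException;
}

// Manager side: appends each received event to the Common Base Event output file.
class FileWritingManager implements CbeReceiver {
    private final Path outputFile;

    FileWritingManager(Path outputFile) { this.outputFile = outputFile; }

    @Override
    public synchronized void receive(String cbeXml) {
        try {
            Files.write(outputFile,
                    (cbeXml + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

    To make such a manager reachable from the GLA process, it would be exported and registered with standard RMI calls, for example UnicastRemoteObject.exportObject(manager, 0), followed by binding the returned stub in a registry created with LocateRegistry.createRegistry(port). One consequence of this design is that only the manager needs to know where the output file lives.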
    

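    The following XML statements tell the GLA run time where to find the WebSphere Application Server log file. The converter attribute specifies the command (showlog.bat) that GLA runs to convert the binary activity.log file into a text file (activity.txt), which the sensor then reads from the specified directory.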

    <cc:Sensor description="" maximumBlocking="5" type="SingleFileSensor" uniqueID="sensorID3">
      <sensor:SingleFileSensor
          converter="&quot;C:\ACComponents\ISC\Runtime\AppServer\bin\showlog.bat&quot;
              &quot;C:\ACComponents\ISC\Runtime\AppServer\logs\activity.log&quot;
              &quot;C:\Program Files\PdScenario_Home\activity.txt&quot;"
          directory="C:\Program Files\PdScenario_Home\"
          fileName="activity.txt" />
    </cc:Sensor>
    

    The purpose of the following parsing rule is to get the name of the Web application from the log message.

    <parser:RuleAttribute name="application" index="0">
      <SubstitutionRule match=".*\s+[aA]pplication\s*.*:\s*(.*)"
          positions="$h('PrimaryMessage')" substitute="$1" />
      <SubstitutionRule substitute="UNKNOWN"/>
    </parser:RuleAttribute>
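    To make the rule's behavior concrete, the following Java sketch applies the same regular expression to a message and falls back to UNKNOWN, mirroring the two SubstitutionRule elements. It assumes that GLA matches the pattern against the whole message and substitutes capture group 1 for $1; the class name and the sample messages are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java equivalent of the "application" RuleAttribute (assumed semantics).
class ApplicationNameRule {
    private static final Pattern APP =
            Pattern.compile(".*\\s+[aA]pplication\\s*.*:\\s*(.*)");

    static String apply(String primaryMessage) {
        Matcher m = APP.matcher(primaryMessage);
        // First rule: substitute "$1"; second rule: fixed fallback value.
        return m.matches() ? m.group(1) : "UNKNOWN";
    }
}
```

    Because the .* before the colon is greedy, the captured name is whatever follows the last colon in the message; for a message shaped like "WSVR0221I: Application started: PDWebApp", the result is PDWebApp.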
    

    The following parsing rule is used to detect if the scenario Web application has started.

    <parser:RuleAttribute index="1070522678551" name="situationQualifier">
      <SubstitutionRule match=".*WSVR0200I:.*" positions="$h('PrimaryMessage')"
          substitute="START INITIATED"/>
      <SubstitutionRule match=".*WSVR0221I:.*" positions="$h('PrimaryMessage')"
          substitute="START COMPLETED"/>
      <SubstitutionRule match=".*WSVR0001I:.*" positions="$h('PrimaryMessage')"
          substitute="START COMPLETED"/>
    </parser:RuleAttribute>
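    The situationQualifier rules are evaluated in order, and the first matching pattern determines the value. The sketch below reproduces that ordering with a LinkedHashMap; returning null when no rule matches is an assumption of this sketch, since the adaptor above defines no fallback rule.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Ordered substitution rules: the first pattern that matches wins,
// mirroring the order of the SubstitutionRule elements in the adaptor.
class SituationQualifierRule {
    private static final Map<Pattern, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put(Pattern.compile(".*WSVR0200I:.*"), "START INITIATED");
        RULES.put(Pattern.compile(".*WSVR0221I:.*"), "START COMPLETED");
        RULES.put(Pattern.compile(".*WSVR0001I:.*"), "START COMPLETED");
    }

    /** Returns the qualifier, or null when no rule matches. */
    static String apply(String primaryMessage) {
        for (Map.Entry<Pattern, String> rule : RULES.entrySet()) {
            if (rule.getKey().matcher(primaryMessage).matches()) {
                return rule.getValue();
            }
        }
        return null;
    }
}
```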
    

    The following parsing rules are used to detect the error condition.

    <parser:RuleElement index="0" name="ConnectSituation">
      <parser:RuleAttribute index="0" name="successDisposition">
        <SubstitutionRule match=".*DSRA0080E:.*" positions="$h('PrimaryMessage')" substitute="UNSUCESSFUL" />
        <SubstitutionRule match=".*CONM7007I:.*" positions="$h('PrimaryMessage')" substitute="UNSUCESSFUL" />
        <SubstitutionRule match=".*J2CA0056I:.*" positions="$h('PrimaryMessage')" substitute="UNSUCESSFUL" />
        <SubstitutionRule match=".*SRVE0026E:.*" positions="$h('PrimaryMessage')" substitute="UNSUCESSFUL" />
      </parser:RuleAttribute>
      <parser:RuleAttribute index="0" name="reasoningScope">
        <SubstitutionRule match=".*DSRA0080E:.*" positions="$h('PrimaryMessage')" substitute="INTERNAL" />
        <SubstitutionRule match=".*CONM7007I:.*" positions="$h('PrimaryMessage')" substitute="INTERNAL" />
        <SubstitutionRule match=".*J2CA0056I:.*" positions="$h('PrimaryMessage')" substitute="INTERNAL" />
        <SubstitutionRule match=".*SRVE0026E:.*" positions="$h('PrimaryMessage')" substitute="INTERNAL" />
      </parser:RuleAttribute>
      <parser:RuleAttribute index="0" name="situationDisposition">
        <SubstitutionRule match=".*DSRA0080E:.*" positions="$h('PrimaryMessage')" substitute="CLOSED" />
        <SubstitutionRule match=".*CONM7007I:.*" positions="$h('PrimaryMessage')" substitute="CLOSED" />
        <SubstitutionRule match=".*J2CA0056I:.*" positions="$h('PrimaryMessage')" substitute="CLOSED" />
        <SubstitutionRule match=".*SRVE0026E:.*" positions="$h('PrimaryMessage')" substitute="CLOSED" />
      </parser:RuleAttribute>
    </parser:RuleElement>
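    Because all four message IDs set the same three values, the ConnectSituation rules can be summarized by a single check. In this hypothetical Java sketch, the four patterns are folded into one alternation, which is equivalent to trying each .*ID:.* pattern in turn. The value UNSUCESSFUL is copied verbatim from the adaptor rules, so any consumer of these events has to compare against the same spelling.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Sketch of the ConnectSituation rule: any of the four message IDs
// sets all three attributes to the same fixed values.
class ConnectSituationRule {
    private static final Pattern ERROR_IDS =
            Pattern.compile(".*(DSRA0080E|CONM7007I|J2CA0056I|SRVE0026E):.*");

    /** Returns the attributes, or an empty map if the message is not a connection error. */
    static Map<String, String> apply(String primaryMessage) {
        if (!ERROR_IDS.matcher(primaryMessage).matches()) {
            return Collections.emptyMap();
        }
        Map<String, String> attrs = new HashMap<>();
        attrs.put("successDisposition", "UNSUCESSFUL"); // spelled exactly as in the adaptor rules
        attrs.put("reasoningScope", "INTERNAL");
        attrs.put("situationDisposition", "CLOSED");
        return attrs;
    }
}
```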
     
    

    User interface

    The control of the Problem Determination scenario is provided by Integrated Solutions Console. A custom-developed plug-in provides a user interface to execute and view the Problem Determination scenario.

    Integrated Solutions Console page layout

    Figure 3 illustrates the navigation pane of the autonomic computing scenario component plug-in.

    Figure 3. Integrated Solutions Console navigation panel


    A block diagram and two pages are provided:

    Control page

    There are three portlets on the control page to control the scenario and display the status.

    Control Scenario portlet

    This portlet allows you to control the scenario by performing three types of operations:

    The Control portlet logs the progress of each operation in a file in the appropriate location. Each operation writes to the same file in overwrite mode. This means that at any point, only the messages from the last operation are available.
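    Overwrite-mode logging can be sketched with java.nio.file.Files.write: the first write of each operation truncates the file, and later progress messages are appended. The class name and message text below are illustrative assumptions; the guide does not specify the file's name, location, or format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;

// Sketch of an overwrite-mode progress file shared by all operations.
class OperationLog {
    private final Path file;

    OperationLog(Path file) { this.file = file; }

    /** Starts a new operation, discarding the previous operation's messages. */
    void begin(String operation) throws IOException {
        Files.write(file, Collections.singletonList("Operation: " + operation), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE);
    }

    /** Appends one progress message for the current operation. */
    void step(String message) throws IOException {
        Files.write(file, Collections.singletonList(message), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```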

    Each allowed operation is a clickable link that is enabled or disabled based on the context. Table 1 shows the status of the links in each context.

    Table 1. Link status based on context

    Context   Setup scenario   View WebApp   Induce condition   Close scenario
    Initial   Enabled          Disabled      Disabled           Disabled
    Started   Disabled         Enabled       Enabled            Enabled
    Stopped   Enabled          Disabled      Disabled           Disabled
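    Table 1 can also be read as a small state table. The following Java sketch encodes it with enums; the enum and constant names are hypothetical and simply mirror the table's row and column labels.

```java
import java.util.EnumSet;
import java.util.Set;

// Column labels of Table 1.
enum ScenarioLink { SETUP_SCENARIO, VIEW_WEBAPP, INDUCE_CONDITION, CLOSE_SCENARIO }

// Row labels of Table 1.
enum ScenarioContext {
    INITIAL, STARTED, STOPPED;

    /** Links enabled in this context, per Table 1. */
    Set<ScenarioLink> enabledLinks() {
        if (this == STARTED) {
            return EnumSet.of(ScenarioLink.VIEW_WEBAPP, ScenarioLink.INDUCE_CONDITION,
                    ScenarioLink.CLOSE_SCENARIO);
        }
        // INITIAL and STOPPED: only Setup scenario is available.
        return EnumSet.of(ScenarioLink.SETUP_SCENARIO);
    }
}
```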

    Log view portlet

    Use these two links to view log data as it is written during the course of the scenario.

    Display Status portlet

    This portlet displays the status of the operations executed from the Control portlet by displaying the content of the output file used by the Control portlet.

    The panel has two sections:

    1. Operation activity -- This explains the steps that must be executed to complete the submitted command.
    2. Operation status -- This displays the list of the activities. The activity most recently executed is highlighted.

    Installation and setup

    This chapter contains information on installing the Problem Determination scenario.


    Prerequisites and dependencies

    The Problem Determination scenario is included in the Problem Determination Scenario bundle; it provides the customized components to establish the specific scenario as well as demo support code to control the demonstration. The following components are required before you install the PD scenario bundle and are available in the Autonomic Computing Toolkit:

    See the Autonomic Computing Toolkit User's Guide for information on components included in the bundles and for the prerequisites to run the Problem Determination scenario and its required bundles.

    The Problem Determination scenario is supported on the following platforms:

    Note:
    If an earlier version of this scenario is already installed on your system, you must uninstall it before installing this version of the scenario.

    Installing the Problem Determination scenario

    To install the Problem Determination scenario, perform the following steps:

    1. Verify the system requirements needed to run the scenario (see Prerequisites and dependencies).
    2. Double-click the executable file for your platform (PDScenario_v2-0-0_win32.exe, PDScenario_v2-0-0_linux.bin, PDScenario_v2-0-0_aix.bin, or PDScenario_v2-0-0_solaris.bin) to begin the installation. The Welcome window appears.
      Note:
      The OS/400 installer supports installation in command-line mode only. Invoke the installer using the following command and follow the instructions to install the package.

      java -jar PDScenario_v2-0-0_os400.jar -console

    3. Click Next.

      If the scenario is already installed on the system, an error message will appear. To correct this problem, perform the following steps:

      1. Click Cancel on the error message window.
      2. Click Yes to exit.
      3. Go to Uninstalling the Problem Determination scenario and perform the appropriate procedure to uninstall the scenario.
      4. Go to step 1 to begin the installation process again.
    4. Accept the terms on the License panel and click Next. The list of prerequisites for the scenario is displayed as shown below. You can override the verification for these prerequisites by using the No option on the next panel.
    5. Click Next. The Install Type panel displays. Select one of the following options:
    6. Click Next. The Install Location panel displays the default installation location. Change the location if the default is not acceptable, and then click Next on the installation location panel.
    7. Click Next. A summary panel is displayed.
    8. Click Next on the summary panel. The following actions occur automatically. Output can be viewed in the logs folder.
    9. Type a valid Integrated Solutions Console Administrator login ID and password.
    10. Click Next.
    11. If the Integrated Solutions Console Administrator login values are incorrect, the message "Invalid login values. Please click Next and re-enter the values." is displayed. Click Next and re-enter the values.
    12. The Problem Determination scenario component is automatically deployed. Output of this can be viewed in the logs folder. If the installation was not successful, the log file will contain an error message.
    13. Click Finish when the installation process has completed.

    Uninstalling the Problem Determination scenario

    To uninstall the Problem Determination scenario, perform the following steps:

    1. For Windows:
      1. Go to Start -> Settings -> Control Panel.
      2. Double-click Add/Remove Programs. A list of applications installed on your system displays.
      3. Select the Problem Determination scenario from the list and click Remove.
      4. A panel prompting for the Integrated Solutions Console Administrator's user ID and password displays. These values are used to remove the Problem Determination component from the Integrated Solutions Console.
      5. Complete the uninstallation wizard that launches.
    2. For Linux, AIX, and Solaris:
      1. Double-click the Uninstaller.bin executable file located in the directory where the Problem Determination scenario is installed. The Welcome panel appears.
      2. Click Next. The uninstallation details display.
      3. Click Next.
      4. Click Finish when the uninstallation process has completed.
    3. For OS/400, enter the following command:

      java -jar <PD_HOME>/_uninst/uninstall.jar -console


    Running the scenario

    This chapter describes how to run the Problem Determination scenario.

    The Integrated Solutions Console must be activated before control of the scenario can begin. Navigate to the Problem Determination Scenario Control panel to begin running the scenario.


    Activating the scenario

    For the scenario to function, the following two servers must be started: ISC_Portal and server1.

    Note:
    These servers are started when Integrated Solutions Console is installed. If the machine is rebooted, these servers are stopped and must be started again. The Integrated Solutions Console does not function if the ISC_Portal server is not running. The Problem Determination scenario does not function if server1 is not running.

    To start these servers using the embedded WAS, execute the following commands from the <ISC_INSTALL_LOCATION>/Runtime/AppServer/bin directory on a command line:

    If Integrated Solutions Console is installed on an external WAS, these commands must be executed from the bin directory of that WAS installation.

    In either case, you can use the provided shortcuts described in Microsoft Windows start and stop server application shortcuts to both start and stop the required servers:

    Microsoft Windows start and stop server application shortcuts


    Graphic of shortcuts to Start and Stop the ISC servers for Windows Operating System.
    For Microsoft Windows:

    On AIX, products are installed by the root user. As a result, the shortcuts are only accessible to the root user. If other users need access to the product and the shortcuts, the permissions must be changed manually. After the shortcuts are added, each user must refresh their desktop applications to view the shortcuts in the Application Manager by performing the following steps:

    1. Open Application Manager > Desktop Tools.
    2. Double-click Reload Desktop Applications.
    3. Return to the Application Manager. The IBM Autonomic Computing Toolkit shortcut is now available.

    To activate the scenario, perform the following steps:

    1. Open the following URL in a Web browser to log on to Integrated Solutions Console:

      http://<HOST_NAME>:<PortalServer.Port>/ibm/console/

      The PortalServer.Port value can be found in the <ISC_INSTALL_LOCATION>/Runtime/isc.properties file. Alternatively, use the provided shortcut; see Microsoft Windows start and stop server application shortcuts.

    2. Log in.
    3. Expand Autonomic Computing Scenarios.
    4. Expand Problem Determination Scenario.
    5. Click Scenario Control. The control panel displays.
      Note:
      For OS/400 systems only: if Integrated Solutions Console is installed on WebSphere Application Server Base Edition, do the following before activating the Problem Determination Scenario. Grant RWX authority to the QEJBSVR user on the showlog and wsadmin files by executing the following commands:
      1. CHGAUT OBJ('/QIBM/ProdData/WebAS5/Base/bin/wsadmin') USER(QEJBSVR) DTAAUT(*RWX)
      2. CHGAUT OBJ('/QIBM/ProdData/WebAS5/Base/bin/showlog') USER(QEJBSVR) DTAAUT(*RWX)
    6. Click Begin Scenario to activate the scenario.

      On Microsoft Windows systems only, a command window opens when the Setup Scenario link is clicked. The command window is automatically closed when the scenario is closed by using the Close Scenario link.
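    The console URL in step 1 can be assembled by reading PortalServer.Port from isc.properties with java.util.Properties. This is a sketch: only the key name and the URL shape come from this guide, while the class name ConsoleUrl is illustrative.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

// Sketch: build the Integrated Solutions Console URL from isc.properties.
class ConsoleUrl {
    static String fromProperties(Reader iscProperties, String hostName) throws IOException {
        Properties props = new Properties();
        props.load(iscProperties);
        String port = props.getProperty("PortalServer.Port");
        if (port == null || port.trim().isEmpty()) {
            throw new IllegalStateException("PortalServer.Port not found in isc.properties");
        }
        return "http://" + hostName + ":" + port.trim() + "/ibm/console/";
    }
}
```

    For example, ConsoleUrl.fromProperties(new FileReader(path), "localhost"), where path points to <ISC_INSTALL_LOCATION>/Runtime/isc.properties.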


    Starting the scenario

    Figure 5 shows the interaction between the user interface and the scenario components for the Start Scenario operation.

    Figure 5. Start/Stop Scenario interactions


    Click the Begin Scenario link to begin the Start Scenario operation (see Figure 4). The following actions occur automatically:

    1. Start Cloudscape Database (path 1)
    2. Start Cloudscape WebApp (path 2)
    3. Start GLA (path 3)
    4. Start AME and RM (path 4)
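    The four start actions run in a fixed order. A minimal sketch of such a sequence follows, assuming that each component depends on the one started before it, so that the sequence stops at the first failure. The step bodies are placeholders for the scenario's own scripts, which are not shown in this guide, and StartSequence is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Sketch: run the Start Scenario steps in order, stopping at the first failure.
class StartSequence {
    /** Runs each step in the map's iteration order; returns the names of the steps that succeeded. */
    static List<String> run(Map<String, BooleanSupplier> steps) {
        List<String> completed = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> step : steps.entrySet()) {
            if (!step.getValue().getAsBoolean()) {
                break; // assumed dependency: later components need the earlier ones
            }
            completed.add(step.getKey());
        }
        return completed;
    }
}
```

    Pass a LinkedHashMap so that the insertion order (Cloudscape database, WebApp, GLA, AME) is the execution order.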

    After the scenario has started, the various components will be interacting as shown in Figure 6.

    Figure 6. Component interactions


    The following actions are shown occurring in Figure 6:

    1. Path (1): WebSphere Application Server writes messages in its log file.
    2. Path (2): GLA reads the messages from the WebSphere Application Server log file.
    3. Path (3): GLA converts the log messages to Common Base Event format and writes them to the Common Base Event output file.
    4. Path (4): AME reads Common Base Event messages and analyzes them.
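    The conversion step (path 3) can be illustrated with a small JavaScript sketch. It is not the GLA's implementation, which is configured through its own adapter rules. The plain-text log line layout assumed here ("[timestamp] threadId component severity MSGID: text"), the function name toCommonBaseEvent, and the reduced set of Common Base Event elements are all assumptions made for illustration.

```javascript
// Hypothetical sketch of the conversion step (path 3); not the GLA's code.
// Assumed line layout: "[timestamp] threadId component severity MSGID: text"
const LINE_PATTERN =
  /^\[([^\]]+)\]\s+(\S+)\s+(\S+)\s+([A-Z])\s+([A-Z0-9]{4}\d{4}[A-Z]):\s*(.*)$/;

// Escape characters that are not allowed raw inside XML attribute values.
function escapeXml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Returns a minimal Common Base Event XML string, or null for lines
// that do not match the assumed layout.
function toCommonBaseEvent(line) {
  const m = LINE_PATTERN.exec(line);
  if (m === null) return null;
  const component = m[3];
  const msgId = m[5];
  const text = m[6];
  return (
    '<CommonBaseEvent msg="' + escapeXml(text) + '">' +
    '<sourceComponentId component="' + escapeXml(component) + '"/>' +
    "<msgDataElement><msgId>" + msgId + "</msgId></msgDataElement>" +
    "</CommonBaseEvent>"
  );
}
```

    A real adapter would also map the timestamp to the creationTime attribute and classify the message into a situation category, both of which this sketch omits.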

    The Start and Stop operations of each component are logged in the following files:

    1. Cloudscape network server: cds_status_file.txt
    2. GLA: gla_status_file.txt
    3. AME: ame_status_file.txt
    4. Start PDWebApp: startapp_status_file.txt
    5. Stop PDWebApp: stoptapp_status_file.txt

    Inducing and fixing an error condition

    An error situation must be explicitly induced to demonstrate the self-healing capability.

    To break the communication link between the Web application and the database, a scenario script shuts down the Cloudscape network server.

    Note:
    The error condition takes effect only after the Web application is invoked at least once. To invoke the Web application, click the View Web Application link in the control panel. Although shutting down the Cloudscape server breaks the communication link, the break is not detected until something tries to use the link. When the Web application is invoked, it tries to connect to the database and the broken link is detected. WebSphere Application Server then logs the error message in activity.log, where it is picked up by GLA and, after conversion to Common Base Event format, by AME.

    Figure 7. Inducing and correcting an error condition


    The following actions are shown occurring in Figure 7:

    1. Path (0): Cloudscape database server is shut down. Communication fails between the WebApp and the database.
    2. Path (1): WebSphere Application Server logs error.
    3. Path (2): GLA picks up error message.
    4. Path (3): Error message is converted to Common Base Event format by GLA.
    5. Path (4): AME picks up Common Base Event messages and detects the condition.
    6. Path (5): AME restarts Cloudscape database to fix the error condition.

    The communication link should be reestablished for the Web application to function properly. AME invokes a scenario script to perform the following actions to fix the error condition:

    1. Stops the Web application.
    2. Starts the Cloudscape Network Server.
    3. Restarts the Web application.
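    The recovery sequence can be sketched in JavaScript (Node.js). The function names recoverConnection and shellRunner, and passing the command runner in as a parameter, are assumptions for illustration; in the scenario itself, the resource model performs these steps with Svc.ShellCmd and Svc.DetachedShellCmd (see Appendix B). The network server is started detached, presumably because it keeps running after it starts, so a blocking call would not return.

```javascript
const { spawn, spawnSync } = require("child_process");

// Hypothetical runner: waits for ordinary scripts to finish, but starts
// detached ones in the background without waiting for them to exit.
function shellRunner(cmd, { detached }) {
  if (detached) {
    spawn(cmd, { shell: true, detached: true, stdio: "ignore" }).unref();
    return 0;
  }
  return spawnSync(cmd, { shell: true, stdio: "inherit" }).status;
}

// The three recovery steps, in the order listed above. `scripts` holds the
// script paths (for example, those chosen by the resource model's Init()).
function recoverConnection(scripts, run = shellRunner) {
  run(scripts.stopScript, { detached: false });         // 1. stop the Web application
  run(scripts.StartNetworkServer, { detached: true });  // 2. start the Cloudscape Network Server
  return run(scripts.startScript, { detached: false }); // 3. restart the Web application
}
```

    Injecting the runner keeps the ordering testable without actually starting WebSphere Application Server or Cloudscape.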

    Stopping the scenario

    Click the Close link to begin the Stop Scenario operation. The following actions occur automatically (see Figure 5):

    1. Stop AME and RM (path 4)
    2. Stop GLA (path 3)
    3. Stop Cloudscape WebApp (path 2)
    4. Stop Cloudscape Database (path 1)

    The Start and Stop operations of each component are logged in the same files:

    1. Cloudscape network server: cds_status_file.txt
    2. GLA: gla_status_file.txt
    3. AME: ame_status_file.txt
    4. Start PDWebApp: startapp_status_file.txt
    5. Stop PDWebApp: stoptapp_status_file.txt

    Resetting the scenario

    Use the reset command if the scenario is not functioning correctly. This command must be run from the directory where the Problem Determination scenario is installed. To run the command:

    1. Change to the directory that contains the scenario.
    2. Type reset.bat (on Microsoft Windows) or reset.sh (on other platforms) and press Enter.

      The command file performs the following operations to restore the problem determination scenario to its initial state:

      1. Performs a Stop Scenario operation.
      2. Deletes the pdscenario.state file.
      3. Deletes the work directory of AME.
      4. Deletes the logs folder.
      5. Deletes all .txt and .log files.

      After these operations complete, restart the scenario by clicking the Setup Scenario link.
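    The cleanup performed by the reset command can be sketched in JavaScript (Node.js 14.14 or later, for fs.rmSync). This is not the shipped reset script. The stopScenario callback and the ameWorkDir parameter are hypothetical, because the guide does not say where the AME work directory is, and the sketch assumes that the .txt and .log files sit directly in the scenario directory.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical sketch of the reset steps. stopScenario stands in for the
// Stop Scenario operation; ameWorkDir is an assumed parameter.
function resetScenario(scenarioDir, ameWorkDir, stopScenario) {
  // force: true makes a missing file or directory a no-op.
  const remove = (p) => fs.rmSync(p, { recursive: true, force: true });

  stopScenario();                                      // 1. Stop Scenario operation
  remove(path.join(scenarioDir, "pdscenario.state"));  // 2. state file
  remove(ameWorkDir);                                  // 3. AME work directory
  remove(path.join(scenarioDir, "logs"));              // 4. logs folder

  // 5. .txt and .log files (assumed to be at the top level of scenarioDir).
  for (const name of fs.readdirSync(scenarioDir)) {
    const full = path.join(scenarioDir, name);
    if (/\.(txt|log)$/i.test(name) && fs.statSync(full).isFile()) {
      remove(full);
    }
  }
}
```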


    Appendix A. Custom code and scripts

    This scenario includes various pieces of custom code that are needed to provide the actual demonstration of the scenario. Inducing the error condition, for example, is not something you would include in an actual self-healing solution.

    Code to perform the following tasks is required for the scenario demonstration:


    Appendix B. CanonicalSituationMonitor resource model

    This section describes the resource model used in the Problem Determination scenario, referred to here as the CanonicalSituationMonitor resource model. Before reading the details of the resource model, you should be familiar with the following concepts and recommended readings.


    Problem Determination scenario resource model

    This resource model is built using the Resource Model Builder (RMB), and the same resource model can be used on Microsoft Windows, Linux, and AIX platforms. The source code of the RMB project for this resource model is in the Canonical folder inside the CanonicalSituationMonitor.zip file, which is in the directory where the Problem Determination scenario is installed. The resource model uses preexisting Java libraries that implement a LogFileAdaptor ILT. These libraries were developed by the Tivoli Component Services group and are used by the CanonicalSituationMonitor resource model in the Autonomic Computing Toolkit. These files must be added as dependencies of the CanonicalSituationMonitor resource model. The files are:

    All of the above files are available inside the OSName CanonicalSituationMonitor.zip file, where OSName is the name of the operating system you are interested in (win32-ix86, linux-ix86, or aix4-r1).

    Note:
    You can also write your own ILT for the resource models by referring to the IBM Tivoli Monitoring Version 5.1.1 Creating Resource Models and Providers Redbook.

    Using an existing resource model

    To work with RMB, you need an RMB project file. For the Problem Determination resource model, the project file is Canonical.jrm. To get the project file, extract the Canonical.jrm and .project files from the CanonicalSituationMonitor.zip file in the scenario bundle. These files are extracted into the Canonical folder.

    To use the project, open RMB, select File -> Import -> Existing Project into Workspace, and choose the Canonical folder. The CanonicalSituationMonitor resource model is now available in RMB.


    Creating a new resource model

    To create your own resource model, perform the following steps:

    1. Create a project in RMB and then use the Basic Resource Model wizard to generate a JavaScript-based resource model for the appropriate platform.
    2. In the Basic Resource Model Wizard Datasource Selection Page panel of the wizard, select CIM/WMI Datasource.
    3. In the CIM Datasource Panel, select MOF Compiler. The WMI MOF Compiler Wizard displays.
    4. In the WMI MOF Compiler Wizard, select Compile a .MOF file and click GO!.
    5. Browse for the CanonicalSituationMonitor.mof file extracted earlier from the zip file.
    6. Make sure the Namespace is root/default and click Finish. The following message should be displayed in the compiler output panel of the WMI MOF Compiler Wizard:
      Microsoft (R) 32-bit MOF Compiler Version 1.50.1085.0007
      Copyright (c) Microsoft Corp. 1997-1999. All rights reserved.
       
      Parsing MOF file: E:\w32\CanonicalSituationMonitor.mof
      MOF file has been successfully parsed
      Storing data in the repository...
      Done!
       
      
      If you do not get the above message, there is an error in your MOF file. You will be returned to the CIM Datasource panel.
    7. Change the Namespace to root/default and click the Connect button.
    8. Select CommonSituationsEvent from the list of classes shown and click Next. The CIM Datasource Wizard Panel for Select properties displays.
    9. Select Situation in the list of selected properties and click Next.
    10. Click Finish in the CIM Datasource Wizard Panel for Select properties to log window. The Basic Resource Model Wizard Datasource page displays.
    11. Click Finish. The Finalizing Creation panel displays.
    12. Accept the default values and keep clicking Next on all subsequent panels until the Finish button is enabled.
    13. Click Finish. The resource model code is generated for you.

    Now you need to put some logic into the newly generated resource model.

    Add the following files as dependencies:


    Generating the resource model file

    After the resource model code has been generated, select ITM -> Generate Package -> AME (zip) from the menu bar to generate the resource model package.


    MOF file

    You can use the existing MOF file or write a new one from scratch. The MOF file defines the property that the M12-based Java provider fetches:

    [M12_Instrumentation("Java.com.tivoli.wmftools.ilt.logfileadapter.LogfileAdapter | %EVAL
        'FileName=\"'@USERDATA_VALUE(parmLogName)'\";
        ' \"gnuRegExp='(<CommonBaseEvent> +.*?</CommonBaseEvent>)';
        \" 'cacheCount = 100;
        ' | ENUM"),
     Provider("com.tivoli.dmunix.ep.touchpoint.cimom.ifc.M12JavaProvider")]
    class CommonSituationsEvent
    {
        [M12_Instrumentation("Java.com.tivoli.wmftools.ilt.logfileadapter.LogfileAdapter |
            gnuRegExp='(<CommonBaseEvent +.*?</CommonBaseEvent>)';
            SubExp=1;
            | GET"),
         Provider("com.tivoli.dmunix.ep.touchpoint.cimom.ifc.M12JavaProvider")]
        string situation;

        [key]
        string FileName;

        [key]
        uint32 Offset;
    };
     
    

    This MOF file uses the ILT in wmftools.jar. The ILT class used is Java.com.tivoli.wmftools.ilt.logfileadapter.LogfileAdapter. The parameters passed to this ILT class are the log file name and the regular expression to search for in the log file. In this case, the following expression is used:

    gnuRegExp='(<CommonBaseEvent> +.*?</CommonBaseEvent>)';

    Each call to the ILT to fetch data therefore returns the data between the <CommonBaseEvent> and </CommonBaseEvent> tags in the log file. The ILT stores the offset of the last read position in a file called logfilename.properties, in the directory pointed to by the TMP variable, where logfilename is the name of the log file that you are monitoring.

    Whenever you delete the log file, you must also delete the logfilename.properties file; otherwise, the ILT does not pick up any instances.

    The variable situation contains the data read from the log file.
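    The offset-tracking behavior described above can be approximated in JavaScript (Node.js) to show why a stale .properties file matters. This is a hypothetical sketch, not the LogfileAdapter ILT's code: the function name readNewEvents, the offset=<bytes> format of the state file, and the regular expression (written to accept a <CommonBaseEvent> start tag with or without attributes) are assumptions.

```javascript
const fs = require("fs");

// Matches one complete event; accepts "<CommonBaseEvent>" or "<CommonBaseEvent attr=...>".
const CBE_PATTERN = /<CommonBaseEvent[\s>][\s\S]*?<\/CommonBaseEvent>/g;

// Read the saved byte offset; the "offset=<bytes>" format is an assumption.
function readOffset(statePath) {
  if (!fs.existsSync(statePath)) return 0;
  const m = /offset=(\d+)/.exec(fs.readFileSync(statePath, "utf8"));
  return m ? Number(m[1]) : 0;
}

// Hypothetical reader: returns the complete events appended since the last call.
function readNewEvents(logPath, statePath) {
  const data = fs.readFileSync(logPath);
  let offset = readOffset(statePath);
  // If the log was recreated and is now shorter than the saved offset, this
  // sketch starts over; the guide says the real ILT instead stops picking up
  // instances until the .properties file is deleted.
  if (offset > data.length) offset = 0;
  const text = data.subarray(offset).toString("utf8");
  const events = [];
  let consumed = 0; // characters of `text` covered by complete events
  CBE_PATTERN.lastIndex = 0;
  let m;
  while ((m = CBE_PATTERN.exec(text)) !== null) {
    events.push(m[0]);
    consumed = m.index + m[0].length;
  }
  // Advance only past complete events so a half-written event is re-read later.
  const newOffset = offset + Buffer.byteLength(text.slice(0, consumed), "utf8");
  fs.writeFileSync(statePath, "offset=" + newOffset + "\n");
  return events;
}
```

    Because the offset advances only past complete events, an event that is still being written when the log is read is returned on the next call rather than lost.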


    Explanation of JavaScript in the CanonicalSituationMonitor resource model

    You update the Init() and VisitTree() functions of the resource model that the wizard generated. To add more checks on the Common Base Event data and act on the results, update the VisitTree section of the resource model using the RMB tools.

    In the Init() function of the resource model, you need to do some initialization based on the operating system on which the resource model will run.

    interpType = Svc.GetInterp(); // Use the interp to decide which scripts to run
    if (interpType == "w32-ix86")
    {
        startScript = "..\\WebApp\\startpdapp.bat";
        stopScript = "..\\WebApp\\stoppdapp.bat";
        StartNetworkServer = "..\\WebAppDB\\startnetworkserver.bat";
    }
    else
    {
        startScript = "../WebApp/startpdapp.sh";
        stopScript = "../WebApp/stoppdapp.sh";
        StartNetworkServer = "../WebAppDB/startNetworkServer.sh";
    }
     
    

    Because the Autonomic Computing Toolkit uses MappingString substitution on the MOF file, the parameter used for the log file name must be associated with the resource model class. In this case, the resource model class is called Common Situation Format (CSF).

    Svc.AssociateParameterToClass("parmLogName", "CSF");
    

    You can monitor any situation category. For the resource model to perform problem determination and self-healing, both the start and the connect situation categories must be enabled in the situationCategory parameter. The code below shows how this is done: it loops over the numberOfParameters values of the situationCategory parameter and sets two global flags, bool_start and bool_connect, for the categories you are interested in monitoring. These flags are then used in the VisitTree function.

    for (idx = 0; idx < numberOfParameters; idx++)
    {
        str_parameters = Svc.GetStrParameter("situationCategory", idx);
        if (str_parameters == str_START)
        {
            bool_start = 1;
        }
        else if (str_parameters == str_CONNECT)
        {
            bool_connect = 1;
        }
    }
     
    

    To ensure a fresh start each time the resource model starts, any leftover flag indicating a recovery in progress is deleted.

    if (Svc.ExistsMapElement(tmpMap, str_conn_errorRecovery))
    {
        Svc.RemoveMapElement(tmpMap, str_conn_errorRecovery);
    }
     
    

    The VisitTree function in the resource model contains the monitoring algorithm and is called once each time the cycle time elapses. Implement the monitoring code here. The algorithm processes each new situation written to the log during the cycle.

    // Return the number of instances that have been read from the log file.
    numberOfInstances = Svc.GetNumOfInst("CSF");
     
    

    The next task is to get the situation (CSF) properties required for the analysis. This is done for each new situation with the GetStrProperty() API, as shown below.

    for (idx = 0; idx < numberOfInstances; idx++)
    {
        var str_situation = Svc.GetStrProperty("CSF", idx, "situation");
    }
     
    

    The entire Common Base Event is bundled into one property of the CIM class (situation) because of the nature of the Common Base Event, and to avoid the problems that could arise from the regular expressions required by the ILT interface and the M12_Instrumentation qualifier. After the complete Common Base Event message is available in the str_situation variable, use simple string-parsing techniques to extract the information needed for the analysis.
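    A minimal JavaScript sketch of this parsing follows. It assumes the usual Common Base Event XML layout, with categoryName on the situation element, situationDisposition and situationQualifier on the situationType element, and application on sourceComponentId; msgId is accepted either as a child element of msgDataElement or as an attribute. The helper names getAttr, getElementText, and parseSituation are hypothetical, and regular expressions are used only because each event is a small, well-formed string; an XML parser is more robust.

```javascript
// Hypothetical helper: value of attribute `attr` on the first `element`
// start tag in `xml`, or "" if absent. Assumes double-quoted values.
function getAttr(xml, element, attr) {
  const tag = new RegExp("<" + element + "\\b[^>]*>").exec(xml);
  if (!tag) return "";
  const value = new RegExp("\\b" + attr + "=\"([^\"]*)\"").exec(tag[0]);
  return value ? value[1] : "";
}

// Hypothetical helper: text content of the first <element>...</element>.
function getElementText(xml, element) {
  const m = new RegExp("<" + element + ">([^<]*)</" + element + ">").exec(xml);
  return m ? m[1] : "";
}

// Extract the fields the VisitTree algorithm compares against.
function parseSituation(str_situation) {
  return {
    categoryName: getAttr(str_situation, "situation", "categoryName"),
    situationDisposition: getAttr(str_situation, "situationType", "situationDisposition"),
    situationQualifier: getAttr(str_situation, "situationType", "situationQualifier"),
    msgId: getElementText(str_situation, "msgId") ||
      getAttr(str_situation, "msgDataElement", "msgId"),
    application: getAttr(str_situation, "sourceComponentId", "application"),
  };
}
```

    The \b after the element name keeps <situation from also matching <situationType.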

    Once you have the above details, the following algorithm is applied:

    for (idx = 0; idx < numberOfInstances; idx++)
    {
        var str_situation = Svc.GetStrProperty("CSF", idx, "situation");
        // Parse str_situation into str_situationcategoryName,
        // str_situationDisposition, str_situationQualifier, str_msgId and
        // str_sourceComponentId_application (parsing code not shown).

        // Check for ConnectSituation in the situation category name
        // and CLOSED in situationDisposition.
        if (bool_connect
            && (str_situationcategoryName == str_CONNECT)
            && (str_situationDisposition == "CLOSED"))
        {
            // Now check for a Cloudscape error in the message ID taken from the CBE.
            // The messages of interest are the following:
            str_CONN_ERR_1 = "CONM7007I";
            str_CONN_ERR_2 = "J2CA0056I";
            str_CONN_ERR_3 = "SRVE0026E";
            str_CONN_ERR_4 = "DSRA0080E";

            if ((str_msgId == str_CONN_ERR_1)
                || (str_msgId == str_CONN_ERR_2)
                || (str_msgId == str_CONN_ERR_3)
                || (str_msgId == str_CONN_ERR_4))
            {
                // If the recovery flag is set, do not process this event,
                // because an error recovery is already in progress.
                if (Svc.ExistsMapElement(tmpMap, str_conn_errorRecovery))
                {
                    continue;
                }

                // No recovery is in progress, so set the error recovery flag.
                Svc.SetMapStrElement(tmpMap, str_conn_errorRecovery, str_dlr_ON);

                // Run the scripts in the following order to do the recovery.

                // Stop the WebSphere application.
                shellRc = Svc.ShellCmd(stopScript);

                // Restart Cloudscape; run detached, presumably so that
                // VisitTree does not wait on the long-running network server.
                Svc.DetachedShellCmd(StartNetworkServer);

                // Start the WebSphere application.
                shellRc = Svc.ShellCmd(startScript);
            }
        }

        // For status reporting and removal of the recovery flag, also monitor
        // for the StartSituation category name with a COMPLETED situationQualifier.
        if ((str_situationcategoryName == str_START)
            && (str_situationQualifier.indexOf(str_COMPLETED) >= 0))
        {
            // Check whether it is the monitored application.
            if (str_sourceComponentId_application == str_MonitoredApp)
            {
                if (Svc.ExistsMapElement(tmpMap, str_conn_errorRecovery))
                {
                    // Recovery has completed, so remove the recovery flag.
                    Svc.RemoveMapElement(tmpMap, str_conn_errorRecovery);
                }
            }
        }
    }
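    The decision logic of this algorithm can be expressed as a standalone JavaScript function, which makes the two checks and the role of the recovery flag easy to test outside the resource model. This is a sketch under assumptions: the function name decide, the shape of the event and state objects, and the use of "ConnectSituation" and "StartSituation" as the values of str_CONNECT and str_START (taken from the comments in the algorithm) are illustrative. The Svc map and shell calls are replaced by a state object and returned action names.

```javascript
// Message IDs that indicate a broken Cloudscape connection (from the algorithm above).
const CONN_ERRORS = new Set(["CONM7007I", "J2CA0056I", "SRVE0026E", "DSRA0080E"]);

// event: { categoryName, situationDisposition, situationQualifier, msgId, application }
// state: { connectEnabled, recoveryInProgress, monitoredApp } (hypothetical shape)
// Returns "recover", "skip", "clearFlag", or "ignore".
function decide(event, state) {
  if (state.connectEnabled &&
      event.categoryName === "ConnectSituation" &&
      event.situationDisposition === "CLOSED" &&
      CONN_ERRORS.has(event.msgId)) {
    if (state.recoveryInProgress) return "skip"; // a recovery is already running
    state.recoveryInProgress = true;             // set the recovery flag
    return "recover";                            // stop app, start DB, start app
  }
  if (event.categoryName === "StartSituation" &&
      event.situationQualifier.indexOf("COMPLETED") >= 0 &&
      event.application === state.monitoredApp &&
      state.recoveryInProgress) {
    state.recoveryInProgress = false;            // recovery finished
    return "clearFlag";
  }
  return "ignore";
}
```

    Repeated connection errors logged while the Web application restarts are skipped rather than triggering a second recovery, and the flag is cleared only when the monitored application reports that its start has completed.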
     
    

    Appendix C. Getting help, service, and information

    If you need help, service, technical assistance, or just want more information about IBM products, you will find a wide variety of sources available from IBM to assist you.

    IBM maintains pages on the World Wide Web where you can get information about IBM products and services and find the latest technical information.

    If you need help, use the support forum on the developerWorks Web site at:

    www.ibm.com/developerworks/autonomic/


    Appendix D. Notices

    This publication was developed for products and services offered in the U.S.A.

    IBM(R) may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

    IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to:

    IBM Director of Licensing
    IBM Corporation
    North Castle Drive
    Armonk, NY 10504-1785
    U.S.A.

    INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some jurisdictions do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

    This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

    Any references in this publication to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product, and use of those Web sites is at your own risk.

    IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.


    Trademarks

    The following are trademarks of International Business Machines Corporation in the United States, other countries, or both:


    AIX
    BladeCenter
    Chipkill
    the IBM logo
    Lotus
    Predictive Failure Analysis
    PS/2
    Redbooks
    ServerProven
    TechConnect
    X-Architecture
    xSeries

    Intel and Xeon are trademarks of Intel Corporation in the United States, other countries, or both.

    Microsoft, Windows, and Windows NT are trademarks of Microsoft Corporation in the United States, other countries, or both.

    UNIX is a registered trademark of The Open Group in the United States and other countries.

    Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.



    Other company, product, or service names may be trademarks or service marks of others.


    Important notes

    Processor speeds indicate the internal clock speed of the microprocessor; other factors also affect application performance.

    CD-ROM drive speeds list the variable read rate. Actual speeds vary and are often less than the maximum possible.

    When referring to processor storage, real and virtual storage, or channel volume, KB stands for approximately 1000 bytes, MB stands for approximately 1 000 000 bytes, and GB stands for approximately 1 000 000 000 bytes.

    When referring to hard-disk-drive capacity or communications volume, MB stands for 1 000 000 bytes, and GB stands for 1 000 000 000 bytes. Total user-accessible capacity may vary depending on operating environments.

    Maximum internal hard-disk-drive capacities assume the replacement of any standard hard disk drives and population of all hard-disk-drive bays with the largest currently supported drives available from IBM.

    Maximum memory may require replacement of the standard memory with an optional memory module.

    IBM makes no representation or warranties regarding non-IBM products and services that are in the ServerProven(R) program, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. These products are offered and warranted solely by third parties.

    IBM makes no representations or warranties with respect to non-IBM products. Support (if any) for the non-IBM products is provided by the third party, not IBM.

    Some software may differ from its retail version (if available), and may not include user manuals or all program functionality.

