Automate cloud event management

Use the automation manager function in Flex System Manager to help maintain system integrity

This article shows IBM® PureFlex™ System administrators how to use the automation manager function in Flex System Manager to automate responses to common cloud system events. Learn what the automation manager's capabilities are, how to create event automation plans with a GUI wizard, and how to reuse existing event automation plans and modify them programmatically.


Ansuman Padhy (anspadhy@in.ibm.com), Senior Staff Software Engineer, IBM

Ansuman Padhy has nine years of experience in software engineering. His primary interest is in developing applications and turning ideas into proofs of concept by using technologies such as Java, Eclipse RCP, Android, and various other open source platforms. He is currently working with Python and the OpenStack cloud platform.



Ashish Billore (Ashish.Billore1@in.ibm.com), Cloud Systems Software Developer, IBM

Ashish Billore is an IBM Cloud OS Development Manager at IBM India Labs. He manages the IBM Systems Director VMControl development team and is responsible for developing reusable components for building cloud infrastructures. He designs and develops applications using technologies such as Java, Eclipse, OSGi, and web services. He has contributed several patches to the Eclipse open source platform and presented papers at IBM technology and QSE conferences. He holds a bachelor's degree in Electronics and Communication Engineering and a postgraduate degree in Information Technology.



Piyush Jain (piyushjain@in.ibm.com), Staff Software Engineer, IBM  

Piyush Jain is a staff software engineer at IBM. He is currently working on VMControl under IBM Systems Director. He has six years of experience overall and holds a bachelor's degree in information technology engineering from the Institute of Engineering and Technology (IET) Alwar, Rajasthan, India.


developerWorks Contributing author level

02 October 2013

Also available in Chinese and Japanese

IBM PureFlex System

IBM PureFlex System combines compute, storage, networking, and virtualization capabilities under a unified management console that is expert at anticipating resource needs. IBM Flex System goes beyond blades and provides high-performance, easily integrated components.

In a system-management environment, an event is an occurrence of significance to a task or system. An event can be further classified as either an alert (the occurrence of a problem) or a resolution (the occurrence of a solution to a problem). Depending on the problem's relevance and significance, action might be required to maintain optimal use of data center resources. Either an administrator must be notified to take the necessary steps, or the administrator can choose to predefine the action(s) to perform.

When action is needed in an IBM PureFlex System environment, the automation manager infrastructure in IBM Flex System Manager facilitates resolution by notifying the administrator or by initiating a predefined task. IBM Flex System Manager is the management solution for the preconfigured, pre-integrated IBM PureFlex System infrastructure. By automating repetitive tasks, Flex System Manager reduces the number of manual steps needed for typical management tasks. This article introduces the automation manager function and explains how to use it to automate responses to common data center events. With the help of two example scenarios, learn how to create an event automation plan that uses the automation manager to perform predefined tasks in response to specific alerts.

Event automation plans and how they work

The administrator of an IBM PureFlex System environment can create an event automation plan and apply it to a specific system or a group of systems. The event automation plan can automate tasks and common cloud operations in response to situations that occur in the managed environment.

A Flex System Manager event automation plan has three basic components:

  1. Target list: Specifies one or more systems or groups of systems in the managed environment.
  2. Filter: Specifies the types of events to process.
  3. Action: Specifies the response to an event. You can specify one or more actions to start when the event occurs.
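To make the three components concrete, here is a minimal Python sketch that models a plan as plain data. The class and field names are hypothetical and chosen only for illustration; they are not part of any Flex System Manager API.

```python
from dataclasses import dataclass


@dataclass
class EventAutomationPlan:
    """Hypothetical model of the three parts of an event automation plan."""

    name: str
    targets: list[str]   # 1. systems or groups of systems the plan applies to
    event_filter: str    # 2. the type of events to process
    actions: list[str]   # 3. one or more responses to start when the event occurs

    def __post_init__(self) -> None:
        # Reject plans that could never do anything useful.
        if not self.targets:
            raise ValueError("a plan needs at least one target")
        if not self.actions:
            raise ValueError("a plan needs at least one action")


plan = EventAutomationPlan(
    name="EAP1",
    targets=["z3-9-5-127-172.rch.nimbus.kstart.ibm.com"],
    event_filter="Hardware Predictive Failure Alert events",
    actions=["Enter Maintenance Mode"],
)
print(plan)
```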

Figure 1 shows an event automation plan that an administrator is in the process of creating with the Event Automation Plan wizard that's available from the Flex System Manager console:

Figure 1. Flex System Manager's default support for event filters and event actions
Screen capture from the Event Automation Plan wizard of an event automation plan in progress.

By default, the administrator can choose from all available events, as shown in the wizard's Event Filters screen (background of Figure 1). The Create Action screen (foreground of Figure 1) shows the list of common actions that are available to select for an event. After an event automation plan is in place, the action(s) that are specified for that event execute whenever the event occurs.

Figure 2 shows an overview of event automation plan setup and execution:

Figure 2. Overview of event automation plan setup and execution
Illustration of event automation plan setup and execution

In detail, the steps for the sequence that's shown in Figure 2 are:

  1. The administrator creates an event automation plan and applies it to a specific system or group of systems (such as a server farm).
  2. In the event automation plan, the administrator uses a filter to specify which types of events to process and specifies an event action (such as sending an email notification) for each of those events.
  3. An event occurs in the system that's associated with the event automation plan.
  4. If the event automation plan includes a filter that specifies the type of event that just occurred, the plan checks whether any action is specified for that filter.
  5. If an action (such as sending an email notification) is specified, the action is performed.
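Steps 3 to 5 amount to a match-then-act loop. The following Python sketch models that loop under simplifying assumptions: a filter matches a single event type by exact name (real filters such as "All Events" or "Critical Events" cover many types), and each action is a plain function. The names Event, Plan, handle_event, and email_admin are hypothetical and do not come from Flex System Manager.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Event:
    source: str      # system that generated the event
    event_type: str  # for example, "Hardware Predictive Failure Alert events"


@dataclass
class Plan:
    targets: set[str]                       # systems the plan is applied to
    event_filter: str                       # event type the plan processes
    actions: list[Callable[[Event], str]]   # responses to run on a match


def handle_event(plans: list[Plan], event: Event) -> list[str]:
    """Run every action of every plan whose target and filter match the event."""
    results = []
    for plan in plans:
        if event.source not in plan.targets:
            continue  # plan is not applied to the system that raised the event
        if event.event_type != plan.event_filter:
            continue  # plan's filter does not cover this event type
        for action in plan.actions:
            results.append(action(event))
    return results


def email_admin(event: Event) -> str:
    # Stand-in for a real action such as sending an email notification.
    return f"email: {event.event_type} on {event.source}"


plans = [Plan({"host-a"}, "Hardware Predictive Failure Alert events", [email_admin])]
print(handle_event(plans, Event("host-a", "Hardware Predictive Failure Alert events")))
```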

Now that you have some background about the automation manager and the event automation plan functionality in Flex System Manager, let's see how event automation plans can help you to automate actions in two common data center scenarios.


Scenario 1: Configuring a resilience policy for a VMware cluster

IBM PureFlex System has the intelligence to raise predictive failure alerts (PFAs). A hardware PFA is an event that occurs when a service processor or management module detects that something hardware-related is going to fail; it is a type of prefailure alert. For example, when a dual in-line memory module (DIMM) in a system is about to fail, a hardware PFA occurs.

Figure 3 shows common hardware PFA events that Flex System Manager can respond to:

Figure 3. Common hardware PFA event types
Screen capture from a Flex System Manager event automation plan showing the filter for hardware predictive failure alert events

Enabling a high-availability server farm

In a system-management environment, resilience is the capability of a resource (such as a server, network, or storage device) to recover quickly and continue operating after a failure or disruption occurs. (Another term for this capability is high availability.) If you enable resilience policies in Flex System Manager for hardware PFAs that originate from VMware server clusters (also called VMControl farms), you don't need to know which specific events to monitor. Instead, you create and execute an event automation plan to trigger the VMware maintenance mode functionality. Then, when an alert occurs, the plan automatically relocates all virtual servers on the source host to suitable target hosts in the same cluster. Because hardware issues such as fan or disk failures can be identified early, virtual-server downtime can be avoided.
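To illustrate what "relocating all virtual servers to suitable target hosts" involves, here is a deliberately simple greedy placement in Python. It is not how VMware maintenance mode or DRS chooses targets (DRS uses its own, more complete placement logic); it only shows the idea of emptying the source host while respecting capacity. The function name and the use of memory as the only capacity measure are assumptions for illustration.

```python
def plan_evacuation(vms: dict[str, int], hosts: dict[str, int]) -> dict[str, str]:
    """Assign every VM on the failing host to a target host.

    vms maps VM name -> memory needed (GB); hosts maps target host -> free memory (GB).
    Largest VMs go first, each onto the host that currently has the most free memory.
    """
    if not hosts:
        raise RuntimeError("no target hosts in the cluster")
    free = dict(hosts)  # work on a copy so the caller's data is unchanged
    placement: dict[str, str] = {}
    for vm, size in sorted(vms.items(), key=lambda item: item[1], reverse=True):
        target = max(free, key=free.get)
        if free[target] < size:
            raise RuntimeError(f"no host has room for {vm} ({size} GB)")
        placement[vm] = target
        free[target] -= size
    return placement


print(plan_evacuation({"web": 8, "db": 16, "cache": 4}, {"esxi-2": 20, "esxi-3": 12}))
```

Placing the largest virtual servers first makes it less likely that a big one is left without room after the smaller ones have been spread across the cluster.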

In PureFlex System, a platform manager manages one or more host systems and their associated virtual servers and operating systems. During the discovery process, Flex System Manager identifies systems (VMControl farms) that are running VMware vCenter Server as a platform manager. Figure 4 shows the hardware PFA events management process for a system in which vCenter Server is the platform manager for multiple VMware ESXi hosts:

Figure 4. Hardware PFA events management
Illustration of hardware PFA events management flow

Figure 4 shows ESXi hosts managed by vCenter Server, which is in turn managed by Flex System Manager. In the event of a hardware failure, the integrated management module (IMM) of an ESXi host sends an event to Flex System Manager. In accordance with the preconfigured event automation plan, the corresponding action is executed.

The minimum setup steps to configure a resilience policy to automate live relocation of the virtual servers in the event of a hardware PFA are:

  1. Create a DataCenter on vCenter Server.
  2. In the DataCenter, create a Highly Available (Enabled and Capable) and DRS-enabled cluster. You must disable the setting that allows virtual-machine power-on operations that violate availability constraints.
  3. Add ESXi hosts to the cluster. Ensure that all the ESXi hosts have shared storage for creating the virtual servers.
  4. Run the collect inventory task against vCenter Server from Flex System Manager. Inventory collection discovers both the DataCenter and the cluster as a farm in IBM Systems Director.
  5. Enable VMware vSphere vMotion for all the hosts that were added to the DRS-enabled cluster.
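Before creating the plan, it can help to verify the cluster settings from steps 2, 3, and 5 in one place. The following Python sketch checks a cluster description held in a plain dictionary; the dictionary keys are hypothetical, and you would fill them from whatever inventory source you use. Steps 1 and 4 happen in vCenter Server and Flex System Manager and are not checked here.

```python
def preflight_problems(cluster: dict) -> list[str]:
    """Return the setup requirements that a (hypothetical) cluster description violates."""
    problems = []
    if not cluster.get("ha_enabled"):
        problems.append("cluster is not HA-enabled")
    if not cluster.get("drs_enabled"):
        problems.append("cluster is not DRS-enabled")
    # Step 2: power-on operations that violate availability constraints must be disallowed.
    if cluster.get("allow_violating_power_on", True):
        problems.append("power-on operations that violate availability constraints are allowed")
    hosts = cluster.get("hosts", [])
    if not hosts:
        problems.append("cluster has no ESXi hosts")
    elif not set.intersection(*(set(h["datastores"]) for h in hosts)):
        # Step 3: all hosts need at least one datastore in common.
        problems.append("ESXi hosts share no common datastore")
    for host in hosts:
        if not host.get("vmotion_enabled"):  # Step 5
            problems.append(f"vMotion is not enabled on {host['name']}")
    return problems


cluster = {
    "ha_enabled": True,
    "drs_enabled": True,
    "allow_violating_power_on": False,
    "hosts": [
        {"name": "esxi-1", "datastores": ["shared-ds", "local-1"], "vmotion_enabled": True},
        {"name": "esxi-2", "datastores": ["shared-ds"], "vmotion_enabled": False},
    ],
}
print(preflight_problems(cluster))
```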

You can use the event automation wizard to configure the event automation plan, as shown in Figure 5:

Figure 5. Wizard pages for configuring an event automation plan that implements a resilience policy
Screen capture from the Event Automation Plan wizard showing configuration of a resilience policy

In Figure 5, the administrator has chosen Hardware Predictive Failure Alert as an event filter and selected Enter Maintenance Mode as an event action.

Testing the resilience policy

You can test the resilience policy by simulating an event with the genevent command, a built-in utility that generates simulated events. For example, the command in this console session creates a simulated hardware PFA (13466 is the identifier of the host for the simulated hardware event):

[root@z1-9-5-124-234 ~]# smcli genevent /text:"Advance_Hardware_Predictive_Failure" /compcat:\
"ManagedElement.ManagedSystemElement.PhysicalElement.PhysicalComponent.Chip.PhysicalMemory" \
/comptype:"" /sev:1 /condtype:"OperationalCondition" /condvalue:"PFA" /MEID:13466
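If you want to test the policy against several hosts, a small Python wrapper can build the same genevent call for any host identifier. This is a sketch that reuses only the parameters shown in the console session; it assumes smcli is on the PATH of the Flex System Manager server, and the helper names are hypothetical. By default it only prints the command.

```python
import shlex
import subprocess


def genevent_pfa_command(meid: int,
                         text: str = "Advance_Hardware_Predictive_Failure") -> list[str]:
    """Build the genevent call shown above as an argument list.

    Each element equals what the shell passes after removing the quotes,
    so /comptype:"" becomes the bare argument "/comptype:".
    """
    return [
        "smcli", "genevent",
        f"/text:{text}",
        "/compcat:ManagedElement.ManagedSystemElement.PhysicalElement."
        "PhysicalComponent.Chip.PhysicalMemory",
        "/comptype:",
        "/sev:1",
        "/condtype:OperationalCondition",
        "/condvalue:PFA",
        f"/MEID:{meid}",
    ]


def simulate_pfa(meid: int, dry_run: bool = True) -> None:
    cmd = genevent_pfa_command(meid)
    if dry_run:
        print(shlex.join(cmd))  # show the shell-quoted command only
    else:
        subprocess.run(cmd, check=True)  # run on the Flex System Manager server


simulate_pfa(13466)
```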

The ESXi host then automatically goes into maintenance mode, as shown in Figure 6:

Figure 6. Flex System Manager showing the host in maintenance mode
Flex System Manager showing the host in maintenance mode

Scenario 2: Configuring the email service to send issue details to the administrator as an event action

The automation manager can be configured to send a detailed email notification to the administrator when an event occurs. Getting a notification as soon as a problem happens helps you quickly decide how to address the issue.

In Figure 7, the administrator configures an email event action for a CPU utilization event:

Figure 7. Email Event Action wizard page
Screen capture of the Email Event Action page in the Event Automation Plan wizard

The screen capture in the background of Figure 7 shows the Email Event Action page in the Event Automation Plan wizard. This page takes inputs such as the recipient's email address, the SMTP server address, and the SMTP port, and uses them to configure the event action. The action notifies the administrator by email when CPU utilization reaches the threshold level. The screen capture in the foreground of Figure 7 shows the email notification that the administrator receives after testing the action by simulating the problem condition.
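The following Python sketch illustrates what such an action does with the wizard's inputs: check a CPU reading against a threshold and, if it is reached, compose a message for an SMTP server. It is not how Flex System Manager implements the action internally; the function names, addresses, and message wording are hypothetical. The example only builds the message and does not send it.

```python
import smtplib
from email.message import EmailMessage
from typing import Optional


def build_cpu_alert(host: str, cpu_percent: float, threshold: float,
                    sender: str, recipient: str) -> Optional[EmailMessage]:
    """Return a notification if CPU utilization has reached the threshold."""
    if cpu_percent < threshold:
        return None  # below threshold: nothing to report
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient  # corresponds to the recipient address in the wizard
    msg["Subject"] = f"CPU utilization on {host} reached {cpu_percent:.0f}%"
    msg.set_content(
        f"Host: {host}\n"
        f"CPU utilization: {cpu_percent:.1f}%\n"
        f"Threshold: {threshold:.1f}%\n"
    )
    return msg


def send_alert(msg: EmailMessage, smtp_host: str, smtp_port: int = 25) -> None:
    """Deliver the message; smtp_host and smtp_port mirror the wizard's SMTP inputs."""
    with smtplib.SMTP(smtp_host, smtp_port) as server:
        server.send_message(msg)


alert = build_cpu_alert("esxi-1", 93.0, 90.0, "fsm@example.com", "admin@example.com")
if alert is not None:
    print(alert["Subject"])
```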


Reusing an event automation plan

You can export an existing event automation plan to a format that Flex System Manager supports, such as XML. Later, instead of using the wizard to create a new event automation plan, you can create one by importing the exported file. You can generate the XML file on one Flex System Manager machine and use it to create the same automation plan on another Flex System Manager machine. Optionally, you can edit the XML to modify the plan programmatically.

The console session in Listing 1 shows how to create an event automation plan from an exported XML file in the Flex System Manager environment:

Listing 1. Creating an event automation plan from an exported XML file
[root@z1-9-5-124-234 tmp]# smcli mkevtautopln /tmp/test1.xml
Warning Number: 1
DNZEAP2059W: (Run-time warning) The event filter named 
'Hardware Predictive Failure Alert events' has the same name 
and definition as an existing filter in the system. The filter 
will be not be created again.
Filter name: Hardware Predictive Failure Alert events
Total number of warnings: 1
DNZEAP2064I: (Informational) Created event action 'Test12'.
DNZEAP2066I: (Informational) Created event automation plan 'test2'.
       DNZEAP2067I: (Informational) Targets 'IBM 8853L6U KQBGTT4', applied to event
       automation plan 'test2'.

The XML file in Listing 2 configures an event action that causes the resource to enter maintenance mode:

Listing 2. XML file (Actions.xml) that creates an event action
<?xml version="1.0" encoding="UTF-8"?>
<EventActions
    xmlns="http://www.ibm.com/director/automation/event/action/6.3"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation=
    "http://www.ibm.com/director/automation/event/action/6.3 EventAction6.3.xsd">
    <EventAction id="38">
        <Name>EventAction</Name>
        <Description/>
        <StartTaskOnSystemThatGeneratesEventActionType 
        className="com.tivoli.twg.alertmgr.TaskLauncher">
            <StartTaskOnSystemThatGeneratesEventParameters>
                <Task>Enter Maintenance Mode</Task>
                <TaskID>com.ibm.director.vsm.services.tasks.VSMEnterMaintenanceMode</TaskID>
            </StartTaskOnSystemThatGeneratesEventParameters>
        </StartTaskOnSystemThatGeneratesEventActionType>
        <History>Not saved</History>
    </EventAction>
</EventActions>
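Because Actions.xml is ordinary namespaced XML, you can modify it with standard tools before importing it. The following Python sketch uses xml.etree.ElementTree to rename the action and, optionally, point it at a different task. The function name is hypothetical, and any task name or task ID you substitute must be one that your Flex System Manager actually provides; the example keeps the original maintenance-mode task and changes only the name.

```python
import xml.etree.ElementTree as ET
from typing import Optional

NS = "http://www.ibm.com/director/automation/event/action/6.3"
NSMAP = {"a": NS}
ET.register_namespace("", NS)  # serialize the default namespace without "ns0:" prefixes

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<EventActions
    xmlns="http://www.ibm.com/director/automation/event/action/6.3"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation=
    "http://www.ibm.com/director/automation/event/action/6.3 EventAction6.3.xsd">
    <EventAction id="38">
        <Name>EventAction</Name>
        <Description/>
        <StartTaskOnSystemThatGeneratesEventActionType
        className="com.tivoli.twg.alertmgr.TaskLauncher">
            <StartTaskOnSystemThatGeneratesEventParameters>
                <Task>Enter Maintenance Mode</Task>
                <TaskID>com.ibm.director.vsm.services.tasks.VSMEnterMaintenanceMode</TaskID>
            </StartTaskOnSystemThatGeneratesEventParameters>
        </StartTaskOnSystemThatGeneratesEventActionType>
        <History>Not saved</History>
    </EventAction>
</EventActions>
"""


def retarget_action(xml_text: str, name: str, task: Optional[str] = None,
                    task_id: Optional[str] = None) -> str:
    """Return a copy of an Actions.xml document with a new action name and,
    optionally, a different task to start."""
    root = ET.fromstring(xml_text)
    action = root.find("a:EventAction", NSMAP)
    action.find("a:Name", NSMAP).text = name
    params = action.find(
        "a:StartTaskOnSystemThatGeneratesEventActionType/"
        "a:StartTaskOnSystemThatGeneratesEventParameters", NSMAP)
    if task is not None:
        params.find("a:Task", NSMAP).text = task
    if task_id is not None:
        params.find("a:TaskID", NSMAP).text = task_id
    return ET.tostring(root, encoding="unicode")


print(retarget_action(SAMPLE, "MaintenanceAction"))
```

To write the result back to a file with an XML declaration, parse the returned string and call ET.ElementTree(root).write(path, encoding="UTF-8", xml_declaration=True).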

AutomateHWEvents.sh is a sample shell script that automates event automation plan creation. The script uses the Actions.xml file from Listing 2 to create a new event automation plan. (See Download to get the script and Actions.xml.) You must provide the event automation plan name, description, event filter, and target name as inputs to the script.

The console session in Listing 3 shows the AutomateHWEvents.sh script being run in a Flex System Manager environment:

Listing 3. Running the AutomateHWEvents.sh script
[root@z3-9-5-127-172 piyush]# ./AutomateHWEvents.sh
WELCOME TO CREATE EVENT AUTOMATION WIZARD
=========================================
=============
| PLAN NAME |
=============
Please enter event automation plan name: EAP1

Checking Event Automation Plan with name EAP1, already exist or not...

Event Automation Plan with name EAP1 exists, so removing...
====================
| PLAN DESCRIPTION |
====================
Please enter event description for event automation plan: plan
================
| EVENT ACTION |
================
Checking Event Action with name 'EventAction', already exist or not...

Event Action with name 'EventAction' exists, so removing...

Creating a new Event Action with name 'EventAction'...

DNZEAP2064I: (Informational) Created event action 'EventAction'.
Event Action with name 'EventAction' is created successfully.

=================
| EVENT FILTERS |
=================
All Events, 0x10
Audit Events, 0x24
Common Agent offline, 0x1e
Critical Events, 0x12
Disk use, 0x17
Electronic Service Requests, 0x18
Electronic Service and Support Events, 0x1c
Environmental sensor events, 0x19
Fatal Events, 0x11
Hardware Predictive Failure Alert events, 0x22
Informational Events, 0x15
Management server security events, 0x1d
Memory use, 0x23
Minor Events, 0x13
Physical hardware security events, 0x20
Processor use, 0x1b
Service and Support Manager processing error events, 0x1f
Service and Support Manager serviceable events, 0x21
Storage events, 0x1a
Unknown Events, 0x16
Warning Events, 0x14

Please enter event filter name or oid from the above list of filters:
Hardware Predictive Failure Alert events

==================
| TARGET SYSTEMS |
==================
IBM 8233E8B 065836R 46
z3-9-5-127-172.rch.nimbus.kstart.ibm.com

Please enter the target systems, separated by a comma from the above list of systems:
z3-9-5-127-172.rch.nimbus.kstart.ibm.com

==================================
| EVENT AUTOMATION PLAN CREATION |
==================================
Creating a Event Automation Plan...

Executing the following command to create a Event Automation Plan:
/opt/ibm/director/bin/smcli mkevtautopln -e "Hardware Predictive Failure Alert events" -x 
"EventAction" -i "z3-9-5-127-172.rch.nimbus.kstart.ibm.com" -D "plan" "EAP1"

Event Automation Plan with name EAP1 is created successfully.
Details for new Event Automation Plan are given below:

Name: EAP1
Description: plan
Status: Active
Event Filter: Hardware Predictive Failure Alert events
Time Ranges:
    All the time (24x7)
Actions:
    EventAction
Targets:
    System Name: z3-9-5-127-172.rch.nimbus.kstart.ibm.com
    System Name: IBM 8233E8B 065836R 46
===========================================================

Administrators can program various parts of an event automation plan as needed, for example, to configure it for different systems or event filters in the Flex System Manager environment. You can also provide different event actions by modifying the original (exported) XML file.
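The final step of AutomateHWEvents.sh can be reproduced in Python, which makes it easy to loop over several plans, filters, or target lists. This sketch rebuilds only the smcli mkevtautopln call printed in Listing 3, using the same -e, -x, -i, and -D options; the script itself is in the download and is not reproduced here. Two details are assumptions: that an OID such as 0x22 should be translated to its filter name before the call, and that multiple targets are passed to -i as one comma-separated string. The helper names are hypothetical, and by default the command is only printed.

```python
import shlex
import subprocess

SMCLI = "/opt/ibm/director/bin/smcli"  # path printed in Listing 3

# A few entries from the filter list that Listing 3 prints (name -> OID).
FILTERS = {
    "Hardware Predictive Failure Alert events": "0x22",
    "Memory use": "0x23",
    "Processor use": "0x1b",
    "Storage events": "0x1a",
}


def resolve_filter(name_or_oid: str) -> str:
    """Accept a filter name or OID, as the script's prompt does; return the name."""
    if name_or_oid in FILTERS:
        return name_or_oid
    for name, oid in FILTERS.items():
        if oid.lower() == name_or_oid.lower():
            return name
    raise ValueError(f"unknown event filter: {name_or_oid}")


def mkevtautopln_command(plan: str, description: str, event_filter: str,
                         action: str, targets: list[str]) -> list[str]:
    """Rebuild the mkevtautopln call that Listing 3 shows, as an argument list."""
    if not targets:
        raise ValueError("at least one target system is required")
    return [
        SMCLI, "mkevtautopln",
        "-e", resolve_filter(event_filter),
        "-x", action,
        "-i", ",".join(targets),  # assumption: several targets are comma-separated
        "-D", description,
        plan,
    ]


def create_plan(cmd: list[str], dry_run: bool = True) -> None:
    if dry_run:
        print(shlex.join(cmd))  # show the command without running it
    else:
        subprocess.run(cmd, check=True)  # run on the Flex System Manager server


create_plan(mkevtautopln_command(
    "EAP1", "plan", "0x22", "EventAction",
    ["z3-9-5-127-172.rch.nimbus.kstart.ibm.com"]))
```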


Conclusion

The two scenarios presented in this article are merely illustrative. The Flex System Manager automation manager provides a powerful, generic infrastructure for automating how you monitor and respond to events and alerts in an IBM PureFlex System environment.


Download

Sample script and XML file: scripts.zip (1.47KB)

