In a system-management environment, an event is an occurrence of significance to a task or system. An event can be further classified as either an alert (the occurrence of a problem) or a resolution (the occurrence of a solution to a problem). Depending on the problem's relevance and significance, action might be required to maintain optimal use of data center resources. Either an administrator must be notified to take the necessary steps, or the administrator can choose to predefine the action(s) to perform.
When action is needed in an IBM PureFlex System environment, the automation manager infrastructure in IBM Flex System Manager facilitates resolution by notifying the administrator or by initiating a predefined task. IBM Flex System Manager is the management solution for the preconfigured, pre-integrated IBM PureFlex System infrastructure. By automating repetitive tasks, Flex System Manager reduces the number of manual steps needed for typical management tasks. This article introduces the automation manager function and explains how to use it to automate responses to common data center events. With the help of two example scenarios, learn how to create an event automation plan that uses the automation manager to perform predefined tasks in response to specific alerts.
Event automation plans and how they work
The administrator of an IBM PureFlex System environment can create an event automation plan and apply it to a specific system or a group of systems. The event automation plan can automate tasks and common cloud operations in response to situations that occur in the managed environment.
A Flex System Manager event automation plan has three basic components:
- Target list: Specifies one or more systems or groups of systems in the managed environment.
- Filter: Specifies the types of events to process.
- Action: Specifies the response to an event. You can specify one or more actions to start when the event occurs.
Figure 1 shows an event automation plan that an administrator is in the process of creating with the Event Automation Plan wizard that's available from the Flex System Manager console:
Figure 1. Flex System Manager's default support for event filters and event actions
By default, the administrator can choose from all available events, as shown in the wizard's Event Filters screen (background of Figure 1). The Create Action screen (foreground of Figure 1) shows the list of common actions that are available to select for an event. After an event automation plan is in place, the action(s) that are specified for that event execute whenever the event occurs.
Figure 2 shows an overview of event automation plan setup and execution:
Figure 2. Overview of event automation plan setup and execution
In detail, the steps for the sequence that's shown in Figure 2 are:
- The administrator creates an event automation plan and applies it to a specific system or group of systems (such as a server farm).
- In the event automation plan, the administrator uses a filter to specify which types of events to process and specifies an event action (such as sending an email notification) for each of those events.
- An event occurs in the system that's associated with the event automation plan.
- If the event automation plan includes a filter that matches the type of event that just occurred, the plan checks whether an action is specified for that filter.
- If an action (such as sending an email notification) is specified, the action is performed.
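The sequence above can be modeled in a few lines of shell. This is only a conceptual sketch of the target, filter, and action checks; the variable and function names (PLAN_TARGETS, PLAN_FILTER, PLAN_ACTION, notify_admin, handle_event) are hypothetical and are not part of Flex System Manager.

```shell
#!/bin/sh
# Conceptual model only: these names do not exist in Flex System Manager.

# A hypothetical plan: a target list, one filter, and one action.
PLAN_TARGETS="host-a host-b"
PLAN_FILTER="Hardware Predictive Failure Alert"
PLAN_ACTION="notify_admin"

# Hypothetical action; a real plan might send an email here.
notify_admin() {
  echo "notify: $2 on $1"
}

# Process one event: check the target list, then the filter, then run the action.
handle_event() {
  system="$1"
  event_type="$2"
  for target in $PLAN_TARGETS; do
    if [ "$target" = "$system" ]; then
      # The system is covered by the plan; run the action only if the filter matches.
      if [ "$event_type" = "$PLAN_FILTER" ]; then
        "$PLAN_ACTION" "$system" "$event_type"
      fi
      return 0
    fi
  done
  # The system is not in the target list, so the plan ignores the event.
  return 0
}

handle_event host-a "Hardware Predictive Failure Alert"   # filter matches: action runs
handle_event host-a "Disk use"                            # filter does not match
handle_event host-c "Hardware Predictive Failure Alert"   # system is not a target
```

Only the first call produces a notification. The other two events are dropped, one because the filter does not match and one because the system is outside the plan's target list.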
Now that you have some background about the automation manager and the event automation plan functionality in Flex System Manager, let's see how event automation plans can help you to automate actions in two common data center scenarios.
Scenario 1: Configuring a resilience policy for a VMware cluster
IBM PureFlex System has the intelligence to raise predictive failure alerts (PFAs). A hardware PFA is an event that occurs when a service processor or management module detects that something hardware-related is going to fail; it is a type of prefailure alert. For example, when a dual in-line memory module (DIMM) in a system is about to fail, a hardware PFA occurs.
Figure 3 shows common hardware PFA events that Flex System Manager can respond to:
Figure 3. Common hardware PFA event types
Enabling a high-availability server farm
In a system-management environment, resilience is the capability of a resource (such as a server, network, or storage device) to recover quickly and continue operating after a failure or disruption occurs. (Another term for this capability is high availability.) If you enable resilience policies in Flex System Manager for hardware PFAs that originate from VMware server clusters (also called VMControl farms), you don't need to know which specific events to monitor. Instead, you create and execute an event automation plan to trigger the VMware maintenance mode functionality. Then, when an alert occurs, the plan automatically relocates all virtual servers on the source host to suitable target hosts in the same cluster. Because hardware issues such as fan or disk failures can be identified early, virtual-server downtime can be avoided.
In PureFlex System, a platform manager manages one or more host systems and their associated virtual servers and operating systems. During the discovery process, Flex System Manager identifies systems (VMControl farms) that are running VMware vCenter Server as a platform manager. Figure 4 shows the hardware PFA events management process for a system in which vCenter Server is the platform manager for multiple VMware ESXi hosts:
Figure 4. Hardware PFA events management
Figure 4 shows ESXi hosts managed by vCenter Server, which is in turn managed by Flex System Manager. In the event of a hardware failure, the integrated management module (IMM) of an ESXi host sends an event to Flex System Manager. In accordance with the preconfigured event automation plan, the corresponding action is executed.
The minimum setup steps to configure a resilience policy to automate live relocation of the virtual servers in the event of a hardware PFA are:
- Create a DataCenter on vCenter Server.
- In the DataCenter, create a Highly Available (Enabled and Capable) and DRS-enabled cluster. You must disable the setting that allows virtual-machine power-on operations that violate availability constraints.
- Add ESXi hosts to the cluster. Ensure that all the ESXi hosts have shared storage for creating the virtual servers.
- Collect inventory on vCenter Server from Flex System Manager. The inventory process discovers both the DataCenter and the cluster as a farm in IBM Systems Director.
- Enable VMware vSphere vMotion for all the hosts that were added to the DRS-enabled cluster.
You can use the event automation wizard to configure the event automation plan, as shown in Figure 5:
Figure 5. Wizard pages for configuring an event automation plan that implements a resilience policy
In Figure 5, the administrator has chosen Hardware Predictive Failure Alert as an event filter and selected Enter Maintenance Mode as an event action.
Testing the resilience policy
You can test the resilience policy by simulating an event with the genevent command. genevent is the built-in utility that generates simulated events. For example, the command shown in this console session creates a simulated hardware PFA (where 13466 is the identifier of the host for the simulated hardware event):
[root@z1-9-5-124-234 ~]# smcli genevent /text:"Advance_Hardware_Predictive_Failure" /compcat: "ManagedElement.ManagedSystemElement.PhysicalElement.PhysicalComponent.Chip.PhysicalMemory" /comptype:"" /sev:1 /condtype:"OperationalCondition" /condvalue:"PFA" /MEID:13466
The ESXi host then automatically goes into maintenance mode, as shown in Figure 6:
Figure 6. Flex System Manager showing the host in maintenance mode
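If you test several hosts, it can help to wrap the genevent invocation in a small function that takes the host identifier as its only input. The sketch below prints the command instead of running it, so you can review it first. The flag values are copied from the console session above. The function name pfa_genevent_command is made up for this example, and the sketch assumes that the space after /compcat: in the console session is a line-wrapping artifact.

```shell
#!/bin/sh
# Print (do not run) a genevent command that simulates a hardware PFA for one host.
# Flag values come from the console session above; the function name is hypothetical.
pfa_genevent_command() {
  meid="$1"
  # Reject anything that is not a plain decimal identifier.
  case "$meid" in
    ''|*[!0-9]*)
      echo "error: MEID must be numeric" >&2
      return 1
      ;;
  esac
  compcat="ManagedElement.ManagedSystemElement.PhysicalElement.PhysicalComponent.Chip.PhysicalMemory"
  # Assumption: no space follows /compcat: in the real command.
  printf 'smcli genevent /text:"%s" /compcat:"%s" /comptype:"" /sev:1 /condtype:"%s" /condvalue:"%s" /MEID:%s\n' \
    "Advance_Hardware_Predictive_Failure" "$compcat" "OperationalCondition" "PFA" "$meid"
}

pfa_genevent_command 13466
```

Copy the printed line into a Flex System Manager shell only after you confirm that the identifier belongs to the host you intend to test.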
Scenario 2: Configuring the email service to send issue details to the administrator as an event action
The automation manager can be configured to send a detailed email notification to the administrator when an event occurs. Getting a notification as soon as a problem happens helps you quickly decide how to address the issue.
In Figure 7, the administrator configures an email event action for a CPU utilization event:
Figure 7. Email Event Action wizard page
The screen capture in the background of Figure 7 shows the Email Event Action page in the Event Automation Plan wizard. This page takes inputs such as email ID, SMTP address, and email port and configures the event action. The action notifies the administrator by email if CPU utilization reaches the threshold level. The screen capture in the foreground of Figure 7 shows the email notification that the administrator receives after testing the action by simulating the problem condition.
Reusing an event automation plan
You can export an existing event automation plan to a format that Flex System Manager supports, such as XML. Later — instead of using the wizard to create a new event automation plan — you can create one by importing the exported file. You can generate the XML file on one Flex System Manager machine and use the same XML to create an automation plan on another Flex System Manager machine. Optionally, you can edit the XML to modify the plan-creation steps programmatically.
The console session in Listing 1 shows an event automation plan being created from a previously exported XML file in the Flex System Manager environment:
Listing 1. Creating an event automation plan from an exported XML file
[root@z1-9-5-124-234 tmp]# smcli mkevtautopln /tmp/test1.xml
Warning Number: 1
DNZEAP2059W: (Run-time warning) The event filter named 'Hardware Predictive Failure Alert events' has the same name and definition as an existing filter in the system. The filter will be not be created again.
Filter name: Hardware Predictive Failure Alert events
Total number of warnings: 1
DNZEAP2064I: (Informational) Created event action 'Test12'.
DNZEAP2066I: (Informational) Created event automation plan 'test2'.
DNZEAP2067I: (Informational) Targets 'IBM 8853L6U KQBGTT4', applied to event automation plan 'test2'.
The XML file in Listing 2 configures an event action that causes the resource to enter maintenance mode:
Listing 2. XML file (Actions.xml) that creates an event action
<?xml version="1.0" encoding="UTF-8"?>
<EventActions xmlns="http://www.ibm.com/director/automation/event/action/6.3"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.ibm.com/director/automation/event/action/6.3 EventAction6.3.xsd">
  <EventAction id="38">
    <Name>EventAction</Name>
    <Description/>
    <StartTaskOnSystemThatGeneratesEventActionType className="com.tivoli.twg.alertmgr.TaskLauncher">
      <StartTaskOnSystemThatGeneratesEventParameters>
        <Task>Enter Maintenance Mode</Task>
        <TaskID>com.ibm.director.vsm.services.tasks.VSMEnterMaintenanceMode</TaskID>
      </StartTaskOnSystemThatGeneratesEventParameters>
    </StartTaskOnSystemThatGeneratesEventActionType>
    <History>Not saved</History>
  </EventAction>
</EventActions>
AutomateHWEvents.sh is a sample shell script that automates event automation plan creation. The script uses the Actions.xml file from Listing 2 to create a new event automation plan. (See Download to get the script and Actions.xml.) You must provide the event automation plan name, description, event filter, and target name as inputs to the script.
The console session in Listing 3 shows the AutomateHWEvents.sh script being run in a Flex System Manager environment:
Listing 3. Running the AutomateHWEvents.sh script
[root@z3-9-5-127-172 piyush]# ./AutomateHWEvents.sh
WELCOME TO CREATE EVENT AUTOMATION WIZARD
=========================================
=============
| PLAN NAME |
=============
Please enter event automation plan name: EAP1
Checking Event Automation Plan with name EAP1, already exist or not...
Event Automation Plan with name EAP1 exists, so removing...
====================
| PLAN DESCRIPTION |
====================
Please enter event description for event automation plan: plan
================
| EVENT ACTION |
================
Checking Event Action with name 'EventAction', already exist or not...
Event Action with name 'EventAction' exists, so removing...
Creating a new Event Action with name 'EventAction'...
DNZEAP2064I: (Informational) Created event action 'EventAction'.
Event Action with name 'EventAction' is created successfully.
=================
| EVENT FILTERS |
=================
All Events, 0x10
Audit Events, 0x24
Common Agent offline, 0x1e
Critical Events, 0x12
Disk use, 0x17
Electronic Service Requests, 0x18
Electronic Service and Support Events, 0x1c
Environmental sensor events, 0x19
Fatal Events, 0x11
Hardware Predictive Failure Alert events, 0x22
Informational Events, 0x15
Management server security events, 0x1d
Memory use, 0x23
Minor Events, 0x13
Physical hardware security events, 0x20
Processor use, 0x1b
Service and Support Manager processing error events, 0x1f
Service and Support Manager serviceable events, 0x21
Storage events, 0x1a
Unknown Events, 0x16
Warning Events, 0x14
Please enter event filter name or oid from the above list of filters: Hardware Predictive Failure Alert events
==================
| TARGET SYSTEMS |
==================
IBM 8233E8B 065836R 46
z3-9-5-127-172.rch.nimbus.kstart.ibm.com
Please enter the target systems, separated by a comma from the above list of systems: z3-9-5-127-172.rch.nimbus.kstart.ibm.com
==================================
| EVENT AUTOMATION PLAN CREATION |
==================================
Creating a Event Automation Plan...
Executing the following command to create a Event Automation Plan:
/opt/ibm/director/bin/smcli mkevtautopln -e "Hardware Predictive Failure Alert events" -x "EventAction" -i "z3-9-5-127-172.rch.nimbus.kstart.ibm.com" -D "plan" "EAP1"
Event Automation Plan with name EAP1 is created successfully.
Details for new Event Automation Plan are given below:
Name: EAP1
Description: plan
Status: Active
Event Filter: Hardware Predictive Failure Alert events
Time Ranges: All the time (24x7)
Actions: EventAction
Targets:
System Name: z3-9-5-127-172.rch.nimbus.kstart.ibm.com
System Name: IBM 8233E8B 065836R 46
===========================================================
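The mkevtautopln command that the script reports executing is assembled from the inputs it collects. The following sketch shows one way such a command line could be built and reviewed before it is run. It is not the code of AutomateHWEvents.sh. The function name build_plan_command is hypothetical, and the option letters (-e, -x, -i, -D) are taken only from the script output above.

```shell
#!/bin/sh
# Print the mkevtautopln command for the given inputs without running it.
# Hypothetical helper; option letters are copied from the script output above.
build_plan_command() {
  name="$1"
  description="$2"
  filter="$3"
  action="$4"
  targets="$5"
  # The description may be empty; every other input is required.
  for value in "$name" "$filter" "$action" "$targets"; do
    if [ -z "$value" ]; then
      echo "error: name, filter, action, and targets are required" >&2
      return 1
    fi
  done
  printf '/opt/ibm/director/bin/smcli mkevtautopln -e "%s" -x "%s" -i "%s" -D "%s" "%s"\n' \
    "$filter" "$action" "$targets" "$description" "$name"
}

build_plan_command "EAP1" "plan" "Hardware Predictive Failure Alert events" \
  "EventAction" "z3-9-5-127-172.rch.nimbus.kstart.ibm.com"
```

With the inputs from the console session, the printed line matches the command shown in the script output. Printing first, rather than executing directly, gives you a chance to check the filter and target names against the lists that Flex System Manager reports.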
Administrators can program various parts of an event automation plan as needed — for example, to configure it for different systems or event filters in the Flex System Manager environment. You can also provide different event actions by modifying the original (exported) XML file.
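As a rough illustration of editing the XML programmatically, the sketch below renames the event action in a copy of an action file by using sed. The file content is a trimmed fragment modeled on Listing 2, not a complete or schema-valid file. The helper name rename_action and the new action name are hypothetical. Because sed does plain text substitution and does not parse XML, an XML-aware tool is likely a safer choice for real files.

```shell
#!/bin/sh
# Work in a scratch directory so that a real exported file is never touched.
workdir=$(mktemp -d)

# Trimmed fragment modeled on Listing 2 (illustrative, not schema-valid).
cat > "$workdir/Actions.xml" <<'EOF'
<EventAction id="38">
  <Name>EventAction</Name>
  <Task>Enter Maintenance Mode</Task>
</EventAction>
EOF

# Hypothetical helper: copy $1 to $2 with the <Name> element set to $3.
# The new name must not contain '|', '&', or '\', which are special to sed here.
rename_action() {
  sed "s|<Name>[^<]*</Name>|<Name>$3</Name>|" "$1" > "$2"
}

rename_action "$workdir/Actions.xml" "$workdir/Actions-renamed.xml" "EnterMaintenanceOnPFA"

# Show the changed element in the new file.
grep '<Name>' "$workdir/Actions-renamed.xml"
```

Writing the result to a new file keeps the original export available for comparison or rollback.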
The two scenarios presented in this article are merely illustrative. The Flex System Manager automation manager provides a powerful generic infrastructure for automating ways to monitor and respond to events and alerts in IBM PureFlex System.
Download
Sample script and XML file: scripts.zip (1.47KB)