IBM Support

Managing Unplanned Downtime with IBM PowerHA SystemMirror SE 7.2

Technical Blog Post


Abstract

Managing Unplanned Downtime with IBM PowerHA SystemMirror SE 7.2

Body


IBM® PowerHA® technology provides high availability, business continuity, and disaster recovery. It delivers near-continuous application availability through advanced failure detection, failover, and recovery capabilities, and offers robust performance along with a simplified user interface for configuring, monitoring, and managing multi-node clusters.

 

Table of Contents

  1. Start

  2. Stop

  3. Monitor

  4. Install & Configure

  5. Prerequisites

  6. References

Start

To start the cluster and bring online all resources that are managed by the cluster:

  # clmgr start cluster

Stop

To stop the cluster on all nodes:

  # clmgr offline cluster

To stop the cluster but let the services continue running:

  # clmgr offline cluster MANAGE=unmanage

Monitor

To check the status of a cluster:

  # clmgr query cluster | grep STATE
  # clmgr -a STATE query cluster

To check the status of resource groups:

  # clmgr query resource_group
  # clRGinfo -v

To check the status of the cluster manager:

  # lssrc -ls clstrmgrES
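A stable node reports ST_STABLE in the cluster manager status output. A small wrapper can make that check scriptable; the "Current state:" line format assumed below should be verified against the lssrc output on your own level:

```shell
# check_stable: exit 0 if the given clstrmgrES status text reports ST_STABLE.
# (Assumption: a stable node prints a line such as "Current state: ST_STABLE".)
check_stable() {
    echo "$1" | grep -q "ST_STABLE"
}

# On a cluster node this could be used as:
#   check_stable "$(lssrc -ls clstrmgrES)" && echo "cluster manager is stable"
```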

To check the configuration:

  # clmgr query <cluster CLASS object to query>
  # cllsif
  # cllscf
  # cllsdisk -g <Resource Group>
  # cllsvg -g <Resource Group>
  # cllsserv
  # cllsres
  # cllsgrp
  # cllsfs
  # clshowres
  # clshowsrv
  # cltopinfo

Install & Configure

To create a basic dual-node PowerHA SystemMirror v7.2.2 Standard Edition cluster, with a single resource group containing one application controller, one service IP, and one shared volume group of SAN LUN hdisks:

  1. (optional) Install the PowerHA SystemMirror GUI server.
  2. Configure the AIX LPAR cluster nodes.
  3. Zone and mask the cluster shared SAN LUNs to all WWPNs for all the nodes.
  4. Create the cluster.
  5. Add nodes.
  6. Add repository disk(s).
  7. Add service IP.
  8. Add application controller.
  9. Add resource group.
  10. Synchronize cluster.
  11. Test and verify cluster functionality.
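The clmgr steps above (4 through 10) can be sketched as one script, using the example names from this post (CL1, LPAR1/LPAR2, hdisk33, CL1SRVIP1, CL1AC1, CL1RG1). This is only an illustrative sketch; adapt the names first, and with DRY_RUN=1 the commands are printed rather than executed:

```shell
# Sketch of the cluster build sequence; set DRY_RUN=0 only after
# reviewing and adapting the names to your environment.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "WOULD RUN: $*"       # preview mode: print only
    else
        "$@" || { echo "FAILED: $*" >&2; return 1; }
    fi
}

build_cluster() {
    run clmgr add cluster CL1 nodes=LPAR1 HEARTBEAT_TYPE=unicast &&
    run clmgr add node LPAR2 &&
    run clmgr add repository hdisk33 &&
    run clmgr add service_ip CL1SRVIP1 &&
    run clmgr add application_controller CL1AC1 \
        startscript=/usr/local/pbin/startAC1.sh stopscript=/usr/local/pbin/stopAC1.sh &&
    run clmgr add rg CL1RG1 nodes=LPAR1,LPAR2 service_label=CL1SRVIP1 application=CL1AC1 &&
    run clmgr sync cluster FIX=YES
}

# Preview the full sequence without touching the cluster:
#   DRY_RUN=1 build_cluster
```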

Install the PowerHA SystemMirror GUI server

With PowerHA SystemMirror v7.2.2 Standard Edition it is recommended to install the GUI server on a separate small AIX LPAR.

If the AIX LPAR does not have Internet access, download the prerequisite RPMs and place them in the /var/hacmp/log/smuiinst.downloads/ directory (the list below is for PowerHA SystemMirror 7.2.2; check the smuiinst.ksh script for future levels):

  http://www.bullfreeware.com/download/bin/1255/info-4.13-3.aix5.3.ppc.rpm
  http://www.bullfreeware.com/download/bin/2191/cpio-2.11-2.aix6.1.ppc.rpm
  http://www.bullfreeware.com/download/bin/1267/readline-6.2-2.aix5.3.ppc.rpm
  http://www.bullfreeware.com/download/bin/1260/libiconv-1.13.1-2.aix5.3.ppc.rpm
  http://www.bullfreeware.com/download/bin/1231/bash-4.2-5.aix5.3.ppc.rpm
  http://www.bullfreeware.com/download/bin/1250/gettext-0.17-6.aix5.3.ppc.rpm
  http://www.bullfreeware.com/download/bin/2287/libgcc-4.9.2-1.aix6.1.ppc.rpm
  http://www.bullfreeware.com/download/bin/2295/libgcc-4.9.2-1.aix7.1.ppc.rpm
  http://www.bullfreeware.com/download/bin/2289/libstdc++-4.9.2-1.aix6.1.ppc.rpm
  http://www.bullfreeware.com/download/bin/2297/libstdc++-4.9.2-1.aix7.1.ppc.rpm

Install the RPMs:

  # /usr/es/sbin/cluster/ui/server/bin/smuiinst.ksh -i

The options are:

  -D // debug
  -d // download only
  -i // install only
  -f // force
  -p <proxy url> // proxy
  -P <secure proxy url> // secure proxy
  -h // help
  -v // verbose

Install the GUI server:

  # loopmount -i POWERHA_SYSTEMMIRROR_V7.2.2_SE.iso -o "-V cdrfs -o ro" -m /mnt
  # installp -gYXd /mnt/smui_server/cluster.es.smui.server_722 cluster.es.smui.server cluster.es.smui.common

Log in to the GUI server with a web browser:

  https://HostName:8080/#/login

Configure the AIX LPAR cluster nodes

Add all cluster node IP addresses and symbolic names to the /etc/hosts file.

Add all cluster node boot IP addresses (or symbolic names from /etc/hosts file) to:

  • /usr/es/sbin/cluster/etc/rhosts
  • /etc/cluster/rhosts
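As an illustration, using hypothetical addresses with the node and service IP names from this post, the entries could look like:

```
# /etc/hosts (hypothetical example addresses)
10.1.1.11   LPAR1       # node 1 boot IP
10.1.1.12   LPAR2       # node 2 boot IP
10.1.1.10   CL1SRVIP1   # service IP

# /usr/es/sbin/cluster/etc/rhosts and /etc/cluster/rhosts
10.1.1.11
10.1.1.12
```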

Add filesystems to be exported over NFS, whether to other cluster nodes or to non-cluster servers, to:

  • /usr/es/sbin/cluster/etc/exports

Such as:

  • /userdata -sec=sys,rw,access=lpar99,root=lpar99 // if root access

Enable the critical VG option for each LPAR's rootvg:

  # chvg -r y rootvg

Enable poll_uplink on each PowerVM VIOS backed network interface device:

  # chdev -a poll_uplink=yes -l ent0 -P

Assign a PVID to the repository disk, on one of the cluster node LPARs:

  # chdev -a pv=yes -l hdisk33

Check the AIX level:

  # oslevel -s

Check and install the prerequisites for PowerHA SystemMirror:

  • bos.cluster.rte
  • rsct.compat.basic.hacmp
  • rsct.compat.clients.hacmp

Install PowerHA SystemMirror:

  # loopmount -i POWERHA_SYSTEMMIRROR_V7.2.2_SE.iso -o "-V cdrfs -o ro" -m /mnt
  # installp -gYXd /mnt/installp/ppc cluster.es.server // and additional filesets as required, such as the GUI agent cluster.es.smui.common, or for cluster NFS also cluster.es.nfs.rte

Install any PowerHA SystemMirror fixes and check level:

  # halevel -s

Zone and mask the cluster shared SAN LUNs to all WWPNs for all the nodes

  • IF using NPIV, zone and mask to both WWPNs of each LPAR's NPIV adapters, for all cluster nodes on all Managed Systems
  • IF using VSCSI, zone and mask to the WWPN of each VIOS physical FC adapter, for all VIOS on all Managed Systems

Check that all the shared LUN hdisk devices are available on all cluster nodes (compare UUIDs):

  # lspv -u

Check that all the shared LUN hdisks devices have no_reserve set on all cluster nodes:

  # lsdev -Cc disk -Fname | xargs -i lsattr -Pl {} -a reservation_policy // check that no_reserve is set on shared cluster disks
  # lsdev -Cc disk -Fname | xargs -i chdev -Pl {} -a reservation_policy=no_reserve // change to no_reserve on shared cluster disks for next boot/load

Check that none of the shared hdisks are currently reserved (locked) on any cluster node:

  # lsdev -Cc disk -Fname|xargs -i devrsrv -c query -l {} // check current locking on shared cluster disks, ensure they are not locked

Create the cluster

Ensure the PATH environment variable is set and exported, such as:

  # export PATH=$PATH:/usr/es/sbin/cluster/utilities

  # clmgr add cluster CL1 nodes=LPAR1 HEARTBEAT_TYPE=unicast

Add nodes

  # clmgr add node LPAR2

Add repository disk(s)

  # clmgr add repository hdisk33

Add service IP

Ensure the IP and hostname are available in /etc/hosts on all cluster nodes:

  # clmgr add service_ip CL1SRVIP1

Add application controller

Ensure the application start, stop (and monitor) scripts are available on all cluster nodes and have execute permission (chmod +x <script>):

  # clmgr add application_controller CL1AC1 startscript=/usr/local/pbin/startAC1.sh stopscript=/usr/local/pbin/stopAC1.sh

IF not using IBM-supported Smart Assist scripts, ensure the custom scripts exit in a controlled way and that each command they execute is checked for success. If a custom script exits with a non-zero exit code, manual intervention may be required to clear the resulting config_too_long condition, for example with clruncmd LPAR1 (if the issue occurred on cluster node LPAR1).
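As an illustration, a hypothetical start script that exits in a controlled way could be structured like the sketch below. The application command and messages are placeholders, not part of any IBM-supplied script:

```shell
# Hypothetical sketch of an application controller start script
# (e.g. the body of /usr/local/pbin/startAC1.sh). Every step is checked
# and the script always finishes with an explicit exit code, so a failed
# start is reported immediately instead of leaving the cluster in
# config_too_long.

APP_CMD=${APP_CMD:-/opt/myapp/bin/startmyapp}   # placeholder application start command

start_app() {
    if ! "$APP_CMD"; then
        echo "application start failed" >&2
        return 1            # non-zero: surfaces the failure to PowerHA
    fi
    echo "application started"
    return 0
}

# In the real script this would end with:
#   start_app
#   exit $?
```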

Add resource group

Create the resource group and connect with the service IP and application controller:

  # clmgr add rg CL1RG1 nodes=LPAR1,LPAR2 service_label=CL1SRVIP1 application=CL1AC1

Or, to create the resource group including a shared volume group:

  # clmgr add rg CL1RG1 nodes=LPAR1,LPAR2 service_label=CL1SRVIP1 volume_group=cl1vg1 application=CL1AC1

Synchronize cluster

Synchronize the cluster and allow fixing any configuration issues on all nodes:

  # clmgr sync cluster FIX=YES

Create a configuration snapshot

  # clmgr add snapshot CL1$(date +"%Y%m%d")
  # clsnapshotinfo

Test and verify cluster functionality

Start by moving the resource group between the cluster nodes. After each move, check that the service IP is assigned properly, that the application is started, that all volume groups with filesystems and any NFS exports or NFS mounts are properly in place, and that the hacmp.out log file and other logs do not show any error conditions.

  # for i in 2 1 2 1 2 1 2 1 2;do clRGmove -g CL1RG1 -n LPAR$i; for j in 1 2 3 4 5 6 7 8 9;do clRGinfo; sleep 45; done; done // 45s delay between moves

Or with the clmgr command:

  # for i in 2 1 2 1 2 1 2 1 2;do clmgr move resource_group CL1RG1 node=LPAR$i; for j in 1 2 3 4 5 6 7 8 9;do clmgr query resource_group CL1RG1; sleep 45; done; done
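The post-move checks can also be scripted. The attribute=value line format assumed below (STATE="...", CURRENT_NODE="...") is an assumption about clmgr attribute query output at this level; verify it on your system before relying on it:

```shell
# rg_online_on: exit 0 when the given attribute=value query text shows
# the resource group ONLINE on the expected node.
# (Assumption: clmgr attribute queries print lines such as
#  STATE="ONLINE" and CURRENT_NODE="LPAR2".)
rg_online_on() {
    _out=$1
    _node=$2
    echo "$_out" | grep -q 'STATE="ONLINE"' &&
    echo "$_out" | grep -q "CURRENT_NODE=\"$_node\""
}

# Possible use after each move in the test loop:
#   rg_online_on "$(clmgr -a STATE,CURRENT_NODE query resource_group CL1RG1)" LPAR2 \
#       && echo "move to LPAR2 verified"
```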

Prerequisites

Review the PowerHA SystemMirror for AIX Version Compatibility Matrix TECHDOC at

PowerHA SystemMirror 7.2.2 (5765-H39) minimum AIX levels:

  • AIX 7100-04 and 7100-05
  • AIX 7200-00, 7200-01 and 7200-02

Review the PowerHA SystemMirror Known Fixes at

Such as for AIX 7200-02, APAR IJ04268: http://www-01.ibm.com/support/docview.wss?uid=isg1IJ04268

References

clmgr command

clmgr command: Quick reference

PowerHA SystemMirror commands

IBM PowerHA SystemMirror Version 7.2.2 for AIX documentation

PowerHA SystemMirror graphical user interface (GUI)

https://www.ibm.com/support/knowledgecenter/en/SSPHQG_7.2.2/com.ibm.powerha.gui/ha_gui_kickoff.htm

Installing PowerHA SystemMirror GUI

PowerHA SystemMirror systems monitoring and recovery - Video

PowerHA SystemMirror Cloud Enabled Administration - Video

PowerHA Release Notes PowerHA SystemMirror 7.2.2

IBM Support PowerHA SystemMirror new support experience

PowerHA SystemMirror FLRT Lite

PowerHA SystemMirror for AIX Version Compatibility Matrix TECHDOC

PowerHA SystemMirror Known Fixes Information

Fix Level Recommendation Tool

PowerHA SystemMirror Technology level update images at Entitled Systems Support

5765-H39 = "PowerHA for AIX Standard Edition", feature 2322

5765-H37 = "PowerHA SystemMirror Enterprise Edition", feature 2323

PowerHA/CAA Tunable Guide

PowerHA SystemMirror Forums

LinkedIn:

DeveloperWorks:

QA Forum:

AIX Vulnerability Checker:

AIX HIPER APARs:

 

 


UID

ibm16165123