IBM PowerHA SystemMirror rapid deploy cluster worksheets for IBM AIX

Quick deployment guide on version 7 clusters

This tutorial outlines the high-level requirements for planning a cluster configuration with a checklist approach and then proceeds with deployment instructions using the new cluster command-line interface (CLI). The steps outlined are for illustration purposes and can easily be modified to deploy more complex configurations. Beyond these instructions, there are alternative paths using the smitty sysmirror menus and the IBM Systems Director interface, which are well documented in IBM® Redbooks® and online documentation. The CLI examples shown were primarily tested with the 7.1.2 cluster release and include options and flags that might not have been retrofitted to earlier version 7 releases.

Michael Herrera (mherrera@us.ibm.com), Certified IT Specialist, IBM

Michael Herrera has been working with IBM HACMP™ and the PowerHA SystemMirror cluster software on AIX for over 14 years. He has held roles in the AIX Software Support Center supporting PowerHA and SAN solutions and has published a number of Redbooks and technical papers. He is currently part of the Advanced Technical Sales Support organization and is responsible for the effective deployment of high availability and disaster recovery solutions in the field. He is based out of Dallas, TX and can be reached at mherrera@us.ibm.com.



23 October 2013

Introduction

The IBM PowerHA® SystemMirror V7 cluster software is the next evolution of IBM AIX® clustering. In an effort to provide tighter integration with the AIX operating system, a new kernel level layer called Cluster Aware AIX (CAA) was developed. The cluster software leverages this new foundation for its heartbeat and message communication. Running at the kernel level ensures that the cluster communication receives top priority and is not affected in the event of a memory leak or rogue application consuming the system's resources. This redesign enables health monitoring across all of the network interfaces along with the ability to react to the loss of root volume group (rootvg) when booting from an external storage area network (SAN) storage enclosure. In addition, new target mode capabilities in Fibre Channel (FC) adapters allow for a new storage framework communication for health monitoring over the SAN. The following instructions are intended to assist in the rapid deployment of a PowerHA SystemMirror V7 cluster leveraging the new clmgr CLI. They also provide examples of common administrative tasks, sample configuration files, and useful logs.

Minimum prerequisites
PowerHA SystemMirror release   Generally available   Minimum AIX level
PowerHA SM Version 7.1.0       September 2010        AIX 7.1 with RSCT 3.1.0.1
                                                     AIX 6.1 TL6 SP1 with RSCT 3.1.0.1
PowerHA SM Version 7.1.1       December 2011         AIX 7.1 TL1 SP3 with RSCT 3.1.2.0
                                                     AIX 6.1 TL7 SP3 with RSCT 3.1.2.0
PowerHA SM Version 7.1.2       November 2012         AIX 7.1 TL2 SP1 with RSCT 3.1.2.0
                                                     AIX 6.1 TL8 SP3 with RSCT 3.1.2.0
PowerHA SM Version 7.1.3       December 2013         AIX 7.1 TL3 SP1 with RSCT 3.1.5.0
                                                     AIX 6.1 TL9 SP1 with RSCT 3.1.5.0
  • The packages required for CAA functionality include:
    • bos.cluster.rte
    • bos.ahafs
    • bos.cluster.solid (No longer required beyond HA 7.1.0)
  • All shared volume groups (VGs) in a version 7 cluster must be Enhanced Concurrent Mode (ECM) VGs:
    • bos.clvm.enh
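
A quick way to confirm that these filesets are present on each node is to query them with lslpp; this is a minimal check, and the exact list depends on your PowerHA and AIX levels:

    # verify the CAA and concurrent LVM filesets on the local node
    lslpp -l bos.cluster.rte bos.ahafs bos.clvm.enh
    # verify the installed PowerHA server runtime level
    lslpp -l cluster.es.server.rte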

Cluster resource checklist

  • IP address planning
    • Request IPs (number of boot/base, persistent, and service IPs).
    • Register Domain Name Server (DNS) names.
    • Update configuration files: /etc/hosts and /etc/cluster/rhosts.
    • Hard-set the base IPs on the interfaces.
  • Shared storage planning
    • Determine space requirements [number of data logical unit numbers (LUNs) and cluster repository disk]
    • Identify driver and multipath requirements
    • Define LUN mappings
    • Create SAN zones
    • Create or import the shared volume group, logical volume, and file system information
    • Use unique names for resources imported across cluster members
  • Highly available applications planning
    • Identify the installation location and space requirements.
    • Identify user and permission settings.
    • Test and deploy application start and stop scripts.
    • Optionally, test and deploy application monitoring scripts.
  • PowerHA SystemMirror cluster deployment:
    • Identify and install AIX level requirements [including CAA and Reliable Scalable Cluster Technology (RSCT) packages] on all nodes.
    • Identify and install the required PowerHA SystemMirror code level on all nodes.
    • Restart logical partitions (LPARs) to pick up kernel bos updates
  • From Node 1:
    • Define cluster name
    • Define cluster repository disk
    • Define multicast address (automatic or manual)
    • Define node names
    • Define networks
    • Define interfaces
    • Define application controllers
    • Define service IPs
    • Define resource groups
    • Define resources to resource groups
    • Verify and synchronize the cluster.
    • Start cluster services on all nodes.

After the configuration is completed and synchronized, you can proceed with the following tasks:

  • Fallover testing: graceful stop with takeover and resource group moves (soft) compared to reboot -q (hard).
  • Monitor the environment.
  • Configuration files:
    • /etc/hosts: The contents of this file should include all of the cluster IP addresses and their corresponding IP labels as it is preferred to have the cluster resolve locally and then revert to DNS, if necessary.
    • /etc/cluster/rhosts: Populate the file on both nodes and refresh the cluster communication daemon (refresh -s clcomd). Explicitly defining the cluster IPs, one per line, helps avoid name resolution issues. Ensure that only valid, accessible cluster IPs are defined in this file.
    • /usr/es/sbin/cluster/netmon.cf: This file is used by the cluster in single adapter networks to attempt to determine the adapter status in the event of a failure. Virtualized environments should deploy this file to point to default gateways or IPs residing outside the physical frame to validate outside connectivity.
  • IP addresses:
    • Multicast address (automatically or manually assigned): Cluster heartbeat in version 7 clusters uses IP multicasting and, by default, assigns a multicast address during the cluster creation process. It attempts to avoid duplication across clusters by deriving the address from the first IP that it detects on your network interfaces (for example, an en0 base IP of 9.10.10.1 might result in a 228.10.10.1 multicast address). If you wish to define your own multicast address, you can do so during that portion of the cluster configuration. The default changes to unicast communication in version 7.1.3, but IP multicasting remains an available option.
    • Base IP addresses: Every adapter in AIX will typically have an IP address on it stored in the ODM and set to come online during the system boot sequence. These adapters can be defined within the cluster definitions as base / boot adapters if they are to be within a PowerHA network. Note that CAA attempts to use all interfaces within the LPAR anyway unless the administrator has explicitly defined them in a PowerHA private network. Virtual local area networks (VLANs) that have interfaces that will host a potential service IP must have IP multicasting enabled. Otherwise, CAA considers those interfaces down and never attempts to acquire the service IP alias onto them.
    • Persistent IPs: This is a cluster node specific alias that will be available on system boot whether HA services are running or not. These can be used as administrative IPs for each node or as IPs to hold the route for the routable subnet in the event of a cluster failover. For some time now, PowerHA has allowed single adapter networks to define the base/boot IP and service IPs on the same routable subnet. Therefore, the need for a persistent IP is not as prevalent as in earlier releases, and thus, these are not typically required.
    • Service IPs: Any service IP address aliases defined within a resource group are managed by the cluster. The node hosting the resource group and its corresponding resources determines where the service IP aliases are placed.
  • Shared disks:
    • CAA repository disk (size requirement: minimum 512 MB and maximum 460 GB): This is a new CAA requirement that must be visible to all cluster members. The common practice is to use the environment's standard LUN size, as long as it falls within the minimum and maximum size requirements. At the time of the first verify and synchronize operation, the cluster creates a private volume group on the device.
    • Shared data volumes: All cluster-managed shared volume groups must be created or converted to enhanced concurrent mode and mapped and then imported onto all cluster nodes. The corresponding LUNs should be defined to have no reservations set in their backend multipath drivers. During cluster processing, the cluster manages the devices with its own disk fence registers and only allows file systems to mount on the node hosting the resource group.
  • Cluster resource group policies:
    • A resource group in the cluster configuration is a container for the different highly available resources. The different resource group startup, fallover, and fallback policies should be established during the planning phase and should be fully understood.

Resource group policies

Resource group policy      Available options
Startup policy
  • Online on Home Node Only
  • Online on First Available Node
  • Online Using Distribution Policy
  • Online on All Available Nodes
Fallover policy
  • Fallover to Next Node in the List
  • Fallover Using Dynamic Node Priority
  • Bring Offline
Fallback policy
  • Fallback to Higher Priority Node in the List
  • Bring Offline
  • Clustered applications (application controller definitions)
    • Start / Stop scripts: The application controller scripts must reside in a common path in all participating cluster members. They must also be executable by the root user. The content of the scripts does not need to match between all cluster members. However, if the contents need to match based on application requirements, the PowerHA file collection function can be used to ensure that changes are replicated automatically every 10 minutes.
    • (Optional) Application monitoring scripts: The cluster software delivers an optional application monitoring framework that can be used in any deployment. The cluster runs a clappmon process for every monitor defined on the node hosting its resource group and corresponding application controller. Any monitor script should be executable by root, be thoroughly tested, have proper script termination, and reside in a common location on all cluster members.

CAA heartbeat communication

  • Repository disk: Version 7 cluster communication requires the use of a shared LUN (repository disk) for heartbeating and for storing cluster configuration information. The size requirement for the 7.1.1 and 7.1.2 releases is a minimum of 512 MB and a maximum of 460 GB. It is common for clients to use their standard LUN size rather than designating a volume smaller than their current data LUNs.
  • IP interfaces: The new communication protocol used by version 7 clusters requires the enablement of IP multicasting on the layer 2 devices backing the network interfaces. CAA uses all the interfaces on the system by default unless they are defined to a highly available private network. IP network definitions are required for the cluster to perform IP address takeover between the cluster members. The cluster will not bring a service IP alias online on an interface if multicast communication is not working because the interface will be considered unavailable.
  • (Optional) Storage framework communication [SANCOMM]: SAN-based communication is an additional heartbeating option in version 7 clusters. If properly enabled, the storage framework communication passes heartbeats between the Fibre Channel adapters within the shared SAN environment to provide an additional heartbeat communication path. This configuration is supported only over SAS or 4 Gb and 8 Gb Fibre Channel adapters and works with dedicated host bus adapters (HBAs) or virtualized ones using virtual Small Computer System Interface (VSCSI) or N-Port ID Virtualization (NPIV). On the supported HBAs, you must enable target mode on the LPARs that own the cards and ensure that the SAN zoning provides visibility between all applicable adapters on all cluster members.
chdev -l fscsi# -a dyntrk=yes -a fc_err_recov=fast_fail -P
chdev -l fcs# -a tme=yes -P (a reboot is required)

Note: The -P flag updates only the AIX ODM, which is necessary when child devices already exist on the HBAs; this is why a reboot is required for the settings to take effect.

Virtualized environments require the use of a reserved Ethernet VLAN (3358) between the client LPARs and the corresponding Virtual I/O Server (VIOS) instances. A virtual Ethernet adapter must be defined on the client LPAR and on the VIOS in order to create a bridge that allows SAN heartbeat communication to reach the physical HBAs on the VIOS instances. The virtual Ethernet adapters are not required to have an IP address defined on them. For the storage packets to pass between cluster members defined across physical server frames, the SAN zoning must include all corresponding HBA worldwide port names (WWPNs). In a virtualized environment, the physical WWPNs of the HBAs in each VIOS (not the client virtual WWPNs) need to be defined within the same SAN zone. Review the current online documentation or recent Redbooks publications for examples using this feature.
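
As a quick sanity check after enabling target mode, you can confirm the adapter attributes and look for the storage framework communication devices. This is a minimal sketch; the adapter numbers (fcs0, fscsi0) are placeholders for your environment:

    # confirm target mode is enabled on the HBA (placeholder adapter: fcs0)
    lsattr -El fcs0 -a tme
    # confirm dynamic tracking and fast fail on the child device (placeholder: fscsi0)
    lsattr -El fscsi0 -a dyntrk -a fc_err_recov
    # the sfwcomm devices that back the storage framework should be listed once target mode is active
    lsdev -C | grep sfwcomm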


CLI rapid deployment instructions

PowerHA SystemMirror V7 clusters can be created entirely from the new CLI. In this example, IPs have already been appended to the /etc/hosts file. The volume group is already imported onto all cluster members and the application scripts have already been written and propagated to the common /usr/local/hascripts directory in each of the cluster nodes. The following instructions create a basic two-node cluster:
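
Before creating the cluster, it can help to confirm those prerequisites on each node. The following is a minimal sketch, assuming the host names, volume group, and script path used in this example:

    # cluster IPs resolvable locally on every node
    grep -E "nodea|nodeb|shared" /etc/hosts
    # cluster communication addresses populated and clcomd refreshed
    cat /etc/cluster/rhosts
    refresh -s clcomd
    # shared volume group visible and application scripts in place
    lspv | grep sharedvg
    ls -l /usr/local/hascripts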

Cluster topology configuration
Network        Label         Function  Interface  Node
net_ether_01   nodeA_base1   boot      en0        nodeA
net_ether_01   nodeB_base1   boot      en0        nodeB
net_ether_01   sharedIP      service   alias      shared
Resource group configuration
Resource group name       DB_app1_rg
Startup Policy            Online on Home Node Only
Fallover Policy           Fallover to Next Priority Node
Fallback Policy           Never Fallback
Participating Nodes       nodeA nodeB
Service IP Label          sharedIP
Volume Group              sharedvg
Application Controller    DB_app1

Note: The resource group policies in this example are set to the most commonly used values: Online on Home Node Only (the default in the command, so this option does not need to be specified), Fallover to Next Priority Node in the List, and Never Fallback.

The following clmgr commands create the cluster topology and resource group configuration outlined in the tables above:

  • Create a cluster.
    clmgr add cluster SampleCluster repository=hdisk10 nodes=nodea.dfw.ibm.com,nodeb.dfw.ibm.com
  • Add service IP.
    clmgr add service_ip sharedIP network=net_ether_01
  • Define application controller: clmgr add application_controller DB_app1 startscript="/usr/local/hascripts/DB_app_start.sh" stopscript="/usr/local/hascripts/DB_app_stop.sh"
  • Create resource group:
    clmgr add rg DB_app1_rg nodes=nodea.dfw.ibm.com,nodeb.dfw.ibm.com startup=ohn fallback=nfb service_label=sharedIP volume_group=sharedvg application=DB_app1
  • Verify and synchronize cluster:
    clmgr sync cluster

Note: The CAA private volume group created on the repository disk shows up only after the first time the cluster definitions are synchronized. This is a hands-off volume group and should not be modified, mirrored, or extended through AIX LVM. Also note that the syntax options in our example can be modified to include additional cluster features.
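
For repeatable deployments, the same sequence can be captured in a small ksh script. This is only a sketch of the commands shown above, using the disk, node, path, and resource names assumed in this example:

    #!/bin/ksh
    # create the cluster and define the repository disk
    clmgr add cluster SampleCluster repository=hdisk10 nodes=nodea.dfw.ibm.com,nodeb.dfw.ibm.com
    # add the service IP to the cluster network
    clmgr add service_ip sharedIP network=net_ether_01
    # define the application controller start/stop scripts
    clmgr add application_controller DB_app1 \
        startscript="/usr/local/hascripts/DB_app_start.sh" \
        stopscript="/usr/local/hascripts/DB_app_stop.sh"
    # create the resource group and add its resources
    clmgr add rg DB_app1_rg nodes=nodea.dfw.ibm.com,nodeb.dfw.ibm.com startup=ohn fallback=nfb \
        service_label=sharedIP volume_group=sharedvg application=DB_app1
    # verify and synchronize the configuration to all nodes
    clmgr sync cluster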


Common administrative tasks

This section outlines common operations that can be performed either with clmgr or with the older legacy commands; the alternatives listed under each task accomplish effectively the same thing.

  • Access PowerHA SystemMirror SMIT menus:
    • smitty sysmirror
    • smitty cl_admin
  • Start cluster services: (different choices)
    • clmgr start cluster
    • clmgr online node nodeA
    • clmgr start node nodeA
    • smitty clstart
  • Stop cluster services: (different choices)
    • clmgr stop cluster
    • clmgr offline node nodeA
    • clmgr stop node nodeA
    • smitty clstop
  • Verify / Synchronize cluster:
    • clmgr verify cluster
    • clmgr sync cluster
  • Move resource group: (different choices)
    • clmgr move rg rgA,rgB node=nodeA (with multiple RGs, the move is performed serially)
    • clRGmove -g RGname -n nodeA -m
  • Add an application monitor:
    • clmgr add mon appA_mon TYPE=Custom APPLICATION=appA MONITORINTERVAL=60 FAILUREACTION=fallover STABILIZATION=300 RESTARTINTERVAL=1200 CLEANUPMETHOD=/usr/local/hascripts/appA_cleanup.sh RESTARTMETHOD=/usr/local/hascripts/appA_restart.sh RESTARTCOUNT=3 MONITORMETHOD=/usr/local/hascripts/appA_monitor.sh
  • Suspend / Resume application monitoring:
    • clmgr manage application_controller suspend test_app1
    • clmgr manage application_controller resume test_app1

Note: The clmgr operation will automatically mount the file system and update the ODM and /etc/filesystems file in the other cluster nodes. If the volume group is already defined to a resource group, the cluster will automatically manage the file system.

  • Validate IP multicast traffic: (must be run on each node)
    • mping -v -r -a 228.10.10.1 (nodeA: receive flag)
    • mping -v -s -a 228.10.10.1 (nodeB: send flag)
  • Display / Modify tunables:
    • clctrl -tune -L (displays the default and currently set tunable values)
Sample output:
root@mhoracle1 /> clctrl -tune -L
NAME                      DEF    MIN    MAX    UNIT           SCOPE
ENTITY_NAME(UUID)                                                CUR
config_timeout            240    0      2G-1   seconds        c n
  sapdemo71_cluster(1de50be8-6ab0-11e2-ace9-46a6ba546402)          240

deadman_mode              a                                   c n
  sapdemo71_cluster(1de50be8-6ab0-11e2-ace9-46a6ba546402)          a

hb_src_disk               1      -1     3                     c
  sapdemo71_cluster(1de50be8-6ab0-11e2-ace9-46a6ba546402)          1

hb_src_lan                1      -1     3                     c
  sapdemo71_cluster(1de50be8-6ab0-11e2-ace9-46a6ba546402)          1

hb_src_san                2      -1     3                     c
  sapdemo71_cluster(1de50be8-6ab0-11e2-ace9-46a6ba546402)          2

link_timeout              30000  0      1171K  milliseconds   c n
  sapdemo71_cluster(1de50be8-6ab0-11e2-ace9-46a6ba546402)          30000

node_down_delay           10000  5000   600000 milliseconds   c n
  sapdemo71_cluster(1de50be8-6ab0-11e2-ace9-46a6ba546402)          10000

node_timeout              20000  10000  600000 milliseconds   c n
  sapdemo71_cluster(1de50be8-6ab0-11e2-ace9-46a6ba546402)          20000

packet_ttl                32     1      64                    c n
  sapdemo71_cluster(1de50be8-6ab0-11e2-ace9-46a6ba546402)          32

remote_hb_factor          10     1      100                   c
  sapdemo71_cluster(1de50be8-6ab0-11e2-ace9-46a6ba546402)          10

repos_mode                e                                   c n
  sapdemo71_cluster(1de50be8-6ab0-11e2-ace9-46a6ba546402)          e

site_merge_policy         p                                   c
  sapdemo71_cluster(1de50be8-6ab0-11e2-ace9-46a6ba546402)          p

n/a means parameter not supported by the current platform or kernel

Scope codes:
c = clusterwide: applies to the entire cluster
s = per site: may be applied to one or more sites
n = per node: may be applied to one or more nodes
i = per interface: may be applied to one or more communication interfaces

Value conventions:
K = Kilo: 2^10       G = Giga: 2^30       P = Peta: 2^50
M = Mega: 2^20       T = Tera: 2^40       E = Exa: 2^60

Note: In AIX 6.1 TL9 and AIX 7.1 TL3, usability has been enhanced to allow users to specify the IP address of the interface they want to mping on [-b address]. In previous versions, the command might report success as long as it could mping across any one of the interfaces on the server.

  • CAA enhanced usability:
    The bos.cluster.rte CAA package introduced the clcmd command, which allows administrators to prefix a command with clcmd and collect its output from all cluster nodes in a single window.
    • clcmd netstat -in displays all interfaces and IPs from all cluster nodes
    • clcmd lspv displays all physical volume identifiers (PVIDs) and VG information from all cluster nodes
Sample output:
root@mhoracle1 /> clcmd netstat -in
-------------------------------
NODE mhoracle2.dfw.ibm.com
-------------------------------
Name  Mtu   Network     Address           Ipkts Ierrs    Opkts Oerrs  Coll
en0   1500  link#2      32.43.2b.33.8a.2  3256281  0   267653     0     0
en0   1500  9.19.51     9.19.51.212       3256281  0   267653     0     0
lo0   16896 link#1                         378442  0   378442     0     0
lo0   16896 127         127.0.0.1          378442  0   378442     0     0
lo0   16896 ::1%1                          378442  0   378442     0     0

-------------------------------
NODE mhoracle1.dfw.ibm.com
-------------------------------
Name  Mtu   Network     Address           Ipkts Ierrs    Opkts Oerrs  Coll
en0   1500  link#2      46.a6.ba.54.64.2  3318895  0   251392     0     0
en0   1500  9.19.51     9.19.51.239       3318895  0   251392     0     0
en0   1500  9.19.51     9.19.51.211       3318895  0   251392     0     0
lo0   16896 link#1                         283853  0   283853     0     0
lo0   16896 127         127.0.0.1          283853  0   283853     0     0
lo0   16896 ::1%1                          283853  0   283853     0     0
Sample output:
root@mhoracle1 /> clcmd lspv

-------------------------------
NODE mhoracle2.dfw.ibm.com
-------------------------------
hdisk0          00c23c9fedcf8f86              rootvg         active	
hdisk1          00f604142514be43              sapvg          concurrent
hdisk2          00f604142514beb0              oravg          concurrent
hdisk3          00f604142514bf1c              None
hdisk4          00f604142514bfb3              None
hdisk5          00f604142514c023              None
hdisk6          00f604142514c090              None
hdisk9          00f626d13aa3645a              caavg_private  active
hdisk7          00f604143a421dd3              sapersvg       concurrent
hdisk8          00f604143a4243c4              sapsgfvg       concurrent

-------------------------------
NODE mhoracle1.dfw.ibm.com
-------------------------------
hdisk0          00f60414ed2ecec2              rootvg         active
hdisk1          00f604142514be43              sapvg          concurrent
hdisk2          00f604142514beb0              oravg          concurrent
hdisk3          00f604142514bf1c              None
hdisk4          00f604142514bfb3              None
hdisk5          00f604142514c023              None
hdisk6          00f626d1ffcc98bb              scrap_backup_vg active
hdisk9          00f626d13aa3645a              caavg_private  active
hdisk7          00f604143a421dd3              sapersvg       concurrent
hdisk8          00f604143a4243c4              sapsgfvg       concurrent
  • Replace repository disk:
    • clmgr replace repository new_disk
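
A minimal sketch of the checks around a repository replacement, assuming hdisk5 is a free LUN that is already zoned and visible on every node:

    # confirm the candidate disk is visible (and unused) on all cluster nodes
    clcmd lspv | grep hdisk5
    # swap the CAA repository over to the new disk
    clmgr replace repository hdisk5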

Cluster status monitoring

This section outlines a number of commands to check the levels of code being used and the status of the corresponding cluster daemons and services. Sample outputs have been provided but you may want to experiment in your own environment to see which ones are the most useful to you.

  • Product version:
    • halevel -s
    • lslpp -l cluster.es.server.rte
    • lssrc -ls clstrmgrES | grep fix
    • clmgr query version
Sample output:
root@mhoracle1 /> halevel -s
7.1.2 SP3

root@mhoracle1 /> lslpp -l cluster.es.server.rte
Fileset                      Level  State      Description

Path: /usr/lib/objrepos
cluster.es.server.rte      7.1.2.3  COMMITTED  Base Server Runtime

Path: /etc/objrepos
cluster.es.server.rte      7.1.2.3  COMMITTED  Base Server Runtime


root@mhoracle1 /> lssrc -ls clstrmgrES | grep fix
cluster fix level is "3"

root@mhoracle1 /> clmgr query version

SystemMirror Information:
=========================
Version:            7.1.2 SP3
Build Level:        1323C_hacmp712 (Jul 12 2013, 14:21:00)
Cluster Type:       Multi Site Cluster Deployment (Stretched Cluster)

CAA Information:
================
Oct 30 2012
14:30:59
h2012_44A1
@(#) _kdb_buildinfo unix_64 Oct 30 2012 14:30:59 h2012_44A1

Cluster Configured:  Yes.

Host Information:
=================
HOSTNAME:       mhoracle1.dfw.ibm.com
IPADDRESS:      9.19.51.211
LOCALHOST:      true
HAVERSION:      7.1.2.3
VERSION_NUMBER: 14
HAEDITION:      STANDARD
AIX_LEVEL:      7100-02-01-1245

Director Information:
=====================
DIRECTOR_AGENT_STATUS:            ACTIVE
DIRECTOR_AGENT_PLUGIN_STATUS:     ACTIVE
DIRECTOR_AGENT_PLUGIN_VERSION:    7.1.2.0
DIRECTOR_AGENT_PLUGIN_INST_DATE:  Tue Jan 29 13:39:55 CST6CDT 2013
DIRECTOR_AGENT_PLUGIN_BUILD_DATE: Monday October 08, 2012 at 10:09:01 
DIRECTOR_AGENT_FILE_SYSTEM:       96%
DIRECTOR_AGENT_TRACE_LEVEL:       NORMAL
DIRECTOR_AGENT_MANAGER:
DIRECTOR_AGENT_EVENT_STATUS:      ERROR
  • Query cluster settings / status:
    • clmgr query cluster
    • clmgr -v -a name,state,raw_state query node
    • lssrc -ls clstrmgrES | grep state
    • clshowsrv -v
Sample output:
root@mhoracle1 /> clmgr query cluster
CLUSTER_NAME="sapdemo71_cluster"
CLUSTER_ID="1120652512"
STATE="STABLE"
TYPE="SC"
VERSION="7.1.2.3"
VERSION_NUMBER="14"
EDITION="STANDARD"
CLUSTER_IP="228.19.51.211"
UNSYNCED_CHANGES="false"
SECURITY="Standard"
FC_SYNC_INTERVAL="10"
RG_SETTLING_TIME="0"
RG_DIST_POLICY="node"
MAX_EVENT_TIME="180"
MAX_RG_PROCESSING_TIME="180"
DAILY_VERIFICATION="Enabled"
VERIFICATION_NODE="Default"
VERIFICATION_HOUR="0"
VERIFICATION_DEBUGGING="Enabled"
LEVEL="DISABLED"
ALGORITHM=""
GRACE_PERIOD_SEC=""
REFRESH=""
MECHANISM=""
CERTIFICATE=""
PRIVATE_KEY=""
HEARTBEAT_FREQUENCY="20"
GRACE_PERIOD="10"
SITE_POLICY_FAILURE_ACTION="fallover"
SITE_POLICY_NOTIFY_METHOD=""
SITE_HEARTBEAT_CYCLE="0"
SITE_GRACE_PERIOD="0"

root@mhoracle1 /> clmgr -v -a name,state,raw_state query node
NAME="mhoracle1"
STATE="NORMAL"
RAW_STATE="ST_STABLE"

NAME="mhoracle2"
STATE="NORMAL"
RAW_STATE="ST_STABLE"

root@mhoracle1 /> lssrc -ls clstrmgrES | grep state
Current state: ST_STABLE

root@mhoracle1 /> clshowsrv -v
Status of the RSCT subsystems used by HACMP:
Subsystem        Group            PID          Status
cthags           cthags           5243090      active
ctrmc            rsct             5439656      active

Status of the HACMP subsystems:
Subsystem        Group            PID          Status
clstrmgrES       cluster          5505208      active
clcomd           caa              7405578      active

Status of the optional HACMP subsystems:
Subsystem         Group            PID          Status
clinfoES         cluster                       inoperative
  • Display cluster configuration:
    • cltopinfo
    • clmgr view report basic
    • cllsif (cluster topology view)
    • clshowres (resource group configuration view)
Sample output:
root@mhoracle1 /> cltopinfo
Cluster Name: sapdemo71_cluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: hdisk9
Cluster IP Address: 228.19.51.211
There are 2 node(s) and 1 network(s) defined
NODE mhoracle1:
     Network net_ether_01
          sharesvc1       9.19.51.239
          mhoracle1       9.19.51.211
NODE mhoracle2:
     Network net_ether_01
          sharesvc1       9.19.51.239
          mhoracle2       9.19.51.212

Resource Group SAP_rg
     Startup Policy           Online On Home Node Only
     Fallover Policy          Fallover To Next Priority Node In The List
     Fallback Policy          Never Fallback
     Participating Nodes      mhoracle1 mhoracle2
     Service IP Label         sharesvc1


root@mhoracle1 /> clmgr view report basic
Cluster Name: sapdemo71_cluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: hdisk9
Cluster IP Address: 228.19.51.211
There are 2 node(s) and 1 network(s) defined
NODE mhoracle1:
     Network net_ether_01
          sharesvc1       9.19.51.239
          mhoracle1       9.19.51.211
NODE mhoracle2:
     Network net_ether_01
          sharesvc1       9.19.51.239
          mhoracle2       9.19.51.212

Resource Group SAP_rg
     Startup Policy           Online On Home Node Only
     Fallover Policy          Fallover To Next Priority Node In The List
     Fallback Policy          Never Fallback
     Participating Nodes      mhoracle1 mhoracle2
     Service IP Label         sharesvc1

root@mhoracle1 /> cllsif
Adapter     Type      Network       Net Type  Attribute  Node       IP Address    Interface Name  Netmask        Prefix Length
mhoracle1   boot      net_ether_01  ether     public     mhoracle1  9.19.51.211   en0             255.255.255.0  24
sharesvc1   service   net_ether_01  ether     public     mhoracle1  9.19.51.239                   255.255.255.0  24
mhoracle2   boot      net_ether_01  ether     public     mhoracle2  9.19.51.212   en0             255.255.255.0  24
sharesvc1   service   net_ether_01  ether     public     mhoracle2  9.19.51.239                   255.255.255.0  24


root@mhoracle1 /> clshowres
Resource Group Name                           SAP_rg
Participating Node Name(s)                    mhoracle1 mhoracle2
Startup Policy                                Online On Home Node Only
Fallover Policy                               Fallover To Next Priority Node In The List
Fallback Policy                               Never Fallback
Site Relationship                             ignore
Node Priority
Service IP Label                              sharesvc1
Filesystems                                   ALL
Filesystems Consistency Check                 fsck
Filesystems Recovery Method                   parallel
Filesystems/Directories to be exported (NFSv3) /asap /sapmnt/TST /usr/sap/trans
Filesystems/Directories to be exported (NFSv4)
Filesystems to be NFS mounted
Network For NFS Mount
Filesystem/Directory for NFSv4 Stable Storage
Volume Groups                         sapvg oravg sapersvg sapsgfvg
Concurrent Volume Groups
Use forced varyon for volume groups, if necessary false
Disks
Raw Disks
Disk Error Management?                         no
GMVG Replicated Resources
GMD Replicated Resources
PPRC Replicated Resources
SVC PPRC Replicated Resources
EMC SRDF® Replicated Resources
Hitachi TrueCopy® Replicated Resources
Generic XD Replicated Resources
AIX Connections Services
AIX Fast Connect Services
Shared Tape Resources
Application Servers                          sap
Highly Available Communication Links
Primary Workload Manager Class
Secondary Workload Manager Class
Delayed Fallback Timer
Miscellaneous Data
Automatically Import Volume Groups           false
Inactive Takeover
SSA Disk Fencing                              false
Filesystems mounted before IP configured     true
WPAR Name

Run Time Parameters:

Node Name                                     mhoracle1
Debug Level                                   high
Format for hacmp.out                          Standard

Node Name                                     mhoracle2
Debug Level                                   high
Format for hacmp.out                          Standard
  • Location of resources:
    • clRGinfo -p
Sample output:
root@mhoracle1 /> clRGinfo -p

Cluster Name: sapdemo71_cluster
Resource Group Name: SAP_rg
Node                         Group State
---------------------------- ---------------
mhoracle1                    ONLINE
mhoracle2                    OFFLINE
  • CAA commands:
    • lscluster -c (cluster configuration, multicast address)
    • lscluster -i (status of cluster interfaces)
    • lscluster -d (cluster storage interfaces)
    • lscluster -m (cluster node configuration information)
Sample output:
root@mhoracle1 /> lscluster -c
Cluster Name: sapdemo71_cluster
Cluster UUID: 1de50be8-6ab0-11e2-ace9-46a6ba546402
Number of nodes in cluster = 2
        Cluster ID for node mhoracle1.dfw.ibm.com: 1
        Primary IP address for node mhoracle1.dfw.ibm.com: 9.19.51.211
        Cluster ID for node mhoracle2.dfw.ibm.com: 2
        Primary IP address for node mhoracle2.dfw.ibm.com: 9.19.51.212
Number of disks in cluster = 1
        Disk = hdisk9 UUID = d3ce4fd5-3003-ac21-9789-6d9a590242fd 
cluster_major = 0 cluster_minor = 1
Multicast for site LOCAL: IPv4 228.19.51.211 IPv6 ff05::e413:33d3


root@mhoracle1 /> lscluster -i
Network/Storage Interface Query

Cluster Name: sapdemo71_cluster
Cluster UUID: 1de50be8-6ab0-11e2-ace9-46a6ba546402
Number of nodes reporting = 2
Number of nodes stale = 0
Number of nodes expected = 2

Node mhoracle1.dfw.ibm.com
Node UUID = 1dfc2d5a-6ab0-11e2-ace9-46a6ba546402
Number of interfaces discovered = 3
        Interface number 1, en0
                IFNET type = 6 (IFT_ETHER)
                NDD type = 7 (NDD_ISO88023)
                MAC address length = 6
                MAC address = 46:A6:BA:54:64:02
                Smoothed RTT across interface = 7
                Mean deviation in network RTT across interface = 3
                Probe interval for interface = 100 ms
                IFNET flags for interface = 0x1E080863
                NDD flags for interface = 0x0021081B
                Interface state = UP
                Number of regular addresses configured on interface = 2
                IPv4 ADDRESS: 9.19.51.211 broadcast 9.19.51.255 netmask 255.255.255.0
                IPv4 ADDRESS: 9.19.51.239 broadcast 9.19.51.255 netmask 255.255.255.0
                Number of cluster multicast addresses configured on interface = 1
                IPv4 MULTICAST ADDRESS: 228.19.51.211
        Interface number 2, sfwcom
                IFNET type = 0 (none)
                NDD type = 304 (NDD_SANCOMM)
                Smoothed RTT across interface = 7
                Mean deviation in network RTT across interface = 3
                Probe interval for interface = 100 ms
                IFNET flags for interface = 0x00000000
                NDD flags for interface = 0x00000009
                Interface state = UP
        Interface number 3, dpcom
                IFNET type = 0 (none)
                NDD type = 305 (NDD_PINGCOMM)
                Smoothed RTT across interface = 750
                Mean deviation in network RTT across interface = 1500
                Probe interval for interface = 22500 ms
                IFNET flags for interface = 0x00000000
                NDD flags for interface = 0x00000009
                Interface state = UP RESTRICTED AIX_CONTROLLED

Node mhoracle2.dfw.ibm.com
Node UUID = 1e1476a8-6ab0-11e2-ace9-46a6ba546402
Number of interfaces discovered = 3
        Interface number 1, en0
                IFNET type = 6 (IFT_ETHER)
                NDD type = 7 (NDD_ISO88023)
                MAC address length = 6
                MAC address = 32:43:2B:33:8A:02
                Smoothed RTT across interface = 7
                Mean deviation in network RTT across interface = 3
                Probe interval for interface = 100 ms
                IFNET flags for interface = 0x1E080863
                NDD flags for interface = 0x0021081B
                Interface state = UP
                Number of regular addresses configured on interface = 1
                IPv4 ADDRESS: 9.19.51.212 broadcast 9.19.51.255 netmask 255.255.255.0
                Number of cluster multicast addresses configured on interface = 1
                IPv4 MULTICAST ADDRESS: 228.19.51.211
        Interface number 2, sfwcom
                IFNET type = 0 (none)
                NDD type = 304 (NDD_SANCOMM)
                Smoothed RTT across interface = 7
                Mean deviation in network RTT across interface = 3
                Probe interval for interface = 100 ms
                IFNET flags for interface = 0x00000000
                NDD flags for interface = 0x00000009
                Interface state = UP
        Interface number 3, dpcom
                IFNET type = 0 (none)
                NDD type = 305 (NDD_PINGCOMM)
                Smoothed RTT across interface = 750
                Mean deviation in network RTT across interface = 1500
                Probe interval for interface = 22500 ms
                IFNET flags for interface = 0x00000000
                NDD flags for interface = 0x00000009
                Interface state = UP RESTRICTED AIX_CONTROLLED

root@mhoracle1 /> lscluster -d
Storage Interface Query

Cluster Name: sapdemo71_cluster
Cluster UUID: 1de50be8-6ab0-11e2-ace9-46a6ba546402
Number of nodes reporting = 2
Number of nodes expected = 2

Node mhoracle1.dfw.ibm.com
Node UUID = 1dfc2d5a-6ab0-11e2-ace9-46a6ba546402
Number of disks discovered = 1
        hdisk9:
                State : UP
                 uDid : 3E213600A0B80001132D0000020024D3850960F1815      FAStT03IBMfcp
                 uUid : d3ce4fd5-3003-ac21-9789-6d9a590242fd
            Site uUid : 51735173-5173-5173-5173-517351735173
                 Type : REPDISK

Node mhoracle2.dfw.ibm.com
Node UUID = 1e1476a8-6ab0-11e2-ace9-46a6ba546402
Number of disks discovered = 1
        hdisk9:
                State : UP
                 uDid : 3E213600A0B80001132D0000020024D3850960F1815      FAStT03IBMfcp
                 uUid : d3ce4fd5-3003-ac21-9789-6d9a590242fd
            Site uUid : 51735173-5173-5173-5173-517351735173
                 Type : REPDISK
root@mhoracle1 /> lscluster -m
Calling node query for all nodes...
Node query number of nodes examined: 2

        Node name: mhoracle1.dfw.ibm.com
        Cluster shorthand id for node: 1
        UUID for node: 1dfc2d5a-6ab0-11e2-ace9-46a6ba546402
        State of node: UP  NODE_LOCAL
        Smoothed rtt to node: 0
        Mean Deviation in network rtt to node: 0
        Number of clusters node is a member in: 1
        CLUSTER NAME       SHID   UUID
        sapdemo71_cluster  0      1de50be8-6ab0-11e2-ace9-46a6ba546402
        SITE NAME          SHID   UUID
        LOCAL              1      51735173-5173-5173-5173-517351735173

        Points of contact for node: 0

----------------------------------------------------------------------------

        Node name: mhoracle2.dfw.ibm.com
        Cluster shorthand id for node: 2
        UUID for node: 1e1476a8-6ab0-11e2-ace9-46a6ba546402
        State of node: UP
        Smoothed rtt to node: 7
        Mean Deviation in network rtt to node: 3
        Number of clusters node is a member in: 1
        CLUSTER NAME       SHID   UUID
        sapdemo71_cluster  0      1de50be8-6ab0-11e2-ace9-46a6ba546402
        SITE NAME          SHID   UUID
        LOCAL              1      51735173-5173-5173-5173-517351735173

        Points of contact for node: 3
        ------------------------------------------
        Interface     State   Protocol   Status
        ------------------------------------------
        dpcom         DOWN    none       RESTRICTED
        en0           UP      IPv4       none
        sfwcom        UP      none       none
SANCOMM is working only if the sfwcom interface is reported as UP in the lscluster -m output:

Interface     State   Protocol   Status
------------------------------------------
dpcom         DOWN    none       RESTRICTED
en0           UP      IPv4       none
sfwcom        UP      none       none

You can also check the sent and received storage packet counts in the lscluster -s cluster statistics:

storage pkts sent: 168493709 storage pkts recv: 82575360

The clras sancomm_status command provides another view of the SAN communication status:

# clras sancomm_status
NAME                UUID                                   STATUS
nodeA.dfw.ibm.com | e9b4d6a4-5e71-11-e2-af42-00145ee726e1 | UP |

Full lscluster -s sample output:
root@mhoracle1 /> lscluster -s
Cluster Network Statistics:

pkts seen: 15627136                   		passed: 3335048
IP pkts: 12873577                     		UDP pkts: 12344880
gossip pkts sent: 2470583             		gossip pkts recv: 4932115
cluster address pkts: 0               		CP pkts: 12292272
bad transmits: 0                      		bad posts: 33
Bad transmit (overflow): 0
Bad transmit (host unreachable): 0
Bad transmit (net unreachable): 0
Bad transmit (network down): 0
Bad transmit (no connection): 0
short pkts: 0                         		multicast pkts: 11664024
cluster wide errors: 0                		bad pkts: 0
dup pkts: 398159                      		pkt fragments: 10964
fragments queued: 0                   		fragments freed: 0
pkts pulled: 0                        		no memory: 0
rxmit requests recv: 619              		requests found: 511
requests missed: 157                  		ooo pkts: 76
requests reset sent: 157              		reset recv: 90
remote tcpsock send: 0                		tcpsock recv: 0
rxmit requests sent: 696
alive pkts sent: 0                    		alive pkts recv: 0
ahafs pkts sent: 14                   		ahafs pkts recv: 4
nodedown pkts sent: 0                 		nodedown pkts recv: 0
socket pkts sent: 24859               		socket pkts recv: 24910
cwide pkts sent: 990856               		cwide pkts recv: 992280
socket pkts no space: 0               		pkts recv notforhere: 0
Pseudo socket pkts sent: 0            	       Pseudo socket pkts recv: 0
Pseudo socket pkts dropped: 0
arp pkts sent: 3                      		arp pkts recv: 1
stale pkts recv: 0                    		other cluster pkts: 2
storage pkts sent: 6022728            	       storage pkts recv: 5825646
disk pkts sent: 7023                  		disk pkts recv: 7508
unicast pkts sent: 435987             		unicast pkts recv: 680571
out-of-range pkts recv: 0
IPv6 pkts sent: 0                     		IPv6 pkts recv: 0
IPv6 frags sent: 0                    		IPv6 frags recv: 0
Unhandled large pkts: 0

Sample configuration files

/etc/cluster/rhosts
9.10.10.1
9.10.10.2
/etc/hosts
127.0.0.1 loopback
# PowerHA SystemMirror Cluster IP Addresses
9.10.10.1 nodea.dfw.ibm.com nodeA # node A base address
9.10.10.2 nodeb.dfw.ibm.com nodeB # node B base address
9.10.10.10 shared_ip.dfw.ibm.com shared_ip # Shared SVC IP address
/etc/netsvc.conf
hosts=local,bind
/etc/resolv.conf
nameserver	9.0.1.1
domain		dfw.ibm.com
/usr/es/sbin/cluster/netmon.cf
9.10.10.6
!REQD owner target
!IBQPORT owner
!IBQPORTONLY owner

Reference: /usr/sbin/rsct/samples/hats/netmon.cf
Documentation APAR: IZ01332

Application controller scripts

/usr/local/hascripts/appA_start.sh (basic SAP example)
#!/bin/ksh
su - oratst -c "lsnrctl start"
su - tstadm -c "startsap"
exit 0
/usr/local/hascripts/appA_stop.sh (basic SAP example)
#!/bin/ksh
su - tstadm -c "stopsap"
su - oratst -c "lsnrctl stop"
exit 0
/usr/local/hascripts/appA_monitor.sh
#!/bin/ksh
... user-provided logic ...
exit 0
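
As an illustration of the kind of logic the monitor stub above might contain, the following is a minimal sketch that checks for a running SAP dispatcher process. The process name is an assumption for this example; a real monitor should be tailored to and tested against your application:

#!/bin/ksh
# hypothetical check: is the SAP dispatcher process running? (assumed process name: dw.sap)
ps -ef | grep "[d]w.sap" > /dev/null 2>&1
if [ $? -ne 0 ]; then
    # a non-zero exit tells the cluster monitoring framework that the application is down
    exit 1
fi
exit 0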

Useful cluster log files

/var/hacmp/log/hacmp.out (detailed event processing)
Aug 14 16:34:49 EVENT START: node_up nodea
:node_up [165] [[ high==high ]]
:node_up [165] version=1.10.11.32
:node_up [167] node_up_vg_fence_init
……
......
/var/hacmp/adm/cluster.log (high-level cluster events)
Aug 14 16:34:49 nodea user:notice PowerHA SystemMirror for AIX: EVENT START: node_up nodea
Aug 14 16:34:51 nodea user:notice PowerHA SystemMirror for AIX: EVENT COMPLETED:
node_up nodea
…...
…..
/var/hacmp/log/clutils.log (generated by cluster utilities)
CLMGR STARTED 	(9153:10254698:5177392) : Thu Aug 14 16:34:49 CET 2013
CLMGR USER 	(9153:10254698:5177392) : ::root:system
CLMGR COMMAND 	(9153:10254698:5177392) : clmgr online node nodea
CLMGR ACTUAL 	(9153:10254698:5177392) : start_node nodea
/var/adm/ras/syslog.caa (CAA logging and troubleshooting)
Aug 14 16:34:28 nodea caa:info syslog: caa_query.c cl_get_capability 2594
There are 2 more capabilities defined at level 131072

Aug 14 16:34:49 nodea caa:info syslog: caa_query.c cl_get_capability 2594
There are 2 more capabilities defined at level 131072
  • Also useful to check:
    • /var/hacmp/clverify/clverify.log (for detailed verification check output)
    • /var/hacmp/clcomd/clcomd.log (for troubleshooting communication issues)
    • /var/hacmp/log/cspoc.log.long (for detailed information from C-SPOC operations)
    • /var/hacmp/log/clstrmgr.debug (generated by clstrmgr daemon)
    • /var/hacmp/log/autoverify.log (generated by nightly verification)
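
When investigating an event, a quick way to review the recent high-level history is to grep the cluster log; a minimal sketch:

    # recent high-level events on the local node
    grep "EVENT" /var/hacmp/adm/cluster.log | tail -20
    # follow detailed event processing live during a failover test
    tail -f /var/hacmp/log/hacmp.out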
