PowerHA 7.1 SAN Heartbeat
talor
I have found reading blogs really useful on many occasions, both to find out about new features and, more importantly, to learn how to get them working. So here is one about how to get SAN heartbeat working on PowerHA 7.1. The Redbook PowerHA SystemMirror 7.1 for AIX (SG24-7845) also has good information on this topic.
One thing I set up recently was a two-node PowerHA 7.1.1 cluster using SAN heartbeat. PowerHA 7.1 no longer supports a disk heartbeat; in its place, SAN heartbeats can be used.
The first thing that needs to be done is to configure the SAN zoning correctly. Both nodes in the cluster need to be zoned to the shared storage, and in addition the HBAs belonging to both nodes need to be zoned together. The easiest way to do this is to have one zone per fabric for the node HBA to the storage controller, and another zone that contains the FC adapters of both nodes.
Assuming that there are two SAN fabrics, you would have a heartbeat zone on each SAN fabric.
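The zones are built from the WWPN of each adapter. Here is a minimal sketch for collecting them on an AIX node, assuming the usual lscfg -vl output in which the WWPN appears on the "Network Address" line. The function name wwpn_of is my own, and fcs0 and fcs1 are simply the adapter names used in this post.

```shell
# Extract the WWPN from `lscfg -vl fcsX` output read on stdin.
# Assumes the WWPN sits on a line of the form "Network Address......<WWPN>".
wwpn_of() {
    sed -n 's/.*Network Address\.*//p'
}

# On a live AIX node (not run here):
#   for a in fcs0 fcs1; do
#       echo "$a $(lscfg -vl "$a" | wwpn_of)"
#   done
```

Keeping the parsing in a small function that reads stdin makes it easy to try against saved output before running it on the cluster nodes.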
Once that’s completed, turn on “tme” on each of the FC adapters. This is target mode enabled, which allows the HBA to act as both a target and an initiator. Below is how I did this, along with turning on dynamic tracking, setting FC error recovery to fast fail, and setting the queue depth on the HBA.
# for i in fcs0 fcs1; do
> rmdev -l $i -R
> chdev -l $i -a tme=yes
> chdev -l $i -a dyntrk=yes -a fc_e
> chdev -l $i -a num_cmd_elems=2048 -a max_
> cfgmgr -l $i
> done
Next, verify that target mode is enabled by running # lsattr -El fcsX (X being the number of the adapter you are checking).
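If you have several adapters to check, the test can be scripted. This is only a sketch, assuming the standard lsattr -El column layout (attribute, value, description, user-settable). The function name tme_enabled is mine, not part of AIX.

```shell
# Succeed (exit 0) if the `lsattr -El fcsX` output on stdin shows tme=yes.
# Assumes column 1 is the attribute name and column 2 is its value.
tme_enabled() {
    [ "$(awk '$1 == "tme" { print $2 }')" = "yes" ]
}

# On a live AIX node (not run here):
#   for a in fcs0 fcs1; do
#       if lsattr -El "$a" | tme_enabled; then
#           echo "$a: target mode enabled"
#       else
#           echo "$a: target mode NOT enabled"
#       fi
#   done
```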
Once that’s done, configure the cluster, resources, resource groups, etc., and synchronize the cluster.
To verify that the heartbeat is up, run # lscluster -i sfwcom, which shows the status of the SAN fabric communications device. If the heartbeat is not working, the sfwcom device will either not appear at all or show up as stale, which means that there is an issue with the zoning, one of the nodes is not up, or target mode is not enabled on both sides. The term sfwcom stands for Storage Framework Communication, which is how Cluster Aware AIX (CAA) performs the SAN heartbeat.
The output should look like this:
Interface number 3 sfwcom
ifnet type = 0 ndd type = 304
Mac address length = 0
Mac address = 0.0.0.0.0.0
Smoothed rrt across interface = 7
Mean Deviation in network rrt across interface = 3
Probe interval for interface = 100 ms
ifnet flags for interface = 0x0
ndd flags for interface = 0x9
Interface state UP
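For monitoring, the same check can be reduced to a single word. This is a rough sketch that assumes the output layout shown above, with an "Interface number N sfwcom" header followed later by an "Interface state" line. The function name sfwcom_state and the MISSING marker are my own inventions.

```shell
# Print the state of every sfwcom interface found in `lscluster -i` output
# on stdin, or MISSING if no sfwcom interface is listed at all.
sfwcom_state() {
    awk '
        /Interface number/ { in_sfw = ($NF == "sfwcom") }   # entering a new interface block
        in_sfw && /Interface state/ { print $3; found = 1 } # third word is the state
        END { if (!found) print "MISSING" }
    '
}

# On a live cluster node (not run here):
#   lscluster -i | sfwcom_state
```

If several nodes are listed, one state is printed per sfwcom interface, so anything other than a column of UP deserves a closer look.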
You can now see that this is working… sweet.
Things get more challenging when the nodes are LPARs that do not use physical I/O but are served by Virtual I/O Servers.
In my case, my nodes were LPARs on a Power7 machine, and the shared storage was presented using Virtual FC (NPIV).
On a virtual FC adapter, the tme attribute is not there, because the adapter is virtual.
Instead, target mode must be enabled on the physical FC adapters allocated to the VIO servers. In my case I had dual VIO servers, so I turned on target mode and then rebooted each VIO server one at a time.
$ for i in fcs0 fcs1; do
>chdev -dev $i -attr tme=yes -perm
>chdev -dev $i -attr dyntrk=yes -attr fc_e
>chdev -dev $i -attr num_cmd_elems=2048 -attr max_
> done
$ shutdown -restart
Once that’s completed, the final step is to set up the communication channel between the Virtual I/O servers and the LPARs that are going to use the heartbeat.
In my case I was using Virtual Ethernet, so I just added a new VLAN to my virtual network.
This involved first adding VLAN 3358 to the shared Ethernet adapter on my VIO servers, and then adding an additional virtual adapter with PVID 3358 on each of the virtual clients. Do not put an IP address or anything else on the extra virtual adapter on VLAN 3358. PowerHA will use this interface by itself; no further configuration is needed.
Note that the ONLY VLAN that will work for this is 3358.
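Since the VLAN ID matters, it is worth double-checking the PVID on each client LPAR. This is a small sketch, assuming entstat -d on a virtual Ethernet adapter prints a "Port VLAN ID:" line. The function name port_vlan_id is mine, and ent2 is a hypothetical adapter name.

```shell
# Print the Port VLAN ID found in `entstat -d entX` output on stdin.
# Assumes a line of the form "Port VLAN ID:  <number>".
port_vlan_id() {
    awk -F: '/Port VLAN ID/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# On a client LPAR (not run here; ent2 is a hypothetical adapter name):
#   [ "$(entstat -d ent2 | port_vlan_id)" = "3358" ] && echo "ent2 is on VLAN 3358"
```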
The diagram below shows how this can be set up.
Once that’s all done, build the cluster; running lscluster -i will then show you that the SAN heartbeat is up.
Also check # lsdev -C | grep sfwcom and you will see the SAN-based heartbeat communication devices become available. However, lscluster -i is the best way to confirm that it’s working.
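To spot devices that have not come up, the lsdev listing can be filtered for anything that is not Available. This is a sketch only, assuming the standard lsdev -C columns (name, status, location, description). The function name sfwcom_not_available is mine.

```shell
# From `lsdev -C` output on stdin, print the names of sfwcom devices
# whose status (second column) is anything other than Available.
sfwcom_not_available() {
    awk '/sfwcom/ && $2 != "Available" { print $1 }'
}

# On a live cluster node (not run here):
#   lsdev -C | sfwcom_not_available
```

No output means every sfwcom device in the listing is Available. It says nothing about whether heartbeats are actually flowing, which is why lscluster -i remains the real test.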
Hope someone finds this useful.