Story
During a consolidation project, a customer asked me: could you migrate three email servers from Linux to AIX and make the solution HA/DR? Definitely yes, challenge accepted.
The former implementation of the email server infrastructure was chrooted qmail on x86 Linux, but the solution lacked some essential enterprise attributes like high availability, manageability etc. It had been built up somehow and it worked somehow. To migrate the status quo with no pain, the key decision was to keep qmail. The preferred target OS was AIX 7.1, and WPAR technology was the natural choice to virtualize and isolate the email environments, like chroot did. A simple proof of concept with qmail running in a WPAR showed the viability of this approach, so let's go and make it highly available in a cross-site environment. Here it comes.
I assume that you know what a WPAR is and that you have played around with it. Also that you have installed and configured PowerHA SystemMirror for AIX Enterprise Edition (man!) with DS8000 at least once. For the rest of this blog post I will refer to this as PowerHA/XD, which is something between PowerHA very_long_appendix and the beautiful obsolete HACMP/XD.
Let's have a look at WPAR first. Just to recap, there are Application WPARs (not interesting here), System WPARs (shared and non-shared) and rootvg WPARs. A rootvg WPAR is a non-shared system WPAR with storage dedicated to the WPAR, but shared between the PowerHA/XD nodes; either SAN or NFS based. Indeed, you need a rootvg WPAR in PowerHA/XD. If you like yourself, don't go the NFS way in PowerHA/XD. I can elaborate on how not to do things, but let's focus here on how to make it work.
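Just to illustrate the rootvg WPAR flavour, this is roughly how one is created on its own dedicated disk (a minimal sketch; hdisk4 is only an example disk, not part of this setup):
mailnodea:/root# mkwpar -n ${WPAR} -D devname=hdisk4 rootvg=yes    # hdisk4 is an example disk
The configuration below instead maps the WPAR filesystems explicitly onto LVs in a dedicated VG.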
There are two ways to implement PowerHA/XD with WPAR.
- Running a resource group (RG) in an AIX WPAR
Rootvg WPAR in PowerHA/XD is currently not working. The thing is that in PowerHA/XD you define PPRC resources in an RG. The PPRC resource definitions are volume identifiers of hdisks, so the hdisks are part of the RG definition. When the rootvg WPAR activates, the disks that the WPAR has its filesystems on are removed from the global AIX. This makes PowerHA/XD management virtually impossible. It is a deal breaker.
- Traditional application server
The other approach is to run startwpar and stopwpar scripts and treat the WPAR like an application. If you are thinking about using suspend and resume, I haven't tried it; the customer didn't have WPAR Manager. It is definitely worth a try, but in the first place check with the network guys whether a cross-site network infrastructure will be able to handle it (e.g. MAC and IP addresses reappearing on a different site). You would also have to develop more complicated start/stop scripts.
WPAR configuration
Create a WPAR on two disks. Make a VG and LVs for the WPAR first:
mailnodea:/root# WPAR=email01
mailnodea:/root# mkvg -n -V 51 -s 128 -y ${WPAR}vg hdisk2 hdisk3
mailnodea:/root#
mklv -t jfs2 -y lv${WPAR}slash -ex ${WPAR}vg 2
mklv -t jfs2 -y lv${WPAR}home -ex ${WPAR}vg 2
mklv -t jfs2 -y lv${WPAR}tmp -ex ${WPAR}vg 8
mklv -t jfs2 -y lv${WPAR}opt -ex ${WPAR}vg 8
mklv -t jfs2 -y lv${WPAR}usr -ex ${WPAR}vg 20
mklv -t jfs2 -y lv${WPAR}var -ex ${WPAR}vg 2
mklv -t jfs2 -y lv${WPAR}dummy -ex ${WPAR}vg 2
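Before creating the WPAR, a quick sanity check of the layout can't hurt (just a verification step, output omitted):
mailnodea:/root# lsvg -l ${WPAR}vg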
And create the WPAR right away:
mailnodea:/root#
mkwpar -l -F -r -o /tmp/${WPAR}.out -R active=yes \
-M directory=/ vfs=jfs2 dev=/dev/lv${WPAR}slash logname=INLINE \
-M directory=/home vfs=jfs2 dev=/dev/lv${WPAR}home logname=INLINE \
-M directory=/tmp vfs=jfs2 dev=/dev/lv${WPAR}tmp logname=INLINE \
-M directory=/opt vfs=jfs2 dev=/dev/lv${WPAR}opt logname=INLINE \
-M directory=/usr vfs=jfs2 dev=/dev/lv${WPAR}usr logname=INLINE \
-M directory=/var vfs=jfs2 dev=/dev/lv${WPAR}var logname=INLINE \
-N interface=en8 address=$MGMTADDRESS netmask=255.255.254.0 \
-N interface=en10 address=$PRODUCTIONADDRESS netmask=255.255.255.0 \
-h ${WPAR} -n ${WPAR}
mailnodea:/root# crfs -v jfs2 -d /dev/lv${WPAR}dummy -a logname=INLINE -m /wpars/${WPAR}dummyfs
The dummy filesystem is very important. It is a filesystem that is not in the WPAR definition, but it does exist in the same VG. This filesystem will be configured in the PowerHA/XD RG to be mounted, so the VG with the WPAR will be varied on.
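A quick sanity check that the dummy filesystem mounts cleanly in the global AIX (PowerHA/XD will be doing exactly this for you later):
mailnodea:/root# mount /wpars/${WPAR}dummyfs
mailnodea:/root# df -g /wpars/${WPAR}dummyfs
mailnodea:/root# umount /wpars/${WPAR}dummyfs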
Start the WPAR, install your application, perform all your customizations. Stop the WPAR, start the WPAR, check that everything works, check the WPAR's network connectivity. If the WPAR does what it should, stop it now.
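Something along these lines (a sketch; clogin drops you into the WPAR shell for the qmail installation and customization):
mailnodea:/root# startwpar -v ${WPAR}
mailnodea:/root# clogin ${WPAR}
mailnodea:/root# stopwpar -v ${WPAR}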
PowerHA/XD configuration
Create a dual-node cross-site PowerHA/XD configuration with Metro Mirror and with an RG having hdisk2 and hdisk3 as PPRC sources. Among many other things:
- Establish a PPRC of hdisk2 and hdisk3 to the other site and perform importvg there with the same major number (see the sketch after this list).
- Define an RG with three attributes: an application name, a volume group (${WPAR}vg) and a filesystem (/wpars/${WPAR}dummyfs) to mount.
- Put exit 0 into application start and stop scripts for now. An application monitor will be defined later too.
- Test the cluster. Move the RG between sites and check that the VG is varied on.
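For the importvg step mentioned above, a minimal sketch on the second node, assuming the PPRC pair is already established and the target disks are readable there:
mailnodeb:/root# importvg -V 51 -y ${WPAR}vg hdisk2    # remote hdisk name may differ on the other site
mailnodeb:/root# varyoffvg ${WPAR}vg
The major number 51 matches the -V 51 used with mkvg above.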
If this is too abstract for you, leave a comment; I can write a PowerHA/XD blog post one day.
Blending it all together
Import the WPAR on the other node
The tricky part is to import the WPAR on the other node. Importing a shared WPAR works as expected, but with a detached rootvg WPAR on AIX 7.1 I've been getting all kinds of errors, mostly about already existing filesystems, so I ended up copying the WPAR definition over manually:
mailnodea:/root# scp -r /etc/corrals mailnodeb:/etc/
mailnodeb:/root# ln -sf /etc/corrals /etc/wpars
mailnodeb:/root# mkssys -s cor_${WPAR} -u root -p /usr/lib/wpars/runwpar -a ${WPAR} -o /var/adm/ras/wpar.${WPAR}.srclog -e /var/adm/ras/wpar.${WPAR}.srclog -R -S -n 30 -f 31
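Afterwards both the WPAR definition and its SRC subsystem should be visible on the second node (a quick check; the WPAR stays in the Defined state until its VG is varied on there):
mailnodeb:/root# lswpar ${WPAR}
mailnodeb:/root# lssrc -s cor_${WPAR}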
Now move your RG there and back and try to start the WPAR manually. Does your application work on both sites? Test, test, test.
Application start, stop and monitoring scripts
My solution has to be generic for any number of qmail WPARs, so I developed one script that does its job depending on the name it is called by. You can either copy the script to the desired name or make a link. The naming scheme for the script is WPAR_ACTION.sh.
The network guys also required that the production IP be different on each site. The full version of the script can be downloaded here. The most important part is:
#!/bin/ksh -x
MYNAME=$(basename $0)
SERVICE_INTERFACE=en10
SERVICE_NETMASK=255.255.255.0
# derive the WPAR name and the action from the script name, e.g. email01_mon.sh -> WPAR=email01, ACTION=mon
eval $(echo $(basename $0) | awk -F"[_\.]" '{print "WPAR="$1";ACTION="$2}')
lswpar ${WPAR} >/dev/null 2>&1
if [ $? -ne 0 ]; then
    echo WPAR ${WPAR} is not defined here. Exiting.
    exit 1
fi
case ${ACTION:-NONE} in
    start)
        startwpar -v ${WPAR}
        ;;
    stop)
        stopwpar -F -N ${WPAR}
        ;;
    mon)
        # the WPAR must be Active and at least one qmail process must be running inside it
        [ "$(lswpar -c -q ${WPAR} | cut -d : -f2)" = "A" ] || exit 1
        [ $(ps -ef -@ ${WPAR} | grep -c qmail) -eq 0 ] && exit 1
        ;;
    *)
        echo "Undefined operation ${ACTION}"
        exit 1
        ;;
esac
exit 0
The script is a simple startwpar and stopwpar wrapper. To make all this work, my PowerHA/XD application setup is:
mailnodea:/root# cllsserv
as_email01 /usr/es/sbin/cluster/scripts/email01_start.sh /usr/es/sbin/cluster/scripts/email01_stop.sh
as_email02 /usr/es/sbin/cluster/scripts/email02_start.sh /usr/es/sbin/cluster/scripts/email02_stop.sh
mailnodea:/root# cllsappmon
email01_mon user
email02_mon user
mailnodea:/root# cllsappmon -c email01_mon
email01_mon:user:/usr/es/sbin/cluster/scripts/email01_mon.sh:60:longrunning:9:60:fallover:3:369:/usr/es/sbin/cluster/scripts/email01_start.sh::/usr/es/sbin/cluster/scripts/email01_stop.sh::::as_email01::
mailnodea:/usr/es/sbin/cluster/scripts# ln -s phaxd_wpars.sh email01_start.sh
mailnodea:/usr/es/sbin/cluster/scripts# ln -s phaxd_wpars.sh email01_stop.sh
mailnodea:/usr/es/sbin/cluster/scripts# ln -s phaxd_wpars.sh email01_mon.sh
mailnodea:/usr/es/sbin/cluster/scripts# ls -l email01*
lrwxrwxrwx 1 root system 14 May 18 15:01 email01_start.sh -> phaxd_wpars.sh
lrwxrwxrwx 1 root system 14 May 18 15:01 email01_stop.sh -> phaxd_wpars.sh
lrwxrwxrwx 1 root system 14 May 18 15:01 email01_mon.sh -> phaxd_wpars.sh
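Before handing the links over to the cluster, a manual smoke test doesn't hurt; the monitor should exit 0 only while the WPAR is active and qmail is running in it:
mailnodea:/usr/es/sbin/cluster/scripts# ./email01_mon.sh; echo $?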
Notes
Running a rootvg WPAR in a PowerHA/XD cluster means that all nodes in the cluster and all WPARs in the cluster must be on the same AIX TL/SP. Keep in mind that the WPAR has to run in a global AIX with the same TL/SP when planning a rolling update.
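A quick way to compare the levels and to resynchronize a non-shared WPAR after the global AIX has been updated (a sketch; clogin runs the command inside the WPAR):
mailnodea:/root# oslevel -s
mailnodea:/root# clogin ${WPAR} oslevel -s
mailnodea:/root# syncwpar ${WPAR}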
Tags: 
wpar
powerha