Replicate an AIX rootvg for DR purposes
talor
One question I have been asked lately is how to replicate an AIX system's rootvg to another AIX system using storage system replication. Can I just boot from the rootvg LUN on another system? Well... unfortunately it's not as simple as remote mirroring the rootvg and booting off it at the DR site. That will work, but it's not a good idea, and I'm not sure it's actually supported.
The non-rootvg disks are simple: replicate them via your storage system, and in a DR map the target volumes (or a snapshot of them) to the DR server, run cfgmgr, vary on the volume group, mount the filesystems, and that should be it.
In the case of the rootvg, it's a bad idea to simply mirror it and boot from it on another system. All your devices (everything in /dev) will be messed up, and the system will come up with the same IP address as the production system, which could cause issues in a DR test.
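As a rough sketch of those non-rootvg steps on the DR server, something like the following could work. The volume group and mount point names are made up for illustration, and the `run` wrapper only prints the commands unless DRY_RUN=0 is set. It also assumes the volume group is already known to the DR server; if it isn't, it would need an importvg first.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 on the DR server to really run them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# bring_up_vg <volume group> <mount point>...
bring_up_vg() {
    vg=$1
    shift
    run cfgmgr              # discover the newly mapped LUNs
    run varyonvg "$vg"      # activate the replicated volume group
    for fs in "$@"; do
        run mount "$fs"     # mount each filesystem by its mount point
    done
}

# Hypothetical names, for illustration only.
bring_up_vg datavg /data /logs
```

Keeping it in dry-run mode first is a cheap way to check the order of operations before a real DR test.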
So there are two options from here (excluding PowerHA):
1) Have a standby LPAR running at the DR site, and use something like rsync or scp to copy configuration files to the DR system. In a DR or DR test, mount the DR copy of the disks as described above and you are up and running. There would be a few manual steps during the DR, such as varying on the volume groups and mounting the filesystems.
2) Clone the rootvg using alt_disk_copy on the production system to another LUN, remote mirror the clone LUN, then boot from the clone LUN in a DR or DR test.
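For option 1, the configuration copy could be sketched roughly like this. The host name and the file list are assumptions of mine, not a recommended set, and the commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical DR host and file list; adjust to your environment.
DR_HOST=${DR_HOST:-drhost}
CONFIG_FILES=${CONFIG_FILES:-"/etc/resolv.conf /etc/ntp.conf"}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Copy each configuration file to the same path on the DR LPAR.
sync_configs() {
    for f in $CONFIG_FILES; do
        run rsync -a "$f" "${DR_HOST}:${f}"
    done
}

sync_configs
```

This would typically be run from cron on the production system so the standby LPAR never drifts too far.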
I really like the alt_disk_copy method, because you have the option of using the -x flag where you can specify a script to run on the first boot of the system at the DR site.
This means that you could have the clone rootvg disk and your non-rootvg disks mapped to an LPAR at the DR site which is shut down, and in a DR test or real DR you can just activate it and it's ready to go. You could have non-production LPARs running on the DR machine, and in a real DR just move resources away from the non-production LPARs and assign them to production. For a DR test, you would have minimal resources assigned to the DR LPAR or LPARs. Having two LPAR partition profiles could come in handy here.
So.. how do we do the copy?
There are three steps here:
- Create our script we want to run when the DR LPAR first boots.
First off, let's say we want our script to make some changes on boot, such as an IP address change, so let's put that in our script, which is /etc
Ensure that it is executable:
# chmod 700 /etc
mktcpip -h <hostname> -a <IP to use in DR> -m <network mask> -i en0 -g <gateway IP>
<insert other operating system modification commands here as required>
Other modifications you might include are setting queue depths on hdisk devices, FC adapter attributes, and any other operating system tunables you want to ensure are set (these may be lost as part of the alt disk copy, since we are doing a device reset, see the -O option), and possibly a reboot at the end.
Next, let's say we want to do an alt disk copy to hdisk1 (a spare LUN we have in our machine, which is remote mirrored to the other site). We would boot off this disk in a DR. You need to ensure that hdisk1 (the altinst_rootvg LUN) is never added to a non-rootvg volume group.
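To make that a bit more concrete, here is a minimal sketch of generating such a first-boot script. The output location, host name, IP details, device names (hdisk2, fcs0) and tunable values are all placeholders I made up; only the mktcpip line and the dr_firstboot.sh name follow the article.

```shell
#!/bin/sh
# All values below are placeholders (assumptions), not recommendations.
SCRIPT=${SCRIPT:-./dr_firstboot.sh}
DR_HOSTNAME=${DR_HOSTNAME:-drnode1}
DR_IP=${DR_IP:-192.0.2.10}
DR_MASK=${DR_MASK:-255.255.255.0}
DR_GATEWAY=${DR_GATEWAY:-192.0.2.1}

# Write the first-boot script and make it executable.
generate_firstboot() {
    cat > "$SCRIPT" <<EOF
#!/bin/ksh
# Generated first-boot script for the DR clone.
mktcpip -h $DR_HOSTNAME -a $DR_IP -m $DR_MASK -i en0 -g $DR_GATEWAY
# Re-apply device tunables lost by the -O device reset (example values).
# -P only updates the ODM, so the change takes effect after a reboot.
chdev -l hdisk2 -a queue_depth=32 -P
chdev -l fcs0 -a num_cmd_elems=1024 -P
# Optionally reboot here so the -P changes take effect:
# shutdown -Fr
EOF
    chmod 700 "$SCRIPT"
}

generate_firstboot
```

Generating the script rather than hand-editing it means the DR values live in one place and can be refreshed every time the clone is taken.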
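One way to guard against that is a small check before each clone that the target disk does not belong to any volume group. This is only a sketch: the function names are mine, and it relies on lspv printing the volume group name (or None) in the third column. The LSPV variable is just a hook so the check can be tried with canned output.

```shell
#!/bin/sh
# disk_vg <hdisk>: print the volume group lspv reports for the disk.
# lspv columns are: disk name, PVID, volume group, state.
disk_vg() {
    ${LSPV:-lspv} | awk -v d="$1" '$1 == d { print $3 }'
}

# check_target <hdisk>: succeed only if the disk is in no volume group.
check_target() {
    vg=$(disk_vg "$1")
    case "$vg" in
        None)
            return 0 ;;
        "")
            echo "$1: disk not found in lspv output" >&2
            return 1 ;;
        *)
            echo "$1 belongs to $vg, refusing to use it as a clone target" >&2
            return 1 ;;
    esac
}

# Example use before cloning:
#   check_target hdisk1 || exit 1
```

A leftover altinst_rootvg on the disk is treated as an error here too, since it would need cleaning up before the next copy anyway.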
The command we would use here is:
# alt_disk_copy -d hdisk1 -P all -B -O -x /etc
The options here are:
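-d = The target disk for the clone (hdisk1 here).
-P = Run through all three phases of the alt disk copy.
-B = Do not run bootlist after the copy, so the production system keeps booting from its current rootvg.
-O = Perform a device reset on the clone, so user-defined device configuration is not retained.
-x = The script to run on the first boot of the clone.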
Once that is completed the next step is to remove the altinst_rootvg from the AIX ODM:
# alt_rootvg_op -X altinst_rootvg
This could be easily scripted, and run nightly to do the alt disk copy, generate the dr_firstboot.sh script and then remove the clone from the ODM.
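A minimal sketch of such a nightly job might look like this. The script path passed in is a placeholder, the first-boot script is assumed to exist already, and the commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# clone_rootvg <target disk> <first boot script>
clone_rootvg() {
    disk=$1
    script=$2
    # Clone the running rootvg to the replicated LUN.
    run alt_disk_copy -d "$disk" -P all -B -O -x "$script" || return 1
    # Remove the altinst_rootvg definition from the ODM.
    run alt_rootvg_op -X altinst_rootvg
}

# Hypothetical script path, for illustration only.
clone_rootvg hdisk1 /path/to/dr_firstboot.sh
```

Saved as a script and scheduled from root's crontab, this gives you a fresh clone on the replicated LUN every night.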
If the clone and script generation were automated, and the target volumes were mapped to the target LPAR, a DR test could be as simple as activating the DR LPAR and then shutting it down again on test completion. Since the IP address and any other required settings can be changed on first boot, the test can be done in isolation.
This is also a neat trick if you are booting from SAN and don't have a NIM server to restore a mksysb of the production system to a DR system, or in a migration to new hardware.