Question & Answer
How do I back up the IBM Intelligent Operations Center (IOC) 5.x.x.x High Availability (HA) environment?
Kindly note that this topic is outside the scope of the IBM IOC Support team; this document was created only to share knowledge about possible ways to do a backup.
This also means that the backup approaches outlined below are not supported by IBM: any questions or problems with implementing these recommendations are outside the scope of IBM support.
A VMware snapshot of the shared disk is not possible because of a VMware limitation, explained in the VMware documentation:
"VMWare does not support snapshots of virtual machines configured for bus sharing. If you require bus sharing, consider running backup software in your guest operating system as an alternative solution. If your virtual machine currently has snapshots that prevent you from configuring bus sharing, delete (consolidate) the snapshots."
The IOC HA solution does not use DB2 HADR, which maintains two copies of the data and therefore requires double the SAN disk space. Running HADR is also expensive in disk space because, for IOC, it generates DB2 log files that consume disk space and can cause disk-full issues. In the current IOC HA solution, the SAN disk is shared between the two DB nodes. Kindly review the IBM IOC HA Knowledge Center documentation carefully.
IOC HA is based on shared-disk technology, so no second disk is needed (which could be very expensive), but the rest of the functionality is similar to HADR. There are two DB2 nodes, each running RHEL 6.x with its own filesystems isolated from the other. IBM TSA handles failover between the two DB nodes (check its status with the lssam command). Each DB node has an alternate DB server defined, which you can check with the db2 list database directory command. On each APP/ANA node, the Liberty profile server.xml defines "clientRerouteAlternateServerName" so that, after retrying the failed primary DB node, clients start connecting to the second DB node.
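As an illustration of the alternate-server check mentioned above, the following POSIX shell sketch reads the output of db2 list database directory on a DB node and prints the alternate server recorded for each database. It is a minimal sketch, not an IOC or DB2 tool: it assumes the DB2 CLP's usual "Label = value" layout with the labels "Database alias", "Alternate server hostname" and "Alternate server port number", and the function name alt_servers is hypothetical.

```shell
#!/bin/sh
# Sketch: list the alternate (client reroute) server recorded for each
# database in "db2 list database directory" output read on stdin.
# Prints "<alias> <host>:<port>"; databases with no alternate server
# defined produce no line. alt_servers is a hypothetical helper name.
alt_servers() {
  awk -F'=' '
    # Strip leading and trailing blanks from a field.
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    { key = trim($1); val = trim($2) }
    # A new directory entry starts: remember its alias, forget old host.
    key == "Database alias"            { alias = val; host = "" }
    key == "Alternate server hostname" { host = val }
    key == "Alternate server port number" && host != "" {
      print alias " " host ":" val
    }'
}
```

For example, after sourcing the file as root on a DB node: su - db2i1own -c 'db2 list database directory' | alt_servers. A database that prints nothing has no alternate server defined in that directory.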
1) In this IOC HA release, TSA (IBM Tivoli System Automation) manages the DB2 resources through a domain in Active/Standby mode. You can use TSA commands such as "lssam" to determine which node is currently active.
2) It is always recommended to power up the first DB node (IOCDATA1) first, which then becomes the TSA active/online database node, and then power up the second DB node (IOCDATA2). Run the TSA lssam command to check the status. TSA handles failover and mounts /SANdisk automatically when the active/online DB node is not available.
To test failover (in a scheduled maintenance window), perform a controlled failover: as 'root', move the resource group “db2_db2i1own_0-rg” from the primary node to the standby node with the following command:
rgreq -o move -n <online_server> db2_db2i1own_0-rg
Alternatively (in a scheduled maintenance window), you can shut down the primary DB2 node and TSA will automatically fail over to the second DB node.
Note: after troubleshooting or tests, it is recommended to bring the primary DB node back up and move the TSA resource back to it.
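The controlled failover and failback described above can be wrapped in a small script so the same command is used in both directions. This is a sketch, not an IBM-provided tool: it assumes it runs as root on a DB node, that the resource group has the IOC default name db2_db2i1own_0-rg, and that you pass the TSA node name that lssam currently shows as online. The names DRY_RUN and move_rg are hypothetical; with the default DRY_RUN=1 it only prints the commands.

```shell
#!/bin/sh
# Sketch: controlled failover / failback of the IOC DB2 TSA resource group.
# Assumptions: run as root on a DB node in a maintenance window; RG is the
# IOC default group name. DRY_RUN=1 (default) only prints the commands;
# set DRY_RUN=0 to actually run them.
RG="${RG:-db2_db2i1own_0-rg}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Move the resource group OFF the node given in $1 (the node lssam shows
# as Online). Running it again with the other node moves the group back.
move_rg() {
  if [ -z "${1:-}" ]; then
    echo "usage: move_rg <online_node>" >&2
    return 2
  fi
  run rgreq -o move -n "$1" "$RG" || return 1
  # Show the current state; the move takes a while, so re-run lssam until
  # the group is Online on the other node.
  run lssam
}

# Only act when a node name is passed on the command line.
if [ "$#" -ge 1 ]; then
  move_rg "$1"
fi
```

For example, DRY_RUN=0 ./tsa_move.sh iocdata1 (hypothetical file and node names; use the names lssam reports) moves the group off IOCDATA1; after testing, running it again with the standby node's name moves the group back to the primary node, as the note above recommends.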
IOC recommends following the middleware backup and recovery procedures. DB2 is one critical component of IOC and needs to be backed up regularly according to the customer's backup and recovery policies. The IBM standard DB2 backup and recovery guidelines can be reviewed at this Knowledge Center link.
We also recommend regularly running a full backup of the /SANdisk1 filesystem, in addition to the DB2 backups above, with DB2 offline on the active DB node.
To stop DB2 in the IOC HA environment, first lock the TSA resource group, which makes sure the DB2 resource does not switch to the standby DB2 node. To lock the DB2 TSA resource group:
rgreq -o lock db2_db2i1own_0-rg
Note: you can get the TSA resource group name with the lssam command (by default, IOC creates the "db2_db2i1own_0-rg" resource group).
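To stop and start DB2 on the active node, switch to the instance owner:
su - db2i1own
1) Run all DB2 backup commands as db2i1own, as described in the Knowledge Center. The DB2 BACKUP DATABASE command needs a running instance, and an offline backup only requires that no applications are connected to the database, so take these backups before stopping the instance. Then stop DB2 (use "db2stop force" if connected applications prevent a normal stop):
db2stop
2) Also, as root, back up the /SANdisk1 filesystem with tar while DB2 is stopped, writing the archive to a location outside the shared disk, for example:
tar -cvf /<backup_dir>/SANdisk1_backup_IOC5xxx_before.tar /SANdisk1
3) After a successful backup, start DB2 as db2i1own with the following command; this automatically unlocks the TSA resource:
db2start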
Note: make sure /SANdisk is mounted only on the active DB2 node; never mount it on the offline node or on both nodes at the same time.
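The offline backup steps can be collected into one script. This is a minimal, unsupported sketch under stated assumptions: it runs as root on the active DB node; DBS (the IOC database names), BACKUP_DIR (a directory outside the shared disk that db2i1own can write to) and SAN_MOUNT are hypothetical settings you must fill in; and the IOC application servers are assumed to be stopped, because db2 force application all is asynchronous and an offline backup fails while applications are still connected. It takes the DB2 backups before db2stop because BACKUP DATABASE needs a running instance, then stops DB2 for the filesystem archive. With the default DRY_RUN=1 it only prints the commands. If a step fails, the function stops and leaves the resource group locked for you to resolve manually.

```shell
#!/bin/sh
# Sketch only: offline backup on the IOC HA active DB node (run as root).
# DBS, BACKUP_DIR and SAN_MOUNT are hypothetical settings to adjust.
# DRY_RUN=1 (the default) prints the commands instead of executing them.
RG="${RG:-db2_db2i1own_0-rg}"            # IOC default TSA resource group
INSTANCE="${INSTANCE:-db2i1own}"         # IOC DB2 instance owner
SAN_MOUNT="${SAN_MOUNT:-/SANdisk1}"      # shared-disk mount point (assumed)
BACKUP_DIR="${BACKUP_DIR:-/backup/ioc}"  # hypothetical, must be off the shared disk
DBS="${DBS:-}"                           # hypothetical list of IOC database names
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Run one DB2 command string as the instance owner.
as_db2() {
  run su - "$INSTANCE" -c "$1"
}

offline_backup() {
  stamp="$(date +%Y%m%d%H%M%S)"
  # 1. Lock the group so TSA does not move DB2 to the standby node.
  run rgreq -o lock "$RG" || return 1
  # 2. Offline DB2 backups: the instance must be running, but no
  #    applications may be connected to the database being backed up.
  as_db2 "db2 force application all" || return 1
  for db in $DBS; do
    as_db2 "db2 backup database $db to $BACKUP_DIR" || return 1
  done
  # 3. Stop DB2 so the shared-disk files are consistent, then archive them.
  as_db2 "db2stop force" || return 1
  run tar -cvf "$BACKUP_DIR/SANdisk1_backup_$stamp.tar" "$SAN_MOUNT" || return 1
  # 4. Restart DB2; per the procedure above, this also releases the TSA lock.
  as_db2 "db2start" || return 1
  # 5. Check that the group is online again and no longer locked.
  run lssam
}

# Only run when explicitly asked, e.g. "./ioc_offline_backup.sh run".
if [ "${1:-}" = "run" ]; then
  offline_backup
fi
```

For example: DRY_RUN=0 DBS="<db1> <db2>" BACKUP_DIR=<backup_dir> ./ioc_offline_backup.sh run (the file name is hypothetical). Review the dry-run output first; if lssam still shows the group locked afterwards, unlock it with rgreq -o unlock db2_db2i1own_0-rg.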
Below are some questions for which clarification was provided:
a) Could the shared disk be safely unmounted so that a snapshot can be taken without affecting the DB2 HA functionality?
The shared disk is already unmounted on the standby node; if you need to unmount it on the primary node, follow the steps described above. A snapshot is not possible in a shared-disk clustered environment because of the VMware limitation.
b) What is the default type of DB2 backup implemented in IOC, and what is the difference between online and offline backups?
By default, IBM Intelligent Operations Center databases use circular rather than archive logging, so database backups must be taken while the database is offline. You will find brief information at this Knowledge Center link.
For online backups, consult the DB2 documentation referenced in the link above to make sure they are done in the best way and do not impact IOC. Any resulting issues have to be raised with DB2 Support.
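To see which case applies to a given IOC database, you can check its log archive settings: in DB2, circular logging corresponds to LOGARCHMETH1 (and LOGARCHMETH2) being OFF, and only archive logging allows online backups. The sketch below reads db2 get db cfg output on stdin and prints circular or archive. It assumes the usual "(LOGARCHMETH1) = value" layout of that output, and log_mode is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: classify a database's logging mode from "db2 get db cfg" output
# read on stdin. Prints "circular" (offline backups only) or "archive"
# (online backups possible); prints "unknown" and returns non-zero if no
# LOGARCHMETH1/LOGARCHMETH2 line is found.
log_mode() {
  awk '
    /\(LOGARCHMETH[12]\)/ {
      seen = 1
      if ($NF != "OFF") archive = 1   # the value is the last field after "="
    }
    END {
      if (!seen) { print "unknown"; exit 1 }
      print (archive ? "archive" : "circular")
    }'
}
```

For example: su - db2i1own -c 'db2 get db cfg for <dbname>' | log_mode, where <dbname> is one of the IOC databases. On a default IOC installation this is expected to print circular.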
Intelligent Operations Center; IOC; Database;DB2
20 February 2019