Question & Answer
In this article, we explain and answer outstanding queries about the database backup process in an IBM Intelligent Operations Center (IOC) High Availability (HA) system. The queries are as follows:
Q1. How could one test the failover of IBM Tivoli System Automation (TSA) in an IOC HA environment?
Q2. How should one start and stop the IBM IOC databases in an IOC 5.x HA environment?
Q3. How does one back up the databases in an IBM IOC 5.x HA environment?
Note that this topic is beyond the scope of the IBM IOC Support team. We have created this document only to share knowledge about possible ways to do a backup.
A VMware snapshot of the shared disk is not possible because of a VMware limitation, explained in the VMware documentation as follows:
"VMware does not support snapshots of virtual machines configured for bus sharing. If you require bus sharing, consider running backup software in your guest operating system as an alternative solution. If your virtual machine currently has snapshots that prevent you from configuring bus sharing, delete (consolidate) the snapshots."
The IOC HA solution is not based on DB2 HADR, which maintains two copies of the data and therefore requires double the SAN disk space. Running HADR is also expensive in disk space because, for IOC, it generates DB2 logs that consume disk space and may fill the disk. In the current IOC HA solution, the SAN disk is shared between the two nodes. Review the IBM IOC HA documentation in IBM Knowledge Center carefully.
IOC HA is based on shared-disk technology, so it does not need a second disk (which could be very expensive), but otherwise it functions similarly to HADR. There are two DB2 nodes, each running RHEL 6.x with its own isolated file system. IBM TSA handles failover between the two DB nodes (check with the lssam command). Each DB node has an alternate DB server defined, which you can check with the db2 list database directory command. On each APP/ANA node, the Liberty profile server.xml defines "clientRerouteAlternateServerName" so that, after retrying the failed primary DB node, clients start connecting to the second DB node.
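The two reroute settings described above can be inspected with a small sketch like the following. It assumes the English DB2 CLP labels "Database alias" and "Alternate server hostname" in the output of db2 list database directory. The function names alt_servers and reroute_setting are hypothetical, and because the location of server.xml on the APP/ANA nodes is not given here, it is passed as an argument.

```shell
#!/bin/sh
# Hypothetical helpers for reviewing client reroute settings in IOC HA.

# Read `db2 list database directory` output on stdin and print
# "<alias> -> <alternate server hostname>" for each entry that has the line.
# Assumes the English CLP labels; an empty value is shown as "(none)".
alt_servers() {
  awk -F'=' '
    /Database alias/            { a = $2; gsub(/ /, "", a); alias = a }
    /Alternate server hostname/ { h = $2; gsub(/ /, "", h)
                                  if (h == "") h = "(none)"
                                  print alias " -> " h }
  '
}

# Print the clientRerouteAlternateServerName attribute(s) found in a
# Liberty server.xml. The path is an argument because it varies by install.
reroute_setting() {
  grep -o 'clientRerouteAlternateServerName="[^"]*"' "$1"
}
```

For example, as db2i1own on a DB node: db2 list database directory | alt_servers; on an APP/ANA node: reroute_setting <path to server.xml>. Both should name the other DB node.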
In this IOC HA release we use TSA (IBM Tivoli System Automation) to manage DB2 resources by creating a domain in Active/Standby mode. You can use TSA commands like "lssam" to determine which node is currently active.
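Because lssam output is plain text, the active node can be picked out with a filter. This is a sketch under assumptions: member lines are taken to look like "|- Online IBM.Application:db2_db2i1own_0-rs:iocdata1", and the resource name db2_db2i1own_0-rs, the host names and the function name active_node are hypothetical defaults to be checked against your own lssam output.

```shell
#!/bin/sh
# Hypothetical helper: print the node on which a DB2 resource is Online,
# filtering `lssam` output read from stdin.
# Assumption: member lines look like
#   |- Online IBM.Application:db2_db2i1own_0-rs:iocdata1
# $1 = resource name (default db2_db2i1own_0-rs; confirm with lssam).
active_node() {
  rs="IBM.Application:${1:-db2_db2i1own_0-rs}:"
  awk -v rs="$rs" '
    / Online / {
      # Scan every field so trailing attributes on the line do not matter.
      for (i = 1; i <= NF; i++)
        if (index($i, rs) == 1) print substr($i, length(rs) + 1)
    }
  '
}
```

With these assumptions, lssam | active_node prints the host name of the node that currently holds the DB2 resource.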
Note: Always power up the first DB node (IOCDATA1) first; it becomes the TSA active/online database node. Then power up the second DB node (IOCDATA2) and run the TSA lssam command to check the status. TSA handles failover and mounts /SANdisk automatically when the active/online DB node is not available.
Q1. How could one test the failover of TSA in an IBM IOC HA environment?
a. To test failover (in a scheduled maintenance window), run the following command on either database node.
As 'root', perform a controlled failover by moving the resource group "db2_db2i1own_0-rg" from the primary node to the standby node, where <online_server> is the node on which the resource group is currently online:
rgreq -o move -n <online_server> db2_db2i1own_0-rg
b. As a more aggressive alternative (also in a scheduled maintenance window), you could shut down or reboot the primary (active) DB2 node, and TSA will automatically fail over to the second DB node. This is not a recommended practice: the shutdown or reboot happens at the operating system level rather than at the database application level, which could corrupt the file systems. Use it only when there is no alternative.
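The controlled failover in option a can be wrapped so that the cluster state is shown before the move and the commands can be previewed first. This is a hypothetical sketch: the function names run and controlled_failover and the DRY_RUN variable are illustrations, and the group name defaults to the IOC default mentioned in this article.

```shell
#!/bin/sh
# Hypothetical wrapper around the controlled failover in option a.
# Run as root in a maintenance window. DRY_RUN=1 prints the commands
# instead of executing them.
RG="${RG:-db2_db2i1own_0-rg}"   # IOC default group name; confirm with lssam

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# $1 = node on which the resource group is currently Online.
controlled_failover() {
  [ -n "$1" ] || { echo "usage: controlled_failover <online_server>" >&2; return 2; }
  run lssam                          # state before the move
  run rgreq -o move -n "$1" "$RG"    # ask TSA to move the group off node $1
}
```

For example, DRY_RUN=1 controlled_failover iocdata1 (a hypothetical host name) only prints the commands; without DRY_RUN they run. The move takes some time to complete, so run lssam again until the group shows Online on the other node.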
Note: After troubleshooting or tests, bring the primary DB node back up and move the TSA resource group back to it. Also make sure that /SANdisk1 is then mounted only on the primary DB node.
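To confirm where /SANdisk1 is mounted, a check such as the following can be run on each DB node; exactly one node should report it as mounted. The function names are hypothetical, and the mount table file is a parameter only so that the check can be tried against a sample file.

```shell
#!/bin/sh
# Hypothetical check: is a mount point present in the mount table?
# $1 = mount point, $2 = mount table file (default /proc/mounts).
is_mounted() {
  awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Print a one-line report for this node (uname -n gives the host name).
sandisk_report() {
  mp="${1:-/SANdisk1}"
  if is_mounted "$mp" "$2"; then
    echo "$(uname -n): $mp is mounted"
  else
    echo "$(uname -n): $mp is NOT mounted"
  fi
}
```

Running sandisk_report on both DB nodes after a failover or failback makes a double mount easy to spot.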
Q2. How should one start and stop the IBM IOC databases in an HA environment?
Before manually stopping the DB2 process on the online node without performing a failover, lock the TSA resource group so that the DB2 resource does not switch to the standby DB2 node. To lock the DB2 TSA resource group, run as the 'root' user:
rgreq -o lock db2_db2i1own_0-rg
To stop and start the DB2 process on the active node during a database maintenance window, run the following as the 'db2i1own' user:
su - db2i1own
legacydb2stop force (stop DB2 process on active node)
legacydb2start (start DB2 process on active node)
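The Q2 sequence can be collected into two functions run as root, with the DB2 commands executed as the instance owner through su. This is a hypothetical sketch: the function names, the DRY_RUN switch and the INST/RG variables are illustrations, and legacydb2stop/legacydb2start are used exactly as given above.

```shell
#!/bin/sh
# Hypothetical wrapper for the Q2 sequence. Run as root; DB2 commands run
# as the instance owner via su. DRY_RUN=1 prints instead of executing.
RG="${RG:-db2_db2i1own_0-rg}"
INST="${INST:-db2i1own}"

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

db2_maintenance_stop() {
  run rgreq -o lock "$RG" || return 1     # keep TSA from failing over
  run su - "$INST" -c "legacydb2stop force"
}

db2_maintenance_start() {
  run su - "$INST" -c "legacydb2start"
  run lssam   # confirm the group is Online and no longer locked
  # If lssam still shows the group locked, `rgreq -o unlock "$RG"` releases it.
}
```

Previewing with DRY_RUN=1 db2_maintenance_stop before the real run shows exactly which commands will be issued.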
Q3. How does one back up the IBM IOC 5.x HA environment?
a. Recommended backup and recovery procedure for the IBM IOC databases:
IBM IOC recommends following the middleware backup and recovery procedures. DB2 is a critical component of IOC and needs to be backed up regularly according to the customer's backup and recovery policies. Follow the standard IBM DB2 backup and recovery guidelines described in this IBM DB2 Knowledge Center article.
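As one illustration of the standard DB2 procedure, the sketch below prints (rather than runs) the DB2 CLP commands for an offline backup of each listed database, so they can be reviewed first. The database names and target directory are placeholders, the function name is hypothetical, and this is a simplification of the DB2 guidance rather than an IOC-specific procedure.

```shell
#!/bin/sh
# Hypothetical generator for offline DB2 backup commands.
# $1 = backup directory (must exist and be writable by the instance owner)
# remaining args = database names (placeholders; list yours with
#                  `db2 list database directory`)
offline_backup_cmds() {
  if [ $# -lt 2 ]; then
    echo "usage: offline_backup_cmds <dir> <db>..." >&2
    return 2
  fi
  dir="$1"; shift
  # FORCE APPLICATION is asynchronous; check that `db2 list applications`
  # shows no connections before the backups start.
  echo "db2 force application all"
  for db in "$@"; do
    echo "db2 deactivate database $db"
    echo "db2 backup database $db to $dir"
  done
}
```

For example, offline_backup_cmds /backup/db2 IOCDB1 IOCDB2 > backup.sh writes a script to review and then run as db2i1own. The matching restore command has the form db2 restore database <db> from <dir> taken at <timestamp>.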
b. File-system backup procedure for IOC database node:
In addition to the DB2 backups above, it is also recommended to regularly run a full backup of the /SANdisk1 file system on the active DB node while DB2 is offline, as explained in the DB2 Knowledge Center article above.
i. Stop the DB2 process on the active node, using the lock and stop commands outlined in the Q2 section.
ii. After the DB2 process has stopped, back up the /SANdisk1 file system as the 'root' user with the following tar command.
/SANdisk1 must be mounted on the active database node, and the disk partition where you intend to keep the backup tar file must have enough free space. Here is an example command:
tar -cvf SANdisk1_backup_IOC5xxx_before_upgrade_dd-mm-201y.tar /SANdisk1/*
iii. After a successful backup, restart the DB2 process using the start command outlined in the Q2 section; this automatically unlocks the TSA resource group.
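Step ii above can be partly automated. The following hypothetical helper checks that /SANdisk1 is mounted (using mountpoint from util-linux), compares the free space in the target directory with the space /SANdisk1 uses, and then writes the tar file. It assumes DB2 has already been stopped and the TSA group locked as in step i, and that it runs as root; the function names and the file-name pattern are illustrations.

```shell
#!/bin/sh
# Hypothetical helper for step ii. Assumes DB2 is stopped, the TSA group
# is locked, and the script runs as root on the active DB node.
SRC="${SRC:-/SANdisk1}"

# File name modeled on the example above, with today's date filled in.
backup_name() {
  echo "SANdisk1_backup_${1:-IOC5}_$(date +%d-%m-%Y).tar"
}

# Rough guard: true if directory $2 has more free KB than $1 uses,
# since an uncompressed tar is about as large as the data it holds.
enough_space() {
  need_kb=$(du -sk "$1" | awk '{ print $1 }')
  free_kb=$(df -Pk "$2" | awk 'NR == 2 { print $4 }')
  [ "$free_kb" -gt "$need_kb" ]
}

# $1 = destination directory (keep it outside /SANdisk1), $2 = label.
backup_sandisk() {
  dest_dir="$1"; label="$2"
  mountpoint -q "$SRC" || { echo "$SRC is not mounted on this node" >&2; return 1; }
  enough_space "$SRC" "$dest_dir" || { echo "not enough free space in $dest_dir" >&2; return 1; }
  # Archiving the directory itself (not $SRC/*) also picks up hidden files.
  tar -cvf "$dest_dir/$(backup_name "$label")" -C / "${SRC#/}"
}
```

For example: backup_sandisk /backup IOC5xxx_before_upgrade. Member names are stored relative to /, so the archive can later be extracted with tar -xpf from /. Restart DB2 afterwards as in step iii.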
Note: Get the TSA resource group name with the lssam command (by default, IOC creates the "db2_db2i1own_0-rg" resource group). Always ensure that /SANdisk is mounted only on the active DB2 node; never mount it on the offline node or on both nodes at the same time.
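The resource group names needed for the commands above can be listed from lssam output. This is a sketch under assumptions: group lines are taken to contain a field "IBM.ResourceGroup:<name>" preceded by the state, and a locked group is assumed to show "Request=Lock" on that line. The function name resource_groups is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list TSA resource groups from `lssam` output on stdin
# as "<name> <state>[ locked]". Assumed line format:
#   Online IBM.ResourceGroup:db2_db2i1own_0-rg Nominal=Online
resource_groups() {
  awk '{
    for (i = 2; i <= NF; i++)
      if ($i ~ /^IBM\.ResourceGroup:/) {
        name = substr($i, length("IBM.ResourceGroup:") + 1)
        lock = ($0 ~ /Request=Lock/) ? " locked" : ""
        print name " " $(i - 1) lock
      }
  }'
}
```

Running lssam | resource_groups before and after the maintenance steps shows both the group name and whether the lock has been released.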
The following additional queries are also clarified here:
a) Could the file system be paused or unmounted safely so that a snapshot can be taken without affecting DB2 HA functionality?
The shared disk is already unmounted on the standby node. If you need to unmount it on the primary node, follow the steps explained above. However, a snapshot is not possible in a shared-disk clustered environment because of the VMware limitation described earlier.
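b) What is the default type of DB2 backup implemented in IOC, and what is the difference between online and offline backups?
By default, IBM Intelligent Operations Center databases use circular logging rather than archive logging, so database backups must be taken while the database is offline. An offline backup requires that no applications are connected to the database, whereas an online backup can run while the database is in use but requires archive logging. You can find a brief overview in this IBM DB2 Knowledge Center link.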
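To see which logging mode a database uses, the LOGARCHMETH1 parameter in its configuration can be checked: OFF means circular logging. The sketch below is hypothetical (including the function name), looks only at LOGARCHMETH1, and assumes the English CLP label "(LOGARCHMETH1)".

```shell
#!/bin/sh
# Hypothetical helper: classify the logging mode from
# `db2 get db cfg for <db>` output on stdin, using LOGARCHMETH1 only.
logging_mode() {
  awk -F'=' '
    /\(LOGARCHMETH1\)/ { m = $2; gsub(/ /, "", m) }
    END {
      if (m == "")         print "unknown"
      else if (m == "OFF") print "circular"
      else                 print "archive (" m ")"
    }
  '
}
```

For example: db2 get db cfg for <db> | logging_mode. If it reports archive logging, an online backup takes the form db2 backup database <db> online to <dir> include logs, subject to the DB2 documentation referenced here.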
For online backups, consult the DB2 documentation referenced above to ensure that they are done correctly and do not impact IOC. Any issues that arise as a result should be raised with DB2 Support.
Intelligent Operations Center; IOC; Database; DB2
04 October 2019