
Disaster recovery
IBM® MQ for HPE NonStop V8.1 can be configured to work in an active/passive disaster recovery (DR) configuration based on audit trail replication.
If your data center suffers a complete outage, work can be resumed by a different HPE NonStop system at a distant location. A queue manager created on the active node of an active/passive DR setup is automatically usable on the recovery node if the active node fails.
IBM MQ for HPE NonStop V8.1 uses audited files for queue data and most configuration data (such as channel or cluster configurations). Information about the creation and deletion of queue managers requires additional tools that replicate OSS trees to the recovery site.
- The contents of the mqs.ini file are copied to an audited Enscribe file. When a queue manager on the recovery site is started and the mqs.ini file is not found, IBM MQ automatically re-creates the file.
- The OSS file system directory subtree that is required to start a queue manager is automatically stored in an audited Enscribe file and replicated. If IBM MQ on the recovery site finds that this subtree is not present, or that user-configurable files in the subtree are older than those in the Enscribe file, IBM MQ automatically creates or refreshes the subtree. (The user-configurable files are the queue manager-specific .ini files and, if present, the SSL certificates in the queue manager-specific ssl subdirectory.)
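The refresh rule described above, together with the restriction on locally modified files that is noted later in this topic, can be pictured as a comparison of modification times. The following Python sketch only illustrates that decision; it is not the product's implementation. The function name, the returned labels, and the representation of the archive as a table of timestamps are assumptions made for the example.

```python
from typing import Dict, Optional


def decide_subtree_action(local: Optional[Dict[str, float]],
                          archived: Dict[str, float]) -> str:
    """Model the start-up decision for the queue manager OSS subtree.

    local    -- modification times of the user-configurable files found on the
                recovery node, or None if the subtree does not exist
    archived -- modification times of the same files as recorded in the
                replicated archive (hypothetical representation)
    """
    if local is None:
        return "create"          # subtree missing: rebuild it from the archive
    newest_archived = max(archived.values(), default=0.0)
    if any(mtime > newest_archived for mtime in local.values()):
        return "keep-local"      # a local file is newer than anything replicated
    if any(local.get(name, float("-inf")) < mtime
           for name, mtime in archived.items()):
        return "refresh"         # at least one local file is older or missing
    return "unchanged"
```

In this model, a single locally edited file that is newer than every replicated file suppresses the refresh for the whole subtree, which is why manual changes on the recovery node are discouraged.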
Configuring IBM MQ for disaster recovery
To configure disaster recovery based on audit trail replication, complete the following steps:
1. Make sure that the OSS coreutils are installed. The replication feature requires them.
2. Install IBM MQ for HPE NonStop V8.1 or later on all DR nodes. Make sure that all members of the mqm group have permission to execute the tar command in coreutils. Note that tar in coreutils has a different feature set than the standard tar, and IBM MQ needs these additional features.
3. Configure your replication solution to replicate all IBM MQ audited data. Disk and subvolume names on the replication targets (that is, on the recovery nodes) must be the same as on the source (active) node.
4. If you edit one of the queue manager-specific .ini files, or update, change, or renew one of the certificate and stash files in the queue manager-specific ssl subdirectory, the replication mechanism automatically transports that data to the recovery node. You can configure the interval between checks for changes by using runnscnf (class = QueueManager, object = CurrentQmgr, property = AuditRefreshInterval, specified in seconds; see Class QueueManager). A refresh occurs only if a change is detected, so the overhead of this function is small.
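The reason that a periodic check is cheap when nothing has changed can be shown with a small change-detection sketch. The following Python code is an illustration only and does not describe how IBM MQ detects changes internally; the use of SHA-256 digests and both function names are assumptions.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List


def fingerprint(paths: Iterable[Path]) -> Dict[str, str]:
    """Return a SHA-256 digest for each existing file, keyed by path."""
    return {str(p): hashlib.sha256(p.read_bytes()).hexdigest()
            for p in paths if p.is_file()}


def changed_files(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """List files that were added, removed, or modified since the last check."""
    names = set(previous) | set(current)
    return sorted(n for n in names if previous.get(n) != current.get(n))
```

A caller would compute a new fingerprint once per refresh interval and write a new archive only when changed_files returns a non-empty list, so an unchanged configuration costs one read of a few small files per interval.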
If you are testing your disaster recovery solution and want to fail over to the recovery node, make sure that the queue manager on the active node is no longer running. You can then start the queue manager on the recovery node immediately, without performing any configuration or creating OSS directories; the product performs the necessary steps automatically. Provided that you have set a refresh interval by using runnscnf (see step 4), your .ini files and SSL certificates are up to date when the queue manager is started.
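For a planned failover test, the check that the queue manager on the active node has ended can be scripted. The following Python sketch parses text in the QMNAME(...) STATUS(...) form that the dspmq control command prints on other IBM MQ platforms. It assumes that dspmq is available on the active node and produces the same format there; the queue manager names in the usage note are hypothetical.

```python
import re
from typing import Optional

# One line of dspmq output looks like: QMNAME(QM1)   STATUS(Running)
_STATUS = re.compile(r"QMNAME\((?P<name>[^)]+)\)\s+STATUS\((?P<status>[^)]+)\)")


def queue_manager_status(dspmq_output: str, qmgr: str) -> Optional[str]:
    """Return the reported status of qmgr, or None if it is not listed."""
    for match in _STATUS.finditer(dspmq_output):
        if match.group("name").strip() == qmgr:
            return match.group("status").strip()
    return None


def safe_to_start_on_recovery(dspmq_output: str, qmgr: str) -> bool:
    """True only if the active node reports an 'Ended ...' status for qmgr."""
    status = queue_manager_status(dspmq_output, qmgr)
    return status is not None and status.startswith("Ended")
```

Given output that lists QM1 as Running and QM2 as Ended normally, safe_to_start_on_recovery returns False for QM1 and True for QM2. A queue manager that is not listed is treated as unknown and therefore as not safe.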
- If you use asynchronous replication technology, there might be a backlog of audit trail data that was not replicated to the recovery node before the failover. In this situation, you might lose some message data.
- Do not manually change any .ini or SSL files on the recovery system. If, when starting a queue manager, IBM MQ on the recovery node detects that any of the .ini or SSL files is more recent than the most recent file from the active system, it does not refresh these files from the replicated archive. The queue manager then starts with the locally changed files, which might not match the configuration of the active node.
- If you manage the SSL files on the recovery node manually (for example, because you are using different certificates on the backup node), you can prevent IBM MQ from overwriting these certificates with those from the primary node. Configure this feature by using runnscnf (class = QueueManager, object = CurrentQmgr, property = RecoverSSLFiles; see Class QueueManager). Setting this value to F preserves the SSL files that are stored on the recovery node.
- Channel synchronization data is not covered by audited files, so you need to reset channels after a failover to the recovery node.
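If many channels need to be reset after a failover, the MQSC commands can be generated rather than typed. The following Python sketch builds RESET CHANNEL commands that could be passed to runmqsc. The channel names are hypothetical, and the assumption that a sequence number of 1 is appropriate should be checked for each channel; channels with in-doubt messages might need additional handling that is not shown here.

```python
from typing import Iterable


def reset_channel_script(channels: Iterable[str], seqnum: int = 1) -> str:
    """Build MQSC text that resets the sequence number of each channel."""
    # Single quotes preserve the case of the channel name in MQSC.
    lines = [f"RESET CHANNEL('{name}') SEQNUM({seqnum})" for name in channels]
    return "\n".join(lines) + ("\n" if lines else "")
```

For example, reset_channel_script(["QM1.TO.QM2"]) returns the single line RESET CHANNEL('QM1.TO.QM2') SEQNUM(1), followed by a newline.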