Active-passive disaster recovery

This section describes how to use Cloud Pak for Data System version 1.0.7.8 and higher with an active-passive full Disaster Recovery (DR) setup.

In this configuration, the production site handles ongoing production use, while the DR site is on standby but online and available for use. The DR system cannot take backups to the storage area network (SAN) at the DR site, because the storage-level replication that you provide overwrites the local SAN at the DR site with the contents of the SAN at the production site.

To use the active-passive Disaster Recovery (DR) functionality, the Cloud Pak for Data System deployments at both the production and DR sites must include one or more connector nodes. Each site needs at least two connector nodes to retain access to the SAN-based backups if one of the connector nodes on the system experiences an unplanned outage. For more on connector nodes, see Connector node.

At both the production and DR sites, you must provide and administer a Fibre Channel (FC) storage area network (SAN) that is connected to dedicated RAIDed persistent storage, and storage-level replication software that can copy the contents of the storage from one site to the other.

The storage-level replication solution, which is typically a third-party product, operates between the production and DR sites. You configure, administer, and maintain it on an ongoing basis. It usually, though not necessarily, comes from the vendor that supplies the RAIDed storage systems that are used at both sites.

Note: The DR system is only a passive standby in this configuration.
The storage-level replication that you provide and administer keeps the DR site's SAN storage volumes up to date with the production site in one of the following ways:
  • Synchronously
  • Asynchronously
  • Regularly
  • On-demand
When the production site experiences a disaster, you must activate the DR site. This involves:
  • Stopping the DR system.
  • Importing the SAN file system created at the production site that you had already been replicating to the DR site SAN.
  • Starting the DR system.
For more details, see the Failover to DR site section.
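The activation sequence above can be sketched as a small ordered runner. This is only an illustration under stated assumptions: the step names, the `run_failover` and `dry_run` helpers, and the placeholder import command are hypothetical, and the real import procedure depends on your SAN vendor and on the steps in the Failover to DR site section. Only `nzstop` and `nzstart` come from this section. Where the commands run, and as which user, is left to you.

```python
from typing import Callable, List, Sequence, Tuple

# Ordered activation steps. The middle command is a PLACEHOLDER (assumption):
# the actual import of the replicated SAN file system is vendor-specific.
FAILOVER_STEPS: List[Tuple[str, Sequence[str]]] = [
    ("stop", ["nzstop"]),
    ("import_san_filesystem",
     ["echo", "PLACEHOLDER: import the replicated SAN file system"]),
    ("start", ["nzstart"]),
]


def run_failover(run: Callable[[Sequence[str]], int]) -> List[str]:
    """Run each step in order and stop at the first failure.

    `run` receives a command as a list of arguments and returns its exit code.
    Returns the names of the steps that completed.
    """
    completed: List[str] = []
    for name, cmd in FAILOVER_STEPS:
        exit_code = run(cmd)
        if exit_code != 0:
            raise RuntimeError(f"step '{name}' failed with exit code {exit_code}")
        completed.append(name)
    return completed


def dry_run(cmd: Sequence[str]) -> int:
    """Print the command instead of executing it, and report success."""
    print("would run:", " ".join(cmd))
    return 0
```

Calling `run_failover(dry_run)` prints the three commands in order without executing anything. To execute them for real, you could pass a runner such as `lambda cmd: subprocess.run(cmd).returncode` after replacing the placeholder step with your actual import procedure.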
The DR system typically becomes active within an hour, depending on the response time, the duration of the storage administration tasks, and the size of the NPS® deployment.
Note: The nzstop and nzstart times can vary depending on the size of the deployment and the amount of data in the database.
Next, run nzrestore on the DR system to restore from the backups. The nzrestore time depends on the size of the database, so include it when you estimate how long it takes to return to production status at the DR site after a disaster.
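As a rough planning aid, the total recovery time can be approximated as the activation time plus the restore time. The sketch below assumes that the restore proceeds at a roughly constant rate. The function name and the example figures are illustrative assumptions, not measured values, so substitute numbers from your own restore tests.

```python
def estimate_recovery_minutes(
    activation_minutes: float,
    backup_size_gb: float,
    restore_rate_gb_per_min: float,
) -> float:
    """Estimate total recovery time: DR activation plus nzrestore duration.

    Assumes a constant restore rate, which is a simplification.
    """
    if restore_rate_gb_per_min <= 0:
        raise ValueError("restore_rate_gb_per_min must be positive")
    restore_minutes = backup_size_gb / restore_rate_gb_per_min
    return activation_minutes + restore_minutes


# Illustrative figures only: 60 minutes to activate the DR system,
# 6000 GB to restore, at an assumed 50 GB per minute.
example = estimate_recovery_minutes(60, 6000, 50)
print(example)  # 180.0
```

With these example inputs, the restore (120 minutes) takes longer than the activation itself, which is why the restore time matters for the overall estimate.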