Setting up replication for continuous availability

You can use the replication console to configure primary and standby database sites so that if the primary site must be shut down (or has an unplanned outage), applications can be re-routed to the standby site until the primary is restored.

Before you begin

Before you configure replication, create two unique replication user IDs on the primary site and standby site by using the web console as an administrator. Each ID represents one direction of replication in the two-way configuration that is needed for a failover setup:

repluser1: Create this ID on the primary server and use it to enable the source database at the primary server, and the target database at the standby server that will receive replicated transactions from the primary server.
repluser2: Create this ID on the standby server and use it to enable the source database at the standby server, and the target database at the primary server that will receive replicated transactions from the standby server.

The ID that you use when enabling the target is inserted into a metadata table at the source called IBMQREP_IGNTRAN. All changes that are made by this user ID are then ignored by the replication capture process to avoid "circular" replication where the same changes are continually replicated back and forth to the two sites.
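The ignore mechanism described above can be pictured with a small toy model. This is only a conceptual sketch, not the product's implementation: the `Change` record, the `capture` function, and the `REPL_ID` and `APPUSER` user IDs are hypothetical names, and the sketch does not model which of the two replication IDs is registered at which site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    authid: str  # user ID that committed the change
    table: str
    row_key: int


def capture(log, ignored_authids):
    """Return the changes capture would replicate, skipping ignored user IDs."""
    ignored = {authid.upper() for authid in ignored_authids}
    return [change for change in log if change.authid.upper() not in ignored]


# Hypothetical log at one site: an application change plus a change that
# replication itself applied there under the reserved ID "REPL_ID".
site_log = [
    Change("APPUSER", "ORDERS", 1),
    Change("REPL_ID", "ORDERS", 2),
]

# With the reserved ID on the ignore list, only the application change is sent on.
replicated = capture(site_log, ignored_authids=["repl_id"])

# Without the ignore list, the replicated change would be sent back to its origin.
looped = capture(site_log, ignored_authids=[])
```

In this model, `replicated` contains only the `APPUSER` change, while `looped` also contains the change that replication applied, which is the circular case the ignore table is meant to prevent. It also suggests why the reserved IDs should not be used for manual work: any change made under an ignored ID would be skipped in the same way.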

These user IDs should be reserved exclusively for replication. The replication user IDs should not be used to manually perform any database operations, including inserting replication signals.

About this task

The steps in this topic describe a planned failover and failback scenario. For additional considerations for unplanned outages or disaster scenarios, see What to do next.

Procedure

Use the web console to create a Replication Set from the primary server to the standby server.
Be sure to specify the option Load all tables in the replication set when the set is started.
Tip: Use a Replication Set name whose first eight characters are unique.
Create a Replication Set from the standby site to the primary site.
Do not specify the option Load all tables in the replication set when the set is started.
Start replication in both directions and use the web console to verify that the Replication Sets are active in both directions.
Use the web console to verify that all tables within the Replication Sets are active in both directions.
On the monitoring page for the Replication Set from the primary server to the standby server, note the last consistency point.

The last consistency point is reported in UTC, as recorded at the target. Adjust it to local time at the source if needed.
Note the time at the primary server.
Quiesce applications on the primary server.
Note the time on the primary server after applications are quiesced.
Using the web console, wait for the last consistency point (adjusted to source local time if necessary) to be past the application quiesce time.
Start the applications on the standby site.
At this point, the two sites have reversed roles (the former standby site is now the primary site and the former primary site is now the standby site).
When you are ready to switch your workloads back to the original primary site, use the web console to verify that the Replication Sets are active in both directions.
Use the web console to verify that all tables within the Replication Sets are active in both directions.
Note the last consistency point for the Replication Set from the primary (formerly standby) site. Adjust to local time if needed.
Note the time on the primary (formerly standby) site.
Quiesce applications on the new primary site.
Note the time on the primary site after applications are quiesced.
Using the web console, wait for the last consistency point (adjusted to source local time if necessary) to be past the application quiesce time.
Start applications on the standby site, making it once again the primary site.
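The waiting steps in the procedure come down to comparing two timestamps: the last consistency point (in UTC) and the time noted after applications were quiesced (in the source site's local time). The following is a minimal sketch of that comparison, assuming both values are copied by hand from the web console and the server clock; the function names, the UTC-5 offset, and the sample timestamps are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone


def to_local(utc_time, utc_offset_hours):
    """Convert a timezone-aware UTC timestamp to a fixed-offset local time."""
    return utc_time.astimezone(timezone(timedelta(hours=utc_offset_hours)))


def caught_up(last_consistency_point, quiesce_time):
    """True once the last consistency point is past the quiesce time.

    Both arguments must be timezone-aware; Python then compares the
    underlying instants, so mixed time zones are handled correctly.
    """
    return last_consistency_point > quiesce_time


# Hypothetical values: the source site runs at UTC-5.
source_tz = timezone(timedelta(hours=-5))
quiesce_time = datetime(2024, 3, 1, 14, 30, 0, tzinfo=source_tz)

early = datetime(2024, 3, 1, 19, 29, 45, tzinfo=timezone.utc)
later = datetime(2024, 3, 1, 19, 30, 10, tzinfo=timezone.utc)

print(to_local(later, -5))             # consistency point in source local time
print(caught_up(early, quiesce_time))  # False: keep waiting
print(caught_up(later, quiesce_time))  # True: applications can be started
```

Using timezone-aware values avoids adjusting the times by hand. A fixed offset is used here for simplicity; a site that observes daylight saving time would need the offset that applies on the day of the failover.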

What to do next

In the event of an unplanned failure at the primary site, the last consistency point for the Replication Set from the primary site to the standby site will be older than the time that applications last updated the primary site, because replication has not caught up with these new updates.

By contrast, when you switch applications to the standby site in a planned failover, the last consistency point for the Replication Set will be newer than or equal to the time that the applications last updated the primary site.
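The difference between the two cases can be expressed as a time window. The sketch below assumes that the time of the last application update at the primary site is known or can be estimated; the function name and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def unreplicated_window(last_update, last_consistency_point):
    """Return how far replication trailed the last application update.

    A zero result means the consistency point is newer than or equal to
    the last update, as expected after a planned failover.
    """
    gap = last_update - last_consistency_point
    return max(gap, timedelta(0))


utc = timezone.utc
last_update = datetime(2024, 3, 1, 19, 30, 0, tzinfo=utc)

# Unplanned failure: replication had not caught up with the latest updates.
unplanned = unreplicated_window(
    last_update, datetime(2024, 3, 1, 19, 29, 20, tzinfo=utc)
)

# Planned failover: the consistency point is past the last update.
planned = unreplicated_window(
    last_update, datetime(2024, 3, 1, 19, 30, 5, tzinfo=utc)
)
```

Here `unplanned` is a 40-second window of updates that had not reached the standby site, and `planned` is zero.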

After the failover, new data starts flowing from the new primary site (former standby site) to the new standby site (former primary site).

Replication in the reverse direction, from the new standby site to the new primary site, will catch up with the changes that were made on the original primary site before the failover. Some conflicts are possible if the same rows were updated during this catch-up period. Row conflicts are logged in the IBMQREP_EXCEPTIONS table in the target database, and you can use an SQL SELECT statement to view this table:

SELECT * FROM schema.IBMQREP_EXCEPTIONS ORDER BY EXCEPTION_TIME DESC;
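Once the rows have been fetched, it can help to narrow them to the catch-up period. The following is a minimal sketch that assumes each row has already been read into a dictionary keyed by column name; only EXCEPTION_TIME is taken from the query above, while the `NOTE` key, the function name, and the sample values are hypothetical.

```python
from datetime import datetime


def exceptions_between(rows, start, end):
    """Return rows whose EXCEPTION_TIME falls within [start, end], newest first."""
    selected = [row for row in rows if start <= row["EXCEPTION_TIME"] <= end]
    return sorted(selected, key=lambda row: row["EXCEPTION_TIME"], reverse=True)


# Hypothetical rows, as they might look after being fetched from the table.
rows = [
    {"EXCEPTION_TIME": datetime(2024, 3, 1, 19, 10), "NOTE": "before failover"},
    {"EXCEPTION_TIME": datetime(2024, 3, 1, 19, 45), "NOTE": "during catch-up"},
    {"EXCEPTION_TIME": datetime(2024, 3, 1, 19, 40), "NOTE": "during catch-up"},
]

catch_up = exceptions_between(
    rows,
    start=datetime(2024, 3, 1, 19, 30),  # assumed failover time
    end=datetime(2024, 3, 1, 20, 0),     # assumed end of the catch-up period
)
```

The same filter could instead be written as a WHERE clause on EXCEPTION_TIME in the SELECT statement.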