Recovering IIDR for Oracle after a database failover operation

CDC Replication requires a specific procedure in order to recover from an Oracle database failover operation.

About this task

In a disaster recovery configuration, users often set up their environments with a primary database and a physical standby database connected through Oracle Data Guard services. IIDR is normally configured to replicate data from the primary database.

These databases have two mutually exclusive roles: primary and standby. The roles can be interchanged, which is known as a role transition. A role transition can be planned, or it can be the result of a database failure.

There are two main scenarios:
  • Planned Switchover: this is a planned operation in which the primary database and the standby database change roles. A switchover guarantees no data loss. This planned operation occurs without having to re-instantiate either of the databases.
  • Failover: this happens when the primary database fails, becomes unreachable and cannot be recovered in a timely manner. Failover might or might not result in data loss, depending on the data protection mode in use at the time of the failover. This type of transition requires a re-instantiation of the newly activated database.

A planned switchover consists of a series of steps that users follow to switch over their databases. IIDR for Oracle should be taken into consideration as part of that plan.

The new functionality added to IIDR 11.3.3 for Oracle provides a solution for some of the unplanned failover cases. This new functionality does not support automatic failover. The failover procedure will still be manual. Implementing the manual failover procedure will result in the following configuration:

In the new configuration, IIDR replicates from the new primary database (B). The procedure that follows does not describe how to move IIDR for Oracle from one machine to the other. It assumes that the software is ready to start replication from the new primary database (B).

As described above, a failover operation requires a re-instantiation of the new primary database (B). Opening the new primary database with the RESETLOGS option re-instantiates the database. A RESETLOGS operation does the following:
  • Archives the current online redo logs if they are accessible
  • Erases the contents of the online redo logs
  • Resets the log sequence number to 1
  • Creates the online redo log files if they do not currently exist
  • Updates all current datafiles and online redo logs and all subsequent archived redo logs with a new RESETLOGS SCN and time stamp

The RESETLOGS operation creates a new incarnation of the database. If IIDR for Oracle is started normally after the RESETLOGS operation has been executed, replication fails, because in normal operation IIDR can read logs only from the current incarnation of the database.
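The incarnation boundary can be pictured with a small shell sketch. It is a simplified model, not IIDR's actual check: it assumes that a replication position whose SCN is lower than the reset SCN of the current incarnation lies in the previous incarnation. The function name and both parameters are hypothetical and are used for illustration only.

```shell
# Hypothetical helper, for illustration only (not part of IIDR).
# Arguments:
#   $1  SCN at which replication last stopped (the log position)
#   $2  reset SCN of the current database incarnation
# Succeeds (exit status 0) when the position lies before the current
# incarnation, which is the case in which recovery mode is needed.
needs_failover_recovery() {
  position_scn=$1
  active_reset_scn=$2
  [ "$position_scn" -lt "$active_reset_scn" ]
}

# Example call with made-up SCN values:
if needs_failover_recovery 144265000 144265867; then
  echo "Log position belongs to the previous incarnation"
fi
```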

In order to allow IIDR 11.3.3 to resume replication, IIDR must be run in recovery mode. In recovery mode, IIDR will read logs from the previous incarnation of the database. IIDR 11.3.3 for Oracle introduces a new command line utility to implement this recovery mode.

The dmfailoverrecovery command enables IIDR 11.3.3 for Oracle to read logs from the previous incarnation of the database until all required logs have been processed. If the recovery step finishes successfully, IIDR 11.3.3 for Oracle will resume normal replication of the new database incarnation.

The dmfailoverrecovery command starts replication for all configured subscriptions and mirrors data until all logs from the previous incarnation are processed and the last available SCN on the previous incarnation is reached.

To continue replication after a database failover, perform the following procedure once the manual failover procedure has been executed and IIDR 11.3.3 for Oracle has been configured with the new primary database (B):

Procedure

  1. Run the dmfailoverrecovery command with the -d option. The -d option lets the user validate the parameters that IIDR will use during the recovery. The output of the command shows the current and previous reset SCN.
    <CDC_INSTALL_HOME>/bin>./dmfailoverrecovery -I MYINSTANCE -d
    Failover recovery will run for all configured subscriptions up to SCN <scn number>. Please re-run this command using the option to start
    failover recovery. Active Reset SCN is: <scn number> , Previous Reset SCN is: <scn number>.
  2. If the output of step 1 does not match the expected values, IIDR cannot perform a full recovery. In this case, a refresh operation is required for all tables.
  3. Attempting to resume replication without running the dmfailoverrecovery command will result in failure accompanied by the following error in the event log:
    Archived log corresponding to SCN position {0} does not belong to the current database incarnation (it may belongs to previous DB 
    Incarnation). Current DB Incarnation is {1}. Log position is {2}. If you performed a failover onto a physical standby database, 
    you might need to run the failover recovery procedure running command line dmfailoverrecovery. If not, please contact IBM 
    support for more Information.
  4. After validating the information given by the command in step 1, run the command with the -r option. The -r option starts the recovery process. Note that the command starts replication for all subscriptions. Recovery time depends on the size and number of database logs that IIDR for Oracle needs to process. All subscriptions end replication when the recovery process is complete.
    <CDC_INSTALL_HOME>/bin>./dmfailoverrecovery -I MYINSTANCE -r
    Failover Recovery has been completed successfully from previous reset SCN 144265867
  5. Validate that all subscriptions ended replication gracefully. Subscriptions that do not end gracefully will need to be refreshed.
  6. Once the recovery process ends, the user can resume normal replication.
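
The validation in steps 1 and 2 can be scripted. The following is a minimal sketch, assuming that the -d output has the format shown in step 1 with numeric SCNs in place of the placeholders. Both function names are hypothetical helpers, not part of IIDR. The expected SCNs must come from the DBA; one possible source is Oracle's V$DATABASE_INCARNATION view (columns RESETLOGS_CHANGE# and PRIOR_RESETLOGS_CHANGE#), which is an assumption here because the procedure does not name a source.

```shell
# Hypothetical helper: read dmfailoverrecovery -d output on stdin and
# print "<active reset SCN> <previous reset SCN>". Prints nothing when
# the expected text is not found. The message may wrap across lines,
# so newlines are turned into spaces before matching.
parse_reset_scns() {
  tr '\n' ' ' |
    sed -n 's/.*Active Reset SCN is: *\([0-9][0-9]*\).*Previous Reset SCN is: *\([0-9][0-9]*\).*/\1 \2/p'
}

# Hypothetical helper: succeed only when the dry-run output on stdin
# reports exactly the expected SCNs.
#   $1  expected active reset SCN
#   $2  expected previous reset SCN
scns_match_expected() {
  expected_active=$1
  expected_previous=$2
  actual=$(parse_reset_scns)
  [ "$actual" = "$expected_active $expected_previous" ]
}

# Usage sketch (not executed here; instance name as in step 1, and the
# two EXPECTED_* variables are assumed to be set by the operator):
#   ./dmfailoverrecovery -I MYINSTANCE -d |
#     scns_match_expected "$EXPECTED_ACTIVE" "$EXPECTED_PREVIOUS" &&
#     ./dmfailoverrecovery -I MYINSTANCE -r
```

Chaining the -r run behind the check means that recovery starts only when the reported SCNs match. A mismatch leaves the decision about refreshing the tables to the operator, as step 2 describes.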