Performing an HADR failover operation
When you want the current standby database to become the new primary database because the current primary database is not available, you can perform a forced takeover, or failover.
Before you begin
About this task
The TAKEOVER HADR command with the BY FORCE option can be issued only on the standby database. In Db2® pureScale® environments, you can issue the command from any member in the standby cluster, including non-replay members. When you issue the command, the following sequence of events occurs:
- A disabling message is sent to the primary, if it is connected.
- After it receives the disabling message, the primary database is shut down and log writing is stopped.
- Log shipping and log retrieval are stopped on the standby, which entails a risk of data loss.
- All received logs (that is, the logs that are stored in the log path) are replayed on the standby.
- Any open transactions are rolled back on the standby.
- The standby's role changes to primary and the new primary database is opened for client connections.
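The standby-side steps above can be illustrated with a toy Python model. It is a sketch under stated assumptions, not Db2's implementation: the class `StandbyModel`, its fields, and the `(txn_id, op)` log-record format are hypothetical, and the disabling message to the primary is not modeled. It shows why only logs already received can be replayed, and why transactions without a received commit record are rolled back.

```python
from dataclasses import dataclass, field


@dataclass
class StandbyModel:
    """Toy model of the standby side of a forced takeover (not Db2 internals)."""

    # Hypothetical format: log records already stored in the standby's log
    # path, in log order, as (txn_id, op); op "COMMIT" ends a transaction.
    received_log: list
    role: str = "STANDBY"
    receiving_logs: bool = True
    committed: set = field(default_factory=set)

    def takeover_by_force(self) -> list:
        """Apply the standby-side steps; return ids of rolled-back transactions."""
        # Stop log shipping and retrieval: records not yet received are never applied.
        self.receiving_logs = False
        # Replay all received log records in order.
        open_txns = set()
        for txn_id, op in self.received_log:
            if op == "COMMIT":
                open_txns.discard(txn_id)
                self.committed.add(txn_id)
            else:
                open_txns.add(txn_id)
        # Roll back transactions that have no commit record in the received log.
        rolled_back = sorted(open_txns)
        # Change role to primary; the database can now accept client connections.
        self.role = "PRIMARY"
        return rolled_back


log = [(1, "UPDATE"), (1, "COMMIT"), (2, "INSERT")]
standby = StandbyModel(received_log=log)
print(standby.takeover_by_force())  # [2]
print(standby.role, sorted(standby.committed))  # PRIMARY [1]
```

In this model, transaction 2 was in flight when log shipping stopped, so it is rolled back even if the old primary later committed it; that is the data-loss risk the list describes.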
This procedure might cause a loss of data. Review the following information before performing this emergency procedure:
- Ensure that the primary database is no longer processing database transactions. If the primary database is still running but cannot communicate with the standby database, executing a forced takeover operation (issuing the TAKEOVER HADR command with the BY FORCE option) could result in two primary databases. When there are two primary databases, each database will have different data, and the two databases can no longer be automatically synchronized.
- Deactivate the primary database or stop its instance, if possible. (This might not be possible if the primary system has hung, crashed, or is otherwise inaccessible.) After a failover operation is performed, if the failed database is later restarted, it will not automatically assume the role of primary database.
- The likelihood and extent of transaction loss depend on your specific configuration and circumstances:
  - If the primary database fails while in peer state or disconnected peer state and the synchronization mode is synchronous (SYNC), the standby database does not lose transactions that were reported committed to an application before the primary database failed.
  - If the primary database fails while in peer state or disconnected peer state and the synchronization mode is near synchronous (NEARSYNC), the standby database can lose transactions committed by the primary database only if both the primary and the standby databases fail at the same time.
  - If the primary database fails while in peer state or disconnected peer state and the synchronization mode is asynchronous (ASYNC), the standby database can lose transactions committed by the primary database if the standby database did not receive all of the log records for the transactions before the takeover operation was performed. The standby database can also lose transactions committed by the primary database if the standby database crashes before it has written all the received logs to disk. Note: A peer window is not allowed in ASYNC mode, so the primary database can never enter disconnected peer state in that mode.
  - If the primary database fails while in remote catchup state and the synchronization mode is super asynchronous (SUPERASYNC), the standby database can lose transactions committed by the primary database if the standby database did not receive all of the log records for the transactions before the takeover operation was performed. The standby database can also lose transactions committed by the primary database if the standby database crashes before it has written all the received logs to disk. Note: Databases can never be in peer or disconnected peer state in SUPERASYNC mode.
  - If the primary database fails while in remote catchup pending state, transactions that have not been received and processed by the standby database are lost. Note: Any log gap shown in the database snapshot represents the gap at the last time the primary and standby databases communicated with each other; the primary database might have processed a very large number of transactions since then.
- Ensure that any application that connects to the new primary (or that is rerouted to the new primary by client reroute) is prepared to handle the following:
  - Data can be lost during failover: the new primary might not have all of the transactions committed on the old primary. This can happen even when the hadr_syncmode configuration parameter is set to SYNC. Because an HADR standby applies logs sequentially, if a transaction in an SQL session is committed on the new primary, you can assume that all previous transactions in the same session have also been committed on the new primary. The commit sequence of transactions across multiple sessions can be determined only with detailed analysis of the log stream.
  - A transaction can be issued to the original primary, committed on the original primary, and replicated to the new primary (the original standby), but never be reported as committed, because the original primary crashed before it could report the commit to the client. Any application you write should be able to handle the case where a transaction that was issued to the original primary, but not reported as committed there, is in fact committed on the new primary.
  - Some operations are not replicated, such as changes to the database configuration and to external UDF objects.
  - HADR does not interface with the Db2 fault monitor (db2fm), which can be used to automatically restart a failed database. If the fault monitor is enabled, be aware that it might act on a primary database that you presume has failed, for example by restarting it.
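One common way for an application to handle a transaction that committed on the old primary but was never reported as committed is to make the retry idempotent. The Python sketch below shows this generic idempotency-key technique; it is not a Db2 feature. Python's built-in `sqlite3` stands in for the new primary, and the tables `account` and `applied_txn`, the function `apply_once`, and the key value are hypothetical. The key is written in the same transaction as the business change, so if the original attempt committed and was replicated, the retry finds the key and does nothing.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for the new primary database (illustration only)."""
    conn = sqlite3.connect(":memory:")
    with conn:
        conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
        # Hypothetical table recording which logical transactions were applied.
        conn.execute("CREATE TABLE applied_txn (txn_key TEXT PRIMARY KEY)")
        conn.execute("INSERT INTO account (id, balance) VALUES (1, 100)")
    return conn


def apply_once(conn: sqlite3.Connection, txn_key: str, account_id: int, amount: int) -> bool:
    """Apply a balance change at most once per txn_key; return False on a duplicate."""
    with conn:  # commits on success, rolls back if an exception escapes
        already = conn.execute(
            "SELECT 1 FROM applied_txn WHERE txn_key = ?", (txn_key,)
        ).fetchone()
        if already:
            # The earlier attempt committed (perhaps on the old primary and was
            # replicated) even though the client never saw the commit.
            return False
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, account_id)
        )
        # Recorded in the same transaction, so change and key commit together.
        conn.execute("INSERT INTO applied_txn (txn_key) VALUES (?)", (txn_key,))
    return True


conn = make_demo_db()
key = "transfer-0001"  # hypothetical; generate once per logical transaction
print(apply_once(conn, key, 1, 50))  # True
print(apply_once(conn, key, 1, 50))  # False: the retry after failover is a no-op
print(conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0])  # 150
```

The client must generate the key before the first attempt and reuse it on every retry. The primary key on `applied_txn` also makes a concurrent duplicate insert fail rather than apply the change twice.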
Procedure
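To fail over the primary role to the standby:
- Deactivate the failed primary database or stop its instance, if possible.
- On the standby database, issue the TAKEOVER HADR command with the BY FORCE option.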
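The forced takeover is a single CLP command issued on the standby, and the main risk is issuing it while the old primary is still processing transactions. The following Python sketch composes the command text only after an explicit confirmation. The function `forced_takeover_command` and its `old_primary_stopped` flag are hypothetical, and `SAMPLE` is only an example database alias. The sketch does not run anything against Db2.

```python
def forced_takeover_command(db_alias: str, old_primary_stopped: bool) -> str:
    """Build the CLP command to run on the standby for a forced takeover.

    old_primary_stopped is a hypothetical flag that the operator sets only after
    confirming that the old primary is deactivated, its instance is stopped, or
    it is otherwise no longer processing transactions.
    """
    if not old_primary_stopped:
        # A forced takeover while the old primary still runs can leave two
        # primary databases whose data can no longer be synchronized.
        raise RuntimeError("confirm that the old primary is no longer processing transactions")
    alias = db_alias.strip()
    if not alias or any(ch.isspace() for ch in alias):
        raise ValueError(f"invalid database alias: {db_alias!r}")
    return f"TAKEOVER HADR ON DATABASE {alias} BY FORCE"


print(forced_takeover_command("SAMPLE", old_primary_stopped=True))
# TAKEOVER HADR ON DATABASE SAMPLE BY FORCE
```

Assuming a Db2 instance environment on the standby host, the resulting text could then be passed to the `db2` command-line processor. The confirmation gate is the point of the sketch, not the string formatting.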
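Results
After the failover, the original standby database is the new primary database. For the failed original primary database, your options include: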
- starting the failed primary as a standby (that is, reintegrating it)
- starting the failed primary as a primary using the BY FORCE option
- stopping HADR on the failed primary
- dropping the failed primary database
- restoring the database
What to do next
If you want to reintegrate the old primary database as the new standby, the old primary's log streams must not have diverged from the new primary's. For more information on this procedure, see the Related links.
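What "diverged" means can be illustrated with a deliberately simplified Python model in which each log stream is a list of record identifiers. This is an assumption for illustration only: Db2 performs its own compatibility check when you try to reintegrate the old primary, and the function `can_reintegrate` below does not reproduce it. In the model, the streams have not diverged when everything the old primary wrote is a prefix of the new primary's log.

```python
def can_reintegrate(old_primary_log: list, new_primary_log: list) -> bool:
    """Toy divergence check: True if the old primary's log is a prefix of the new primary's."""
    n = len(old_primary_log)
    return n <= len(new_primary_log) and new_primary_log[:n] == old_primary_log


# The old primary wrote "c" after the standby's last received record, so the streams diverged.
print(can_reintegrate(["a", "b", "c"], ["a", "b", "x", "y"]))  # False
# Everything the old primary wrote reached the standby before the takeover.
print(can_reintegrate(["a", "b"], ["a", "b", "x"]))  # True
```

The first case corresponds to a failover in which the standby had not received all of the old primary's log records, which is exactly the situation that can also cause transaction loss.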