When you want the current standby database to become the
new primary database because the current primary database is not available,
you can perform a failover.
About this task
Warning: This procedure
might cause a loss of data. Review the following information before
performing this emergency procedure:
- Ensure that the primary database is no longer processing database
transactions. If the primary database is still running, but cannot
communicate with the standby database, executing a forced takeover
operation (issuing the TAKEOVER HADR command with
the BY FORCE option) could result in two primary
databases. When there are two primary databases, each database will
have different data, and the two databases can no longer be automatically
synchronized.
- Deactivate the primary database or stop its instance, if possible.
(This might not be possible if the primary system has hung, crashed,
or is otherwise inaccessible.) After a takeover operation is performed,
if the failed database is later restarted, it will not automatically
assume the role of primary database.
- The likelihood and extent of transaction loss depend on your
specific configuration and circumstances:
- If the primary database fails while in peer state or disconnected
peer state and the synchronization mode is synchronous (SYNC), the
standby database will not lose transactions that were reported committed
to an application before the primary database failed.
- If the primary database fails while in peer state or disconnected
peer state and the synchronization mode is near synchronous (NEARSYNC),
the standby database can only lose transactions committed by the primary
database if both the primary and the standby databases fail at the
same time.
- If the primary database fails while in peer state
or disconnected peer state and the synchronization mode is asynchronous
(ASYNC), the standby database can lose transactions committed by the
primary database if the standby database did not receive all of the
log records for the transactions before the takeover operation was
performed. The standby database can also lose transactions committed
by the primary database if the standby database crashes before it
was able to write all the received logs to disk.
Note: Peer window
is not allowed in ASYNC mode; therefore, the primary database will
never enter disconnected peer state in that mode.
- If the primary database fails while in remote
catchup state and the synchronization mode is super asynchronous (SUPERASYNC),
the standby database can lose transactions committed by the primary
database if the standby database did not receive all of the log records
for the transactions before the takeover operation was performed.
The standby database can also lose transactions committed by the primary
database if the standby database crashes before it was able to write
all the received logs to disk.
Note: Databases can never be in peer
or disconnected peer state in SUPERASYNC mode. Failover (forced takeover)
is allowed in remote catchup state only if the synchronization mode
is SUPERASYNC.
- If the primary database fails while in remote catchup pending
state, transactions that have not been received and processed by the
standby database will be lost.
Note: Any log gap shown in the database
snapshot will represent the gap at the last time the primary and standby
databases were communicating with each other; the primary database
might have processed a very large number of transactions since that
time.
- Ensure that any application that connects to the new primary (or
that is rerouted to the new primary by client reroute), is prepared
to handle the following:
- Data might be lost during failover: the new primary might not have
all of the transactions committed on the old primary. This can happen
even when the hadr_syncmode configuration parameter
is set to SYNC. Because an HADR standby applies logs
sequentially, you can assume that if a transaction in an SQL session
is committed on the new primary, all previous transactions in the
same session have also been committed on the new primary. The commit
sequence of transactions across multiple sessions can be determined
only with detailed analysis of the log stream.
- It is possible that a transaction can be issued to the original
primary, committed on the original primary and replicated to the new
primary (original standby), but not be reported as committed because
the original primary crashed before it could report to the client
that the transaction was committed. Any application you write should
be able to handle the case where transactions issued to the original
primary, but not reported as committed there, are in fact committed
on the new primary (original standby).
- Some operations are not replicated, such as changes to database
configuration and to external UDF objects.
- The TAKEOVER HADR command can only be issued
on the standby database.
- HADR does not interface with the DB2® fault
monitor (db2fm), which can be used to automatically restart a failed
database. If the fault monitor is enabled, be aware that it might
act on a primary database that is presumed to have failed.
- A takeover operation can only take place if the primary and standby
databases are in peer state or the standby database is in remote catchup
pending state (or, as noted earlier, in remote catchup state when the
synchronization mode is SUPERASYNC). If the standby database is in any
other state, an error will be returned.
Note: You can make a standby database that is
in local catchup state available for normal use by converting it to
a standard database. To do this, shut the database down by issuing
the DEACTIVATE DATABASE command, and then issue
the STOP HADR command. Once HADR has been stopped,
you must complete a rollforward operation on the former standby database
before it can be used. A database cannot rejoin an HADR pair after
it has been converted from a standby database to a standard database.
To restart HADR on the two servers, follow the procedure for initializing
HADR.
If you have configured a peer window, shut down the primary
before the window expires to avoid potential transaction loss in any
related failover.
In a failover scenario, a takeover
operation can be performed through the command line processor (CLP) or the db2HADRTakeover application
programming interface (API).
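The transaction-loss cases listed in this section can be summarized as a small lookup table. The following Python sketch is illustrative only: the function name, the state and mode strings, and the result labels are assumptions made for this example, not values returned by any DB2 interface. It encodes only the combinations described above and rejects the ones the notes rule out.

```python
# Sketch only: state names, mode names, and result labels are plain strings
# chosen for this example; they do not come from a DB2 API.
NO_REPORTED_LOSS = "no loss of transactions reported committed to the application"
SIMULTANEOUS_FAILURE = "loss only if primary and standby fail at the same time"
UNRECEIVED_OR_UNWRITTEN = (
    "loss of log records not yet received, or received but not yet "
    "written to disk if the standby crashes"
)
UNPROCESSED = "loss of transactions not yet received and processed by the standby"

SYNC_MODES = {"SYNC", "NEARSYNC", "ASYNC", "SUPERASYNC"}

_EXPOSURE = {
    ("peer", "SYNC"): NO_REPORTED_LOSS,
    ("disconnected peer", "SYNC"): NO_REPORTED_LOSS,
    ("peer", "NEARSYNC"): SIMULTANEOUS_FAILURE,
    ("disconnected peer", "NEARSYNC"): SIMULTANEOUS_FAILURE,
    # No ("disconnected peer", "ASYNC") entry: peer window is not allowed in ASYNC.
    ("peer", "ASYNC"): UNRECEIVED_OR_UNWRITTEN,
    # SUPERASYNC databases are never in peer or disconnected peer state.
    ("remote catchup", "SUPERASYNC"): UNRECEIVED_OR_UNWRITTEN,
}


def loss_exposure(state, syncmode):
    """Return the loss exposure described in the text for a state and sync mode."""
    state = " ".join(state.lower().split())  # normalize case and spacing
    syncmode = syncmode.upper()
    if syncmode not in SYNC_MODES:
        raise ValueError(f"unknown synchronization mode: {syncmode!r}")
    if state == "remote catchup pending":
        return UNPROCESSED
    try:
        return _EXPOSURE[(state, syncmode)]
    except KeyError:
        raise ValueError(
            f"combination not covered by the text: {state!r} / {syncmode!r}"
        ) from None
```

A monitoring script could call loss_exposure with the state and mode it last observed to decide how much reconciliation to plan for after a forced takeover. Keep in mind that the last observed state reflects the last time the two databases were communicating.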
Procedure
The following procedure shows you how to initiate a failover
from the standby database using the CLP:
- Completely disable the failed primary database. When a
database encounters internal errors, normal shutdown commands might
not completely shut it down. You might need to use operating system
commands to remove resources such as processes, shared memory, or
network connections.
- Issue the TAKEOVER HADR command with
the BY FORCE option on the standby database. In the following example the failover takes place on database
LEAFS:
TAKEOVER HADR ON DB LEAFS BY FORCE
The BY FORCE option is required because
the primary is expected to be offline.
If the primary database
is not completely disabled, the standby database will still have a
connection to the primary and will send a message to the primary database
asking it to shut down. The standby database will still switch to the
role of primary database whether or not it receives confirmation
that the primary database has been shut down.
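If the failover is driven from a script rather than typed at the CLP, the command shown above can be assembled programmatically. The following Python sketch assumes that the db2 CLP executable is on the PATH of the standby host and that the instance environment is already set up; the helper names build_takeover_command and run_takeover are hypothetical, and the alias check is a conservative rule for this example rather than DB2's full naming rules.

```python
import re
import subprocess


def build_takeover_command(db_alias, by_force=True):
    """Build the argument list for a TAKEOVER HADR command (sketch only)."""
    # Conservative check for this example, not DB2's complete alias rules:
    # refuse anything that is not plain letters, digits, or underscores.
    if not re.fullmatch(r"[A-Za-z0-9_]+", db_alias):
        raise ValueError(f"unexpected database alias: {db_alias!r}")
    statement = f"TAKEOVER HADR ON DB {db_alias}"
    if by_force:
        # BY FORCE is required when the primary is expected to be offline.
        statement += " BY FORCE"
    return ["db2", statement]


def run_takeover(db_alias):
    """Run the forced takeover; intended for the standby host only."""
    # Assumes "db2" is on PATH and the DB2 instance environment is loaded.
    return subprocess.run(
        build_takeover_command(db_alias),
        capture_output=True,
        text=True,
        check=False,  # let the caller inspect the return code and output
    )
```

Calling build_takeover_command("LEAFS") yields the same statement as the example in the procedure. Run run_takeover only after the failed primary has been completely disabled, as described in the first step.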