TAKEOVER HADR command
The TAKEOVER HADR command instructs an HADR standby database to take over as the new HADR primary database for the HADR pair. This command is a cluster-wide command in a Db2® pureScale® environment, so you can issue it on any member on the standby, including nonreplayre members.
To run the command, one of the following authorities is needed:
Instance. The command establishes a database connection if one does not exist and closes the database connection when the command completes.
- DATABASE database-alias
- Identifies the current HADR standby database that is configured to take over as the HADR primary database.
- USER user-name
- Identifies the user name under which the takeover operation is
to be started.
- USING password
- The password used to authenticate user-name.
- BY FORCE
- Specifies that the database is not to wait for confirmation that the original HADR primary
database is shut down. Unless you are using SUPERASYNC synchronization
mode, this option is needed if the HADR pair is not in peer state.
- PEER WINDOW ONLY
- When this option is specified, no committed transaction loss occurs if the command succeeds and
the primary database is brought down before the end of the peer window period. You set the peer
window period by assigning a positive integer value to the hadr_peer_window
database configuration parameter. Not bringing down the primary database before the peer window
expires results in a split brain. If the TAKEOVER BY FORCE PEER WINDOW
ONLY command is run when the HADR pair is not in a peer state, or is in a disconnected
peer state, an error is returned.
You cannot use the PEER WINDOW ONLY option when the synchronization mode is set to ASYNC or SUPERASYNC.
- This form of takeover is also called graceful takeover or non-forced takeover. It is used for a
planned-outage scenario, such as completing hardware or software maintenance in a rolling fashion,
one host at a time. The non-forced takeover can be used only when standby is in a PEER state, or in
a REMOTE_CATCHUP state under the following conditions:
- The HADR synchronization mode is SUPERASYNC.
- In a Db2 pureScale environment, a stream is in assisted remote catchup, regardless of the synchronization mode.
- The use of non-forced takeover never leads to data loss.
- The successful completion of a non-forced takeover changes the role of the original primary database to standby, and the role of the original standby database to primary. The original standby database is the database to which the TAKEOVER command was issued.
- Failure of the original primary or standby host, the Db2 instance, or the network, can prevent
the successful completion of a TAKEOVER operation without the BY
FORCE option. The time when an error occurs has a direct impact on the
TAKEOVER operation's stage of completion:
- The roles of the original primary and standby databases are not changed. In this case, it is as if the TAKEOVER command was never issued.
- The role of the original primary database is changed to standby, but the role of the original
standby database (to which the TAKEOVER command was issued) is not yet changed.
In this scenario, all databases have the role of standby. You need to issue a TAKEOVER BY
FORCE command, either on the original primary database, or the original standby database,
to make it the new primary and continue. Note: In this scenario, continuing with the TAKEOVER BY FORCE command does not cause data loss.
- The takeover operation is completed successfully, but the host or Db2 instance fails before the successful completion of the command is returned. Success means that the role of the original primary database is changed to standby and the role of the original standby database is changed to primary. In this situation, you can continue as normal.
- This form of takeover is also called forced takeover. It is used for unplanned-outage scenarios, where the applications are no longer able to run on the original primary database, due to hardware, software, or network failure. For more information, see Performing an HADR failover operation.
- The use of a forced takeover can lead to data loss, so use it only when the primary database
fails unexpectedly. Data loss can result from any of the following situations:
- When the standby is not in PEER state when the primary fails. Transaction log data that is produced by the primary is not shipped to the standby.
- When the standby is in PEER state during a network failure. The primary database continues processing, either because the hadr_peer_window database configuration parameter is not set, or it is set but the primary database remains active past the time when the peer window expires.
- A forced takeover stops log shipping or log retrieval on the standby. Log replay continues to the end of the received or retrieved logs.
- When the standby is connected to the primary during a forced takeover, it sends a disabling
message to the old primary to prevent a split-brain scenario with dual primaries. The primary
receiving the disabling message ends itself. Any subsequent attempt to use the old primary database
fails with SQL1776N reason code 6. Note: When you are running Db2 pureScale in an HADR configuration, this scenario applies when the standby is connected to any member on the primary during the forced takeover operation.
- The successful completion of a forced takeover changes only the role of the original standby database (against which the TAKEOVER command was issued) to primary. It does not change the role of the original primary database. A subsequent step is needed to change the original primary database into a new standby database, either by reintegration or reinitialization. Reintegration is only possible if no data loss occurs. When reintegration fails, the only option is to reinitialize.
- The TAKEOVER BY FORCE PEER WINDOW ONLY command is typically used by an
automated failover feature that is able to detect the outage of the primary database. The feature is
able to issue the command to initiate the failover before the peer window expires.
Note: The TAKEOVER BY FORCE PEER WINDOW ONLY command is rejected when the peer window expires. This expiration occurs when the command is issued as the HADR state changes from DISCONNECTED_PEER to REMOTE_CATCHUP_PENDING. The rejection of the command is an indication that the automated failover feature is not able to initiate the failover fast enough. In this situation, user intervention is needed. If the failure of the original primary database is still not resolved, a manual failover (that uses a forced takeover without the PEER WINDOW ONLY option) can be used to proceed. Try setting a higher value for the hadr_peer_window database configuration parameter to ensure successful automated failover in the future.Note: The TAKEOVER BY FORCE PEER WINDOW ONLY command can behave incorrectly if the primary database clock and the standby database clock are not synchronized to within 5 seconds of each other. That is, the operation might succeed when it should fail, or fail when it should succeed. You should use a time synchronization service (for example, NTP) to keep the clocks synchronized to the same source.
- The TAKEOVER BY FORCE PEER WINDOW ONLY command proceeds when the standby is in PEER or DISCONNECTED_PEER state. These conditions ensure data consistency between the primary and the standby database. If the standby is not in PEER or DISCONNECTED_PEER state, the command fails with SQL1770N reason code 9.
- The HADR synchronization mode must be SYNC or NEARSYNC, and the hadr_peer_window database configuration parameter must be configured.
When you run the TAKEOVER HADR command, the corresponding error codes might be generated: SQL1767N, SQL1769N, or SQL1770N with a reason code of 98. The reason code indicates that no installed license exists for HADR on the server where the command was issued. To correct the problem, install a valid HADR license by using the db2licm command, or install a version of the server that contains a valid HADR license as part of its distribution.
- If you do not have an immediate need to connect to the standby database, wait for the UPGRADE DATABASE command to complete on the primary database. Also, wait for the standby database to replay all of the upgrade log records that were sent from the primary database. When these two processes are complete, you can then reissue the command.
- If you need to connect to this standby database immediately, issue the STOP HADR command to turn the HADR role to STANDARD.
- Takeover and reads on standby
If you have reads on standby enabled, any user application that is connected to the standby is disconnected to allow the takeover to proceed. Depending on the number of readers that are active on the standby, the takeover operation can take slightly longer to complete than when no readers are active. New connections are not allowed during the role switch. Any attempt to connect to the HADR standby during the takeover role switch generates an error(SQL1776N).
- Takeover and log replay gap
- Before you start a takeover operation, it is important to check the gap between the primary log
position and the standby log replay position. These values are found in the
primary_log_pos and standby_replay_log_pos monitor
elements. A large gap implies that the TAKEOVER command will take a long time to
complete. While the TAKEOVER command is running, applications cannot access the
database. It is important to note this access restriction when you are planning for the non-forced
takeover operation. It might be more suitable to complete the takeover operation at a different
time, when the gap is smaller. Note: The gap between the primary log position and the standby log replay position is a number that represents the total size of log data that needs to be processed. You can see this gap more clearly by noting the difference between the primary log time and standby replay log time. These timestamp values are found in the primary_log_time and standby_replay_log_time monitor elements.
- Takeover and log spooling
If you are using a high value for hadr_spool_limit, consider that a large gap between the log position of the primary and log replay on the standby can lead to a longer takeover time. This longer duration is caused by the standby not assuming the role of the new standby until the replay of the spooled logs finishes.
- Takeover and delayed replay
If you configure hadr_replay_delay to a non-zero value, you cannot issue the command on the standby (SQL1770N).
- STOP HADR and TAKEOVER BY FORCE PEER WINDOW ONLY
- In a network failure scenario with an hadr_peer_window database configuration, transactions on the primary are blocked when the primary is unable to ship the logs to the standby. By running the STOP HADR command on the primary, you can unblock the transactions, while the standby remains in DISCONNECTED_PEER state. This action enables the TAKEOVER BY FORCE PEER WINDOW ONLY operation to proceed, but data loss can result. The original standby database becomes a new primary without the transactions that were unblocked on the old primary.
- Takeover in a Db2 pureScale environment
The following considerations apply to Db2 pureScale environments:
- All log streams must pass the check to allow a takeover command to proceed. However, the streams do not need to be in the same state.
- When the role of a database changes from primary to standby, a member that has a direct connection to the old standby is chosen as the replay member. Preference is given to the preferred replay member. The preferred member is not chosen if it has no direct connection to the standby. Non-replay members are deactivated.
- When the role of a database changes from standby to primary, only the old replay member stays active; other members on the new primary are not activated.
- A non-forced takeover is not allowed if any member on the primary is in member crash recovery (MCR) pending state, or in progress state.
- A non-forced takeover is not allowed if the primary database is in group crash recovery state, as the streams cannot be in the needed state.
- Takeover and Advanced Log Space Management
If configured to use Advanced Log Space Management (ALSM), takeover or forced takeover of active and extraction log files is automatically manged during takeover. ALSM helps physical disk space consumption but does not impact takeover time. A takeover can use extraction log files if active log files have been recycled. Do not remove or modify extraction log files as this can impact the takeover process.
Use a shared archive between the primary and standby databases. In the rare case that an extraction log file is unusable, the previously archived log file is retrieved from the archives and used for takeover. If the archived log file is not accessible, the operation fails and manual user intervention is needed.