Dealing with failures while upgrading Db2 servers in HADR environments (without standby reinitialization)

This section describes how to deal with failures while upgrading Db2 servers in HADR environments (without standby reinitialization).

The procedure in Upgrading Db2 servers in HADR environments (without standby reinitialization) maintains the database roles and relies on the normal log shipping and log replay behavior of HADR. It avoids stopping HADR for the upgrade and avoids reinitializing HADR afterward. This reduces the window during which no standby database exists and eliminates the cost of sending a backup image to the standby site for reinitialization.

During the upgrade, a failure can occur at any point in the procedure and in any component of the primary or standby. Because an upgrade is a scheduled event, any failure is considered severe, and making the primary or standby database available again as quickly as possible is paramount. In most cases, a failure leaves either the primary or the standby database unable to continue its role in the HADR upgrade procedure. When this happens, HADR must be stopped, the upgrade must continue on the surviving database as a non-HADR database, and HADR must be reinitialized after the upgrade.

It is not practical to document every possible failure scenario, but this topic describes the actions you can take for failures at several common points in the procedure.

Scenario 1: In Db2 version 10.5 Fix Pack 7 or later, the primary's log shipping functionality and the standby's log replay functionality are not healthy, causing db2iupgrade or db2ckupgrade to fail.

If the issue cannot be fixed within the upgrade window, follow the procedure in Upgrading Db2 servers in HADR environments, which requires stopping HADR and reinitializing it.

Scenario 2: In Db2 version 10.5 Fix Pack 7 or later, the primary's log shipping functionality and the standby's log replay functionality are healthy, but db2iupgrade or db2ckupgrade fails because the standby's replay position is still behind the primary's log shipping position.

Ensure that replay delay is turned off by setting the hadr_replay_delay configuration parameter to 0. Allow more time for the standby to catch up: the default wait time is at least 120 seconds, and you can lengthen it by increasing the hadr_timeout value. If neither action allows the log positions to match within the upgrade window, follow the procedure in Upgrading Db2 servers in HADR environments, which requires stopping HADR and reinitializing it.
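As an illustration, the following Python sketch assembles the Db2 CLP commands for this scenario and polls until the standby catches up. It is a hypothetical helper, not part of Db2: the function names, the SAMPLE alias, and the 300-second timeout are assumptions, and `get_positions` stands in for however you read the primary log position and the standby replay position (for example, from `db2pd -db <alias> -hadr` output). Check in the configuration parameter documentation which server each parameter must be updated on before running the commands.

```python
import time
from typing import Callable, List, Tuple


def catch_up_commands(db: str, hadr_timeout_secs: int) -> List[str]:
    """Return Db2 CLP commands that help the standby catch up (sketch).

    HADR_REPLAY_DELAY (a standby-side setting) is set to 0 so replay is not
    deliberately delayed; HADR_TIMEOUT is raised to lengthen the wait.
    Assumption: you apply each command on the server(s) its parameter requires.
    """
    if hadr_timeout_secs <= 0:
        raise ValueError("hadr_timeout_secs must be positive")
    return [
        f"UPDATE DB CFG FOR {db} USING HADR_REPLAY_DELAY 0",
        f"UPDATE DB CFG FOR {db} USING HADR_TIMEOUT {hadr_timeout_secs}",
    ]


def wait_for_replay(get_positions: Callable[[], Tuple[int, int]],
                    deadline_secs: float, poll_secs: float = 5.0) -> bool:
    """Poll until the standby replay position reaches the primary position.

    get_positions is a hypothetical callable returning
    (primary_log_position, standby_replay_log_position).
    Returns False if the standby is still behind when the deadline passes.
    """
    end = time.monotonic() + deadline_secs
    while True:
        primary_pos, replay_pos = get_positions()
        if replay_pos >= primary_pos:
            return True  # standby has caught up
        if time.monotonic() >= end:
            return False  # upgrade window (as modeled by the deadline) exhausted
        time.sleep(poll_secs)


if __name__ == "__main__":
    for cmd in catch_up_commands("SAMPLE", 300):
        print(f'db2 "{cmd}"')
```

A False result from `wait_for_replay` marks the point at which to switch to the procedure that stops and reinitializes HADR.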

Scenario 3: In Db2 version 10.5 Fix Pack 7 or later, the primary database becomes unavailable, preventing db2iupgrade or db2ckupgrade from being run.

If the primary database cannot be brought back up within the upgrade window, switch roles by issuing a takeover on the standby, and then follow the procedure in Upgrading Db2 servers in HADR environments, which requires stopping HADR and reinitializing it.
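A minimal sketch of the role switch, written as a Python runbook builder that only returns command text and executes nothing. Because the primary is unavailable, the takeover is assumed to need the BY FORCE option. The function name and the SAMPLE alias are hypothetical, and placing STOP HADR immediately after the takeover is an assumption about where the other procedure begins, so follow that topic for the exact order of steps.

```python
from typing import List, Tuple

# (server the command runs on, Db2 CLP command)
Step = Tuple[str, str]


def primary_lost_before_upgrade(db: str) -> List[Step]:
    """Runbook sketch for a primary that fails before db2iupgrade/db2ckupgrade."""
    return [
        # The old primary is down, so the role switch is assumed to be forced.
        ("standby", f"TAKEOVER HADR ON DATABASE {db} BY FORCE"),
        # The former standby is now the primary. Assumption: HADR is stopped
        # here to begin the procedure that stops and reinitializes HADR.
        ("new primary", f"STOP HADR ON DATABASE {db}"),
    ]


if __name__ == "__main__":
    for server, cmd in primary_lost_before_upgrade("SAMPLE"):
        print(f'[{server}] db2 "{cmd}"')
```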

Scenario 4: In Db2 version 10.5 Fix Pack 7 or later, the standby database becomes unavailable, preventing db2iupgrade or db2ckupgrade from being run.

If the standby database cannot be brought back up within the upgrade window, follow the procedure in Upgrading Db2 servers in HADR environments, which requires stopping HADR and reinitializing it.

Scenario 5: In Db2 version 11.1, the primary database becomes unavailable, preventing the upgrade procedure from continuing on the standby.

If the primary database cannot be brought back up within the upgrade window, issue STOP HADR on the standby, followed by ROLLFORWARD DATABASE with the STOP option. This turns the database into a non-HADR database. The database is now in upgrade pending state, so issue the UPGRADE DATABASE command to continue the upgrade. When the upgrade completes, refer to Post-upgrade tasks for Db2 servers and Verifying upgrade of Db2 servers. HADR must then be reinitialized.
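The Scenario 5 recovery on the standby can be sketched as the following ordered command list. The Python wrapper, its name, and the SAMPLE alias are hypothetical; the three Db2 CLP commands are the ones this scenario names, written in their usual CLP form.

```python
from typing import List, Tuple

# (server the command runs on, Db2 CLP command)
Step = Tuple[str, str]


def primary_lost_during_upgrade(db: str) -> List[Step]:
    """Runbook sketch: the standby continues the upgrade as a non-HADR database."""
    return [
        # Remove the HADR role from the standby. If the command is rejected
        # because the database is active, check the STOP HADR documentation
        # for deactivation requirements (not covered in this topic).
        ("standby", f"STOP HADR ON DATABASE {db}"),
        # Complete the rollforward so the database can be used on its own.
        ("standby", f"ROLLFORWARD DATABASE {db} STOP"),
        # The database is now upgrade pending; resume the upgrade.
        ("standby", f"UPGRADE DATABASE {db}"),
    ]


if __name__ == "__main__":
    for server, cmd in primary_lost_during_upgrade("SAMPLE"):
        print(f'[{server}] db2 "{cmd}"')
```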

Scenario 6: In Db2 version 11.1, the standby database becomes unavailable, preventing the UPGRADE DATABASE command from starting on the primary.

If the standby database cannot be brought back up within the upgrade window, issue STOP HADR on the primary. This turns the database into a non-HADR database. The database is still in upgrade pending state, so reissue the UPGRADE DATABASE command to continue the upgrade. When the upgrade completes, refer to Post-upgrade tasks for Db2 servers and Verifying upgrade of Db2 servers. HADR must then be reinitialized.
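A similar sketch for Scenario 6, with a small `render` helper that formats each step as a `db2` CLP invocation labeled with the server it runs on. The function names and the SAMPLE alias are illustrative assumptions; nothing is executed.

```python
from typing import List, Tuple

# (server the command runs on, Db2 CLP command)
Step = Tuple[str, str]


def standby_lost_before_primary_upgrade(db: str) -> List[Step]:
    """Runbook sketch: the primary continues the upgrade without HADR."""
    return [
        # Turn the primary into a non-HADR database.
        ("primary", f"STOP HADR ON DATABASE {db}"),
        # The database is still upgrade pending, so reissue the upgrade.
        ("primary", f"UPGRADE DATABASE {db}"),
    ]


def render(steps: List[Step]) -> str:
    """Format steps as commented db2 CLP invocations for a shell session."""
    return "\n".join(f'# on the {server}\ndb2 "{cmd}"' for server, cmd in steps)


if __name__ == "__main__":
    print(render(standby_lost_before_primary_upgrade("SAMPLE")))
```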

Scenario 7: In Db2 version 11.1, the standby database becomes unavailable while it is in upgrade in progress state.

Once the UPGRADE DATABASE command is issued on the primary and the primary forms a connection with the standby database, the upgrade proceeds on the primary and eventually completes successfully. The concern is that no standby database is replaying log data, which leaves the system exposed if the primary is lost. After the upgrade, the primary database can still be started by issuing the START HADR command with the BY FORCE option. At this point, make every attempt to resolve the issues with the standby. Once they are resolved, issue the UPGRADE DATABASE command on the standby, because it is still in upgrade in progress state. The standby continues to replay the upgrade log data shipped by the primary until the replay completes and the standby is no longer in upgrade in progress state.
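The two phases of Scenario 7 can be sketched as follows: the forced START HADR on the primary can run right away, and the standby's UPGRADE DATABASE is added only once the standby has been repaired. The function name, the `standby_repaired` flag, and the SAMPLE alias are hypothetical.

```python
from typing import List, Tuple

# (server the command runs on, Db2 CLP command)
Step = Tuple[str, str]


def standby_lost_in_upgrade_in_progress(db: str, standby_repaired: bool) -> List[Step]:
    """Runbook sketch for a standby that fails after UPGRADE DATABASE started."""
    steps: List[Step] = [
        # After its upgrade completes, the primary is started without waiting
        # for a standby to connect.
        ("primary", f"START HADR ON DATABASE {db} AS PRIMARY BY FORCE"),
    ]
    if standby_repaired:
        # The standby is still in upgrade in progress state; reissuing the
        # command lets it resume replay of the upgrade log data.
        steps.append(("standby", f"UPGRADE DATABASE {db}"))
    return steps


if __name__ == "__main__":
    for server, cmd in standby_lost_in_upgrade_in_progress("SAMPLE", standby_repaired=True):
        print(f'[{server}] db2 "{cmd}"')
```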

Scenario 8: In Db2 version 11.1, the UPGRADE DATABASE command was issued on the primary with the REBINDALL option, and the standby database becomes unavailable while it is in upgrade in progress state.

This scenario differs from Scenario 7 in that the UPGRADE DATABASE command on the primary was issued with the REBINDALL option. In this case, the UPGRADE DATABASE command requires, and attempts, a new connection to the database. If the standby database is not available during this second connection attempt, the UPGRADE DATABASE command returns SQL1499W. Because SQL1499W can be returned for many other reasons, you might need to check the Db2 diagnostic log to determine what failed and whether this scenario applies. If it does, the primary database can still be started by issuing the START HADR command with the BY FORCE option, and packages can be rebound manually at this point. However, make every attempt to resolve the issues with the standby. Once they are resolved, issue the UPGRADE DATABASE command on the standby, because it is still in upgrade in progress state. The standby continues to replay the upgrade log data shipped by the primary until the replay completes and the standby is no longer in upgrade in progress state.
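For Scenario 8, the sketch below adds two things to the Scenario 7 pattern: a check that only flags SQL1499W for diagnostic-log review (it cannot tell which cause applies), and a manual rebind using the db2rbind system command. Because db2rbind runs from the operating system shell, every step here is written as a full shell command. The function names, the rebind log file name, and the SAMPLE alias are hypothetical; confirm the db2rbind options in the Db2 command reference before use.

```python
from typing import List, Tuple

# (server the command runs on, operating system shell command)
Step = Tuple[str, str]


def upgrade_needs_diag_review(upgrade_output: str) -> bool:
    """SQL1499W has several possible causes; seeing it only means: check the diagnostic log."""
    return "SQL1499W" in upgrade_output


def standby_lost_with_rebindall(db: str, standby_repaired: bool) -> List[Step]:
    """Runbook sketch for Scenario 8 (REBINDALL could not reconnect)."""
    steps: List[Step] = [
        ("primary", f"db2 START HADR ON DATABASE {db} AS PRIMARY BY FORCE"),
        # Manual rebind in place of the one REBINDALL could not complete.
        # The log file name is a hypothetical choice.
        ("primary", f"db2rbind {db} -l rebind_{db}.log all"),
    ]
    if standby_repaired:
        # Resume the standby, which is still in upgrade in progress state.
        steps.append(("standby", f"db2 UPGRADE DATABASE {db}"))
    return steps


if __name__ == "__main__":
    for server, cmd in standby_lost_with_rebindall("SAMPLE", standby_repaired=False):
        print(f"[{server}] {cmd}")
```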

If you encounter issues with the upgrade to Db2 version 11.1 at any time, you can reverse the upgrade and fall back from Db2 version 11.1 to a pre-Db2 version 11.1 release. See Reversing Db2 server upgrade for all the steps required to reverse a database upgrade.