IBM Support

What's the condition of SQL1768N rc=7 HADR start failure? And what to do?

Technical Blog Post


Abstract

What's the condition of SQL1768N rc=7 HADR start failure? And what to do?

Body

The answers are very simple as the reason code "7" explanation from 'db2 "? SQL1768N' is quite straightforward.
You are trying to activate a database as HADR primary but it can't find the available standby to make PEER status with.

So why am I writing this blog for this simple query ?
In real world, I have seen many DB2 mates still get panic and take time about 30 minutes to an hour to understand why this happens.
This time slot may be too long in case of an urgent situation to bring up the database.
And they have wanted to get overall step outlook to add to their operation manual.

For example, let me share one possible  scenario.
HADR host1(primary) and host2(standby) without TSA (Tivoli system automation) cluster.

- host1 suddenly shutdown and can't recover back in short period of time.
- DBA(Database administrator) decided to do force hadr takeover and ran the command on 'host2'.
  db2 takeover hadr on db <dbname> by force
 ( NOTE : This will not work when TSA is enabled to the HADR. SQL1770N, rc 10. But that's not the topic in this blog.)

- From the command line, Takeover seems to be completed and HADR role is changed to 'Primary' on 'host2'
  But connecting to database fails with some issue.
  Let's say DBHEAP is too small to activate database.
  After changing DBHEAP as AUTOMATIC, DBA tries deactivate and activate DB.
  The activate command gets SQL1768N rc 7 now.
- DBA keeps trying 'db2 activate db <dbname>' and 'db2 start hadr on db <dbname> as primary' hoping it works with fingers crossed !
- 20 minutes passed. After checking reason code, he feels much more gravity as there is no standby system to bring right now
  and seeking how to start active database in whatever way.

This is basic behavior looking for HADR standby when activating a database which has HADR Primary role.
If you recall the moment after initial HADR setup, we start standby then primary by order.

Now what to do next with no standby.
There are two easy ways.
1. Make the current database as 'Standard' standalone. Then activate database.
   'stop hadr' command will change the HADR role database configuration.

 $ db2 get db cfg for sample |grep -i "HADR database role"
 HADR database role                                      = PRIMARY

 $ db2 activate db sample
SQL1768N  Unable to start HADR. Reason code = "7".

$ db2 stop hadr on db sample
DB20000I  The STOP HADR ON DATABASE command completed successfully.


 $ db2 get db cfg for sample |grep -i "HADR database role"
 HADR database role                                      = STANDARD

 $ db2 activate db sample
DB20000I  The ACTIVATE DATABASE command completed successfully.


2. Or use 'by force' option with 'start hadr'.
 $ db2 start hadr on db sample as primary by force

In operation perspective, what to do next when original primary system comes back.
One thing we need to bear in mind is HADR role is still set as 'Primary' on 'host1'.
Therefore you should not let this database be activated as it is.
Otherwise you may need to reconfigure HADR from the scratch which can be time consuming depending on DB size.

After rebooting 'host1', you will see the HADR role is primary as that system was shutdown in such state.

$  db2start
05/13/2017 02:49:38     0   0   SQL1063N  DB2START processing was successful.
SQL1063N  DB2START processing was successful.
$ db2 get db cfg for sample |grep -i "HADR database role"
 HADR database role                                      = PRIMARY

Now we start this as standby on 'host1'.

$ db2 start hadr on db sample as standby
DB20000I  The START HADR ON DATABASE command completed successfully.
$ db2 get db cfg for sample |grep -i "HADR database role"
 HADR database role                                      = STANDBY


After confirming HADR gets 'PEER' status from 'db2pd -hadr -db <dbname>',
we are safe to switch the primary to original 'host1'.

(host1)
$ db2pd -db sample -hadr |grep HADR_STATE
                           HADR_STATE = PEER

$ db2 takeover hadr on db sample
DB20000I  The TAKEOVER HADR ON DATABASE command completed successfully
.

Hope this helps to save your time in the similar situation.

[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SSEPGG","label":"Db2 for Linux, UNIX and Windows"},"Component":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

UID

ibm11139992