IBM Support

Spectrum Protect Server startup hangs during DB2 initialization

Troubleshooting


Problem

Spectrum Protect Server startup can hang because DB2 initialization never completes.

Symptom

When the server process dsmserv is started in the foreground and hangs, the output looks similar to the following:

ANR0990I Server restart-recovery in progress

No further output appears in the dsmserv output after this message.

In the db2diag.log you may see the following messages repeat several times:

2018-09-04-14.21.29.842199-240 I2000A787           LEVEL: Error
PID    : 9568712             TID : 1             PROC : db2fm
INSTANCE: tsminst1            NODE : 000
HOSTNAME: NIMTSM
EDUID  : 1
FUNCTION: DB2 Common, Generic Control Facility, GcfCaller::getState, probe:40
MESSAGE : ECF=0x9000028C=-1879047540=ECF_GCF_GCF_FUNCTION_TIMED_OUT
                    Timeout occured while calling a GCF interface function
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
 [0] 0x09000000012ACFFC oss_log__FP9OSSLogFacUiN32UlN26iPPc + 0x1BC
 [1] 0x09000000012ACDC4 ossLog + 0x64
 [2] 0x0900000001AB8024 getState__9GcfCallerFP12GCF_PartInfoUlP11GCF_RetInfo + 0xA4
 [3] 0x00000001000038A8 main + 0x2328
 [4] 0x00000001000002B0 __start + 0x70
 
 
2018-09-04-14.21.29.843076-240 I2788A830           LEVEL: Error
PID    : 9568712             TID : 1             PROC : db2fm
INSTANCE: tsminst1            NODE : 000
HOSTNAME: NIMTSM
EDUID  : 1
FUNCTION: DB2 Common, Fault Monitor Facility, db2fm, probe:180
MESSAGE : ECF=0x9000034B=-1879047349=ECF_FM_FAIL_TO_GETSTATE_GCF_FM
         Failed to get the state of the GCF fm module
CALLED : DB2 Common, Generic Control Facility, GcfCaller::getState
DATA #1 : signed integer, 8 bytes
2415919756
DATA #2 : unsigned integer, 8 bytes
0
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
 [0] 0x09000000012ACFFC oss_log__FP9OSSLogFacUiN32UlN26iPPc + 0x1BC
 [1] 0x09000000012AD3AC ossLogRC + 0x6C
 [2] 0x0000000100003A98 main + 0x2518
 [3] 0x00000001000002B0 __start + 0x70
 

The output from the 'ps -ef | grep db2' command may show the following DB2 processes running. Notice that the DB2 engine process, db2sysc, is not running:

ps -ef | grep db2
tsminst1  4653344 11731414   0 09:40:43  pts/0  0:00 db2start
tsminst1  6095346  7274994   0 09:39:11      -  0:00 /home/tsminst1/sqllib/bin/db2fm -i tsminst1 -m /opt/tivoli/tsm/db2/lib64/libdb2gcf.a -S
    root  7274994        1   0 07:40:14      -  0:00 /opt/tivoli/tsm/db2/bin/db2fmcd
    root  8061280 10944794   0 09:42:41  pts/1  0:00 grep db2
tsminst1 11403750        1   0 09:38:41      -  0:00 /home/tsminst1/sqllib/adm/db2set -i tsminst1 DB2AUTOSTART
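
The check described above can be scripted. The following is a minimal sketch, not an IBM-supplied tool: it assumes the text of 'ps -ef' has already been captured into a string, and the function name db2sysc_running is hypothetical. It reports whether any process owned by the instance user has db2sysc as its command.

```python
import os


def db2sysc_running(ps_output, instance_owner):
    """Return True if ps output lists a db2sysc process owned by instance_owner.

    ps_output is the captured text of 'ps -ef'. The owner is assumed to be
    the first column, as in the listing above.
    """
    for line in ps_output.splitlines():
        fields = line.split()
        if not fields or fields[0] != instance_owner:
            continue
        # Compare base names so that a full path to db2sysc also matches.
        if any(os.path.basename(token) == "db2sysc" for token in fields[1:]):
            return True
    return False
```

Applied to the listing shown above with the instance owner tsminst1, the function returns False, which matches the symptom.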
 

Cause

A problem with the DB2 configuration or the network prevents DB2 processes from completing the gethostbyname function call.

Diagnosing The Problem

Collect a procstack (pstack on Linux) of the running DB2 processes to see whether the hang is related to a network or configuration problem. The following is the output of procstack run against the process ID of db2start:

0x09000000001651f4 __fd_poll(??, ??, ??) + 0xb4
0x0900000000130bec res_nsend(0x3ea, 0xfffffffffff9790, 0x1800000018, 0x0, 0x3ea) + 0x1aec
0x090000000015549c res_nquery(??, ??, ??, ??, ??, ??) + 0xbc
0x09000000001549a8 res_nquerydomain(??, ??, ??, ??, ??, ??, ??) + 0x128
0x0900000000154dcc res_nsearch(??, ??, ??, ??, ??, ??) + 0x28c
0x090000000013dd24 res_search(??, ??, ??, ??, ??) + 0x124
0x0900000000161a24 ho_byname2(??, ??, ??) + 0xa4
0x09000000001877e4 ho_byname2(??, ??, ??) + 0x404
0x090000000012d48c gethostbyname2(??, ??) + 0x16c
0x0900000000132a48 IPRA.$getaddrinfo2(??, ??, ??, ??, ??) + 0x928
0x0900000000134128 getaddrinfo(??, ??, ??, ??) + 0x408
0x090000000531f990 sqloPdbTcpIpGetAddrInfo(0x0, 0x0, 0x9000000066db9d8, 0x0, 0x5000000000005) + 0x90
0x09000000053155e0 sqloPdbInitNodeAddrHndl(0x100000001, 0x2000, 0x0, 0x0) + 0x460
0x090000000544eeb4 sqloReadDb2nodesWithHandleInternal(??, ??, ??, ??, ??, ??, ??, ??) + 0x2174
0x090000000544ccb8 sqloReadDb2nodesInternal(??, ??, ??, ??, ??, ??, ??, ??) + 0x98
0x09000000053e181c sqlfReadDb2nodes(??, ??, ??, ??, ??, ??, ??, ??) + 0xbc
0x09000000054480f0 sqleInitApplicationEnvironment(int,unsigned int,unsigned int,sqlca*)(??, ??, ??, ??) + 0x1430
0x09000000054469bc sqleCommonInitializationForAPIs(??) + 0x5c
0x000000010000210c main(??, ??) + 0x1cc
0x00000001000002f8 __start() + 0x70

The output shows that the db2start process is stuck in the gethostbyname2 call (reached through getaddrinfo), waiting in res_nsend for a reply to a DNS query. This points to a name resolution problem, for example a DNS server that is unreachable or not responding.
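
When several stacks have to be reviewed, the same check can be automated. The sketch below is illustrative only. It assumes the AIX procstack layout shown above (address, then symbol name, then an opening parenthesis); Linux pstack output is formatted differently and would need a different pattern. The function name resolver_frames and the list of symbols are assumptions chosen for this example.

```python
import re

# Resolver entry points worth flagging; the list is an assumption for this sketch.
RESOLVER_SYMBOLS = (
    "getaddrinfo", "gethostbyname", "gethostbyname2",
    "res_search", "res_nsearch", "res_nquery", "res_nsend",
)

# An address, whitespace, then the symbol name up to the opening parenthesis.
FRAME_PATTERN = re.compile(r"^\s*0x[0-9a-fA-F]+\s+([^\s(]+)\(")


def resolver_frames(stack_text):
    """Return the resolver symbols found in a procstack listing, top frame first."""
    found = []
    for line in stack_text.splitlines():
        match = FRAME_PATTERN.match(line)
        if match and match.group(1) in RESOLVER_SYMBOLS and match.group(1) not in found:
            found.append(match.group(1))
    return found
```

A non-empty result suggests that the process is blocked in host name resolution rather than inside DB2 itself.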

Resolving The Problem

One possible cause involves the DB2 configuration file db2nodes.cfg, which is located in the sqllib directory under the DB2 instance home directory. Each line in the file has the following format:

dbpartitionnum hostname logicalport netname resourcesetname

NOTE: You probably will not see dbpartitionnum in a Spectrum Protect environment.

For more information about this configuration file, see the DB2 documentation:

https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.qb.server.doc/doc/r0006351.html
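
To find out which host names db2start will try to resolve, the host name field can be read from the file. The following is a minimal sketch that assumes the whitespace-separated layout described above, with the host name as the second field. The function name hostnames_from_db2nodes is hypothetical, and the file content is passed in as a string so that the sketch does not depend on a particular path.

```python
def hostnames_from_db2nodes(text):
    """Return the host names (second field) found in db2nodes.cfg content.

    Lines with fewer than two fields are skipped. Duplicates are dropped
    and the original order is kept.
    """
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        if fields[1] not in hosts:
            hosts.append(fields[1])
    return hosts
```

For example, content such as "0 NIMTSM 0" (an illustrative line, not taken from the affected system) yields the single host name NIMTSM.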

You must be able to ping the host name listed in this file. If the ping fails, something in the network has changed and is causing the db2start process to hang, which prevents the server from starting. Work with your network team to resolve the network problem that causes the ping of the host name in the db2nodes.cfg file to fail. Once the ping succeeds, db2start completes and dsmserv can start.
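
Because a failing lookup can block for a long time, as it does for db2start, it helps to test name resolution with an upper time limit before you start DB2 again. The sketch below is one possible way to do that and is not part of DB2 or Spectrum Protect. The function name resolve_with_timeout and the default of 5 seconds are assumptions. It checks name resolution only, so a ping test is still needed to confirm that the host is reachable.

```python
import socket
import threading


def resolve_with_timeout(hostname, timeout_seconds=5.0):
    """Resolve hostname, giving up after timeout_seconds.

    Returns ("ok", [addresses]), ("failed", error_text) or ("timeout", None).
    """
    outcome = {}

    def lookup():
        try:
            results = socket.getaddrinfo(hostname, None)
            # Each result is (family, type, proto, canonname, sockaddr);
            # the address is the first element of sockaddr.
            outcome["addresses"] = sorted({entry[4][0] for entry in results})
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            outcome["error"] = str(exc)

    # A daemon thread does not keep the interpreter alive if the lookup hangs.
    worker = threading.Thread(target=lookup, daemon=True)
    worker.start()
    worker.join(timeout_seconds)
    if worker.is_alive():
        return ("timeout", None)
    if "error" in outcome:
        return ("failed", outcome["error"])
    return ("ok", outcome["addresses"])
```

A "timeout" result for the host name from db2nodes.cfg reproduces the condition that hangs db2start. A "failed" result means that the resolver answered but does not know the name.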

 

Business Unit: IBM Infrastructure w/TPS
Product: IBM Spectrum Protect
Component: Server
Platform: AIX, Linux
Version: All versions
Line of Business: Storage

Historical Number

clg2018 #17 , TS001327334

Product Synonym

tsm db2

Document Information

Modified date:
02 January 2019

UID

ibm10732954