Troubleshooting
Problem
Spectrum Protect Server startup can hang because DB2 initialization never completes
Symptom
When the server process dsmserv is started in the foreground and hangs, its output looks similar to the following:
ANR0990I Server restart-recovery in progress
After this line, nothing further appears in the dsmserv output.
The db2diag.log file may show the following messages repeated several times:
2018-09-04-14.21.29.842199-240 I2000A787 LEVEL: Error
PID : 9568712 TID : 1 PROC : db2fm
INSTANCE: tsminst1 NODE : 000
HOSTNAME: NIMTSM
EDUID : 1
FUNCTION: DB2 Common, Generic Control Facility, GcfCaller::getState, probe:40
MESSAGE : ECF=0x9000028C=-1879047540=ECF_GCF_GCF_FUNCTION_TIMED_OUT
Timeout occured while calling a GCF interface function
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
[0] 0x09000000012ACFFC oss_log__FP9OSSLogFacUiN32UlN26iPPc + 0x1BC
[1] 0x09000000012ACDC4 ossLog + 0x64
[2] 0x0900000001AB8024 getState__9GcfCallerFP12GCF_PartInfoUlP11GCF_RetInfo + 0xA4
[3] 0x00000001000038A8 main + 0x2328
[4] 0x00000001000002B0 __start + 0x70
2018-09-04-14.21.29.843076-240 I2788A830 LEVEL: Error
PID : 9568712 TID : 1 PROC : db2fm
INSTANCE: tsminst1 NODE : 000
HOSTNAME: NIMTSM
EDUID : 1
FUNCTION: DB2 Common, Fault Monitor Facility, db2fm, probe:180
MESSAGE : ECF=0x9000034B=-1879047349=ECF_FM_FAIL_TO_GETSTATE_GCF_FM
Failed to get the state of the GCF fm module
CALLED : DB2 Common, Generic Control Facility, GcfCaller::getState
DATA #1 : signed integer, 8 bytes
2415919756
DATA #2 : unsigned integer, 8 bytes
0
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
[0] 0x09000000012ACFFC oss_log__FP9OSSLogFacUiN32UlN26iPPc + 0x1BC
[1] 0x09000000012AD3AC ossLogRC + 0x6C
[2] 0x0000000100003A98 main + 0x2518
[3] 0x00000001000002B0 __start + 0x70
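To confirm that these errors really repeat, rather than appearing once, the symbolic ECF names in db2diag.log can be counted. The following Python sketch assumes the MESSAGE line format shown in the excerpt above; count_ecf_codes and summarize_db2diag are hypothetical helpers, not part of any IBM tool.

```python
import re
from collections import Counter

# Matches MESSAGE lines such as:
#   MESSAGE : ECF=0x9000028C=-1879047540=ECF_GCF_GCF_FUNCTION_TIMED_OUT
# and captures the symbolic ECF name at the end of the line.
ECF_PATTERN = re.compile(r"ECF=0x[0-9A-Fa-f]+=-?\d+=(\w+)")


def count_ecf_codes(log_text):
    """Count how often each symbolic ECF name appears in db2diag.log text."""
    return Counter(ECF_PATTERN.findall(log_text))


def summarize_db2diag(path):
    """Print the ECF names found in a db2diag.log file, most frequent first."""
    # errors="replace" keeps the scan going if the log contains stray bytes.
    with open(path, errors="replace") as f:
        counts = count_ecf_codes(f.read())
    for name, n in counts.most_common():
        print(f"{n:6d}  {name}")
```

Pass summarize_db2diag the path of the instance's db2diag.log. High counts for ECF_GCF_GCF_FUNCTION_TIMED_OUT and ECF_FM_FAIL_TO_GETSTATE_GCF_FM match the symptom described here.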
The output from the 'ps -ef | grep -i db2' command may still show some DB2 processes running, such as db2fm (the process that writes the errors above), but the db2sysc process is not running.
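The same check can be scripted. This Python sketch scans 'ps -ef' output for a db2sysc process, skipping the grep command itself; engine_running and current_ps_output are hypothetical helper names, and the simple line-based match is an assumption that may need adjusting for unusual ps layouts.

```python
import re
import subprocess

# Word-boundary match so that longer names merely starting with "db2sysc"
# are not counted.
DB2SYSC = re.compile(r"\bdb2sysc\b")


def engine_running(ps_output):
    """Return True if any non-grep line of `ps -ef` output shows db2sysc."""
    for line in ps_output.splitlines():
        if "grep" in line:
            continue  # skip the grep command that produced the listing
        if DB2SYSC.search(line):
            return True
    return False


def current_ps_output():
    """Capture `ps -ef` output on the local system."""
    return subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    ).stdout
```

For example, engine_running(current_ps_output()) returning False while db2fm is still listed matches the hang described in this document.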
Cause
A problem with the DB2 configuration or the network prevents DB2 processes from completing the gethostbyname() host-name lookup.
Diagnosing The Problem
Collecting a stack trace of the running DB2 processes with procstack (AIX) or pstack (Linux) shows that the hang is related to the network or its configuration. The following is the output of procstack run against the process ID of the db2start process:
0x09000000001651f4 __fd_poll(??, ??, ??) + 0xb4
0x0900000000130bec res_nsend(0x3ea, 0xfffffffffff9790, 0x1800000018, 0x0, 0x3ea) + 0x1aec
0x090000000015549c res_nquery(??, ??, ??, ??, ??, ??) + 0xbc
0x09000000001549a8 res_nquerydomain(??, ??, ??, ??, ??, ??, ??) + 0x128
0x0900000000154dcc res_nsearch(??, ??, ??, ??, ??, ??) + 0x28c
0x090000000013dd24 res_search(??, ??, ??, ??, ??) + 0x124
0x0900000000161a24 ho_byname2(??, ??, ??) + 0xa4
0x09000000001877e4 ho_byname2(??, ??, ??) + 0x404
0x090000000012d48c gethostbyname2(??, ??) + 0x16c
0x0900000000132a48 IPRA.$getaddrinfo2(??, ??, ??, ??, ??) + 0x928
0x0900000000134128 getaddrinfo(??, ??, ??, ??) + 0x408
0x090000000531f990 sqloPdbTcpIpGetAddrInfo(0x0, 0x0, 0x9000000066db9d8, 0x0, 0x5000000000005) + 0x90
0x09000000053155e0 sqloPdbInitNodeAddrHndl(0x100000001, 0x2000, 0x0, 0x0) + 0x460
0x090000000544eeb4 sqloReadDb2nodesWithHandleInternal(??, ??, ??, ??, ??, ??, ??, ??) + 0x2174
0x090000000544ccb8 sqloReadDb2nodesInternal(??, ??, ??, ??, ??, ??, ??, ??) + 0x98
0x09000000053e181c sqlfReadDb2nodes(??, ??, ??, ??, ??, ??, ??, ??) + 0xbc
0x09000000054480f0 sqleInitApplicationEnvironment(int,unsigned int,unsigned int,sqlca*)(??, ??, ??, ??) + 0x1430
0x09000000054469bc sqleCommonInitializationForAPIs(??) + 0x5c
0x000000010000210c main(??, ??) + 0x1cc
0x00000001000002f8 __start() + 0x70
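The output above shows that the db2start process is stuck in a host-name lookup: getaddrinfo() calls gethostbyname2(), which is waiting in res_nsend() for the reply to a DNS query. The lookup is issued while DB2 reads its db2nodes.cfg file (sqlfReadDb2nodes). This points to a network problem, for example DNS not working properly.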
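When the stack dump is long, or several DB2 processes are sampled, the resolver frames can be picked out mechanically. This Python sketch is an illustration, not an IBM utility: the frame regex targets the AIX procstack format shown above and, as an assumption, also accepts the gdb-style "#N 0x... in func" lines that Linux pstack commonly prints. The set of resolver function names is likewise an assumed, non-exhaustive list.

```python
import re

# Resolver functions seen in the procstack output above, plus gethostbyname.
# Assumed, non-exhaustive list; extend as needed.
RESOLVER_FUNCS = {
    "getaddrinfo", "gethostbyname", "gethostbyname2",
    "res_search", "res_nsearch", "res_nquery", "res_nsend",
}

# Captures the function name of a frame line, for example:
#   0x0900000000134128 getaddrinfo(??, ??, ??, ??) + 0x408      (AIX procstack)
#   #3  0x00007f12 in getaddrinfo (...) from /lib64/libc.so.6   (gdb style, assumed)
FRAME = re.compile(r"^\s*(?:#\d+\s+)?0x[0-9a-fA-F]+\s+(?:in\s+)?([^\s(]+)")


def frame_functions(stack_text):
    """Return frame function names in dump order (innermost frame first)."""
    names = []
    for line in stack_text.splitlines():
        m = FRAME.match(line)
        if m:
            names.append(m.group(1))
    return names


def resolver_frames(stack_text):
    """Return the frames that belong to host-name resolution, in stack order."""
    return [name for name in frame_functions(stack_text) if name in RESOLVER_FUNCS]
```

A non-empty result from resolver_frames, as the procstack output above would produce, suggests the process is waiting on name resolution rather than on DB2 itself.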
Resolving The Problem
One possible cause involves the DB2 configuration file db2nodes.cfg, which is located in the sqllib directory under the DB2 instance home directory. Each line of the file contains the following fields:
dbpartitionnum hostname logicalport netname resourcesetname
NOTE: You probably will not see the dbpartitionnum field in a Spectrum Protect environment.
The DB2 documentation describes this configuration file in more detail:
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.qb.server.doc/doc/r0006351.html
The host name listed in this file must resolve and respond to ping. If it does not, a change in the network is likely causing the db2start process to hang, which prevents the server from starting. Work with your network team to fix the problem that causes the ping of the host name in db2nodes.cfg to fail. Once the host name can be pinged successfully, db2start completes and dsmserv can start.
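Because the stack trace shows the hang inside name resolution, it can help to test resolution of each db2nodes.cfg host name with a time limit before involving ping. The following Python sketch assumes the documented field order (dbpartitionnum first, hostname second); if your file lacks the dbpartitionnum column, as the NOTE above suggests it might, adjust the field index. The helper names are hypothetical, and this checks name resolution only, not ICMP reachability.

```python
import socket
import threading


def hostnames_from_db2nodes(text):
    """Return the hostname field of each non-empty db2nodes.cfg line.

    Assumes the field order: dbpartitionnum hostname logicalport netname resourcesetname
    """
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            hosts.append(fields[1])
    return hosts


def resolve_with_timeout(host, timeout=5.0):
    """Resolve host in a daemon thread so a hung resolver cannot block the caller.

    Returns (addresses, error); addresses is None on failure or timeout.
    """
    result = {}

    def worker():
        try:
            infos = socket.getaddrinfo(host, None)
            result["addrs"] = sorted({info[4][0] for info in infos})
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            result["error"] = str(exc)

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():
        return None, f"no answer after {timeout:.1f}s (resolver may be hanging)"
    if "error" in result:
        return None, result["error"]
    return result["addrs"], None


def check_db2nodes(path):
    """Print the resolution status of every host name in a db2nodes.cfg file."""
    with open(path) as f:
        for host in hostnames_from_db2nodes(f.read()):
            addrs, err = resolve_with_timeout(host)
            if addrs:
                print(f"{host}: OK {', '.join(addrs)}")
            else:
                print(f"{host}: FAILED: {err}")
```

Run check_db2nodes against the db2nodes.cfg file in the instance owner's sqllib directory. A timeout or failure for the host name there matches the db2start hang and is the point to raise with the network team.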
Historical Number
clg2018 #17 , TS001327334
Product Synonym
tsm db2
Document Information
Modified date:
02 January 2019
UID
ibm10732954