Fixes are available
DB2 Version 11.1 Mod 2 Fix Pack 2 for Linux, UNIX, and Windows
Db2 Version 11.1 Mod2 Fix Pack2 iFix001 for Linux, UNIX, and Windows
Db2 Version 11.1 Mod2 Fix Pack2 iFix002 for Linux, UNIX, and Windows
Db2 Version 11.1 Mod 3 Fix Pack 3 for Linux, UNIX, and Windows
Db2 Version 11.1 Mod3 Fix Pack3 iFix001 for Linux, UNIX, and Windows
Db2 Version 11.1 Mod3 Fix Pack3 iFix002 for Linux, UNIX, and Windows
Db2 Version 11.1 Mod4 Fix Pack4 iFix001 for Linux, UNIX, and Windows
Db2 Version 11.1 Mod 4 Fix Pack 4 for Linux, UNIX, and Windows
Db2 Version 11.1 Mod 4 Fix Pack 6 for Linux, UNIX, and Windows
Db2 Version 11.1 Mod 4 Fix Pack 5 for Linux, UNIX, and Windows
Db2 Version 11.1 Mod 4 Fix Pack 7 for Linux, UNIX, and Windows
APAR status
Closed as program error.
Error description
In certain situations, such as recovery from a node failure, the connection pool manager (the db2CFConnPoolMgr EDU) can be locked out by the EDUs that are trying to retrieve connections from the connection pool. This results in an FODC panic condition that brings the member down in a pureScale environment.

In this case, a message that marks a link offline, such as the following, appears in db2diag.log:

2017-03-22-16.28.28.638890+540 I87508473E1910 LEVEL: Event
PID : 26942 TID : 139602274805504 PROC : db2sysc 1
INSTANCE: db2inst1 NODE : 001 DB : DBNAME
APPHDL : 1-4595 APPID:X.X.X.X.0000
AUTHID : DB2INST1 HOSTNAME: MYHOST1
EDUID : 285 EDUNAME: db2agent (DBNAME) 1
FUNCTION: DB2 UDB, Shared Data Structure Abstraction Layer for CF, SQLE_SINGLE_CA_HANDLE::sqleCaCeMarkAdapterOffline, probe:1857
MESSAGE : Adapter Offline Request: SQLE_SINGLE_CA_HANDLE::sqleSingleCaCreateNewConnectionsForPool() :2710 has changed the LOCAL state of link #2 (MEMBER0001's ofa-v2-ens1,pscf0d2) in memory from [R:ONLINE,L:ONLINE] to [R:ONLINE,L:OFFLINE]
DATA #1 : Codepath, 8 bytes
5:20
DATA #2 : Connection pool link adapter number, PD_TYPE_SAL_ADAPTER_NUMBER, 8 bytes
2
DATA #3 : PsToken_t, PD_TYPE_SD_PSTOKEN, 152 bytes
Eye Catcher = CATOKEN
CF Server Info :
- Unique Sequence Number = 647 (0x287)
- Port Number = 56001
- Node Identifier = 2
- Instance Identifier = 0
- Netname = pscf0d2
Local Member Info :
- Device Name = ofa-v2-ens1
Transport Type = UDAPL (0x1)
Cmd Connection Use Types = NORMAL (0x0)
DATA #4 : unsigned integer, 8 bytes
4
DATA #5 : unsigned integer, 8 bytes

The db2CFConnPoolMgr EDU then hangs, even though it should be active and growing the connection pool:

2017-03-22-16.28.32.978047+540 I88006320E996 LEVEL: Severe
PID : 26942 TID : 139687582754560 PROC : db2sysc 1
INSTANCE: db2inst1 NODE : 001
HOSTNAME: MYHOST1
EDUID : 33 EDUNAME: db2CFConnPoolMgr 1
FUNCTION: DB2 UDB, Shared Data Structure Abstraction Layer for CF, SQLE_CA_CONN_ENTRY_DATA::sqleCaCeConnect, probe:790
MESSAGE : CA RC= 2148073473
DATA #1 : String, 17 bytes
PsConnect failed.
DATA #2 : Connection pool link adapter number, PD_TYPE_SAL_ADAPTER_NUMBER, 8 bytes
3
DATA #3 : PsToken_t, PD_TYPE_SD_PSTOKEN, 152 bytes
Eye Catcher = CATOKEN
CF Server Info :
- Unique Sequence Number = 647 (0x287)
- Port Number = 56001
- Node Identifier = 2
- Instance Identifier = 0
- Netname = pscf0d2
Local Member Info :
- Device Name = ofa-v2-ens1d1
Transport Type = UDAPL (0x1)
Cmd Connection Use Types = NORMAL (0x0)
DATA #4 : unsigned integer, 8 bytes
10

The next message from db2CFConnPoolMgr is logged about 4 minutes later:

2017-03-22-16.32.29.415138+540 I128230221E1490 LEVEL: Severe
PID : 26942 TID : 139687582754560 PROC : db2sysc 1
INSTANCE: db2inst1 NODE : 001
HOSTNAME: hostname 1
EDUID : 33 EDUNAME: db2CFConnPoolMgr 1
FUNCTION: DB2 UDB, Shared Data Structure Abstraction Layer for CF, SQLE_SINGLE_CA_HANDLE::sqleSingleCaCreateNewConnectionsForPool, probe:2565
MESSAGE : ZRC=0x87270023=-2027487197=SQLE_SAL_UNEXPECTED_ERROR "Unexpected SAL Error."
DATA #1 : String, 76 bytes
Error when trying to create CF connections.
m_whichCa, numConnections, flags
DATA #2 : SAL CF Index, PD_TYPE_SAL_CF_INDEX, 8 bytes
1
DATA #3 : SAL CF Node Number, PD_TYPE_SAL_CF_NODE_NUM, 2 bytes
128
DATA #4 : unsigned integer, 8 bytes
1
DATA #5 : Bitmask, 8 bytes
0x0000000000000000
DATA #6 : Codepath, 8 bytes
6:10:19:22:28:53
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
[0] 0x00007F0BA2164124 _ZN21SQLE_SINGLE_CA_HANDLE39sqleSingleCaCreateNewConnectionsForPoolEmR12sqzDataChainI18SQLE_CA_CONN_ENTRY16sqzChainNodeBaseIS1_ + 0x42E8
[1] 0x00007F0BA2167357 _ZN21SQLE_SINGLE_CA_HANDLE20sqleSingleCaGrowPoolEmm17SAL_ADAPTER_INDEX + 0x84F
[2] 0x00007F0BA215FB1D SAL_DoAdapterHousekeeping + 0x587
[3] 0x00007F0BA21FDBE1 _Z22sqleCFConnPoolMgrEntryPhj + 0x2F1
[4] 0x00007F0BA3CB4D58 sqloEDUEntry + 0x578
[5] 0x00007F0BAADF9DC5 /lib64/libpthread.so.0 + 0x7DC5
[6] 0x00007F0B9B9F31CD clone + 0x6D

While db2CFConnPoolMgr is hung, many messages like the following show db2agent EDUs trying, and failing, to create connections in the pool:

2017-03-22-16.33.37.237165+540 I131586765E4602 LEVEL: Warning
PID : 26942 TID : 139593118639872 PROC : db2sysc 1
INSTANCE: db2inst1 NODE : 001 DB : DBNAME
APPHDL : 1-4544 APPID:X.X.X.X.0000
AUTHID : DB2INST1 HOSTNAME: hostname 1
EDUID : 855 EDUNAME: db2agent (DBNAME) 1
FUNCTION: DB2 UDB, Shared Data Structure Abstraction Layer for CF, SQLE_SINGLE_CA_HANDLE::sqleSingleCaCreateNewConnectionsForPool, probe:2685
MESSAGE : Failed to connect to CF using Maximum timeout, Marking link as offline.
DATA #1 : unsigned integer, 8 bytes
10
DATA #2 : SQLE_CA_ADAPTER_STATE, PD_TYPE_SAL_ADAPTER_STATE, 304 bytes
AdapterState::szCFNetname = pscf0d2
AdapterState::szMemberDeviceName = ofa-v2-ens1d1
AdapterState::m_numConnectionsPerAdapter = 149
AdapterState::m_connectTimeoutForLink = 10
AdapterState::bLinkIsOnlineRsct: true
AdapterState::bLinkIsOnlineLocal: false
DATA #3 : PsToken_t, PD_TYPE_SD_PSTOKEN, 152 bytes
Eye Catcher = CATOKEN
CF Server Info :
- Unique Sequence Number = 647 (0x287)
- Port Number = 56001
- Node Identifier = 2
- Instance Identifier = 0
- Netname = pscf0d2
Local Member Info :
- Device Name = ofa-v2-ens1d1
Transport Type = UDAPL (0x1)
Cmd Connection Use Types = NORMAL (0x0)
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
[0] 0x00007F0BA2160D93 _ZN21SQLE_SINGLE_CA_HANDLE39sqleSingleCaCreateNewConnectionsForPoolEmR12sqzDataChainI18SQLE_CA_CONN_ENTRY16sqzChainNodeBaseIS1_ + 0xF57
[1] 0x00007F0BA2167357 _ZN21SQLE_SINGLE_CA_HANDLE20sqleSingleCaGrowPoolEmm17SAL_ADAPTER_INDEX + 0x84F
[2] 0x00007F0BA2174455 _ZN21SQLE_SINGLE_CA_HANDLE27sqleSingleCaSearchFreelistsER21FREELIST_SEARCH_STATSRP18SQLE_CA_CONN_ENTRYRP29SQLE_CACP_LATCH_AND_F + 0x819
[3] 0x00007F0BA2169C0C _ZN21SQLE_SINGLE_CA_HANDLE25sqleSingleCaGetConnectionEPP18SQLE_CA_CONN_ENTRYP10SAL_CA_KEYmm17SAL_ADAPTER_INDEXjm + 0x1E0
[4] 0x00007F0BA20AA1B8 _ZN17SAL_CA_CONNECTION17SAL_GetConnectionERKjP10SAL_CA_KEYmm17SAL_ADAPTER_INDEX + 0x3D8
[5] 0x00007F0BA208362D _ZN14SAL_GLM_HANDLE17SAL_SetLockStateNEP20SAL_SetLockStateInfojmPjtjmP17SAL_CA_CONNECTIONb + 0xA6D
[6] 0x00007F0BA2080293 _ZN14SAL_GLM_HANDLE16SAL_SetLockStateEP20SAL_SetLockStateInfomPjtjmP17SAL_CA_CONNECTIONb + 0x63
[7] 0x00007F0BA409A067 _Z19sqlpLLMSetLockStateP9sqeBsuEduP18SAL_LOCK_STRUCTUREP20SAL_SetLockStateInfoPbt + 0xC7
[8] 0x00007F0BA4096707 _Z30sqlpLLMInformGLMIfStateChangedP9sqeBsuEduRP8SQLP_LRBS2_P14SQLP_LOCK_INFOP17SQLP_LLM_SSM_INFO + 0xA57
[9] 0x00007F0BA40BDE49 _Z21sqlplMakeNewRequestSDP9sqeBsuEduP14SQLP_LOCK_INFOP11SQLP_TENTRYRP8SQLP_LRBS6_P15SQLP_LTRN_CHAINbbbbb + 0x1009
[10] 0x00007F0BA3F374A2 _Z7sqlplrqP9sqeBsuEduP14SQLP_LOCK_INFO + 0xE92
[11] 0x00007F0BA5625B41 /home/db2inst1/sqllib/lib64/libdb2e.so.1 + 0x86FBB41
...
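One quick way to check whether a db2diag.log shows the symptoms quoted in the error description is to count the lines that contain the marker strings from those entries. The following is a minimal sketch, not an IBM-provided tool: the function name scan_cf_conn_symptoms is hypothetical, and it relies only on grep and on message text that appears in this APAR.

```shell
#!/bin/sh
# Hypothetical helper: count db2diag.log lines containing the marker
# strings quoted in this APAR's error description.
# Usage: scan_cf_conn_symptoms /path/to/db2diag.log
scan_cf_conn_symptoms() {
    diaglog="$1"
    for marker in \
        "Adapter Offline Request" \
        "PsConnect failed" \
        "Marking link as offline"
    do
        # -F matches the marker literally; -c prints the number of matching lines.
        # grep exits non-zero when nothing matches, so "|| true" keeps the
        # function usable under "set -e" while still capturing the printed 0.
        count=$(grep -c -F -- "$marker" "$diaglog" || true)
        printf '%s: %s\n' "$marker" "$count"
    done
}
```

A high count for "Marking link as offline" together with a gap of several minutes between db2CFConnPoolMgr entries would be consistent with the pattern described here, although the counts alone do not prove that this APAR is the cause.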
Local fix
Set the following DB2 registry variables and restart DB2:
db2set DB2_SAL_INITIAL_TIMEOUT_FOR_CONNECT_SEC=10
db2set DB2_SAL_CONNECT_MAX_TIMEOUT=20
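The workaround can be scripted along the lines of the sketch below. The function name apply_sal_timeouts and the DB2SET override variable are hypothetical and exist only so that the sequence can be dry-run (for example with DB2SET=echo) before it is used against a real instance. The registry variable names and values are the ones given in this local fix.

```shell
#!/bin/sh
# Hypothetical wrapper around the two db2set commands from the local fix.
# DB2SET is an assumed override for dry runs; it defaults to the real db2set.
apply_sal_timeouts() {
    db2set_cmd="${DB2SET:-db2set}"
    # Stop at the first failure so a partial change is noticed.
    "$db2set_cmd" DB2_SAL_INITIAL_TIMEOUT_FOR_CONNECT_SEC=10 &&
    "$db2set_cmd" DB2_SAL_CONNECT_MAX_TIMEOUT=20
}
```

After the variables are set, db2set -all lists the registry settings so that both values can be confirmed. The new values take effect only after DB2 is restarted.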
Problem summary
USERS AFFECTED: pureScale
PROBLEM DESCRIPTION: See Error Description
RECOMMENDATION: Upgrade to DB2 Version 11.1 Modification 2 FixPack 2 or later
Problem conclusion
Fixed in DB2 Version 11.1 Modification 2 FixPack 2
Temporary fix
Comments
APAR Information
APAR number
IV95313
Reported component name
DB2 PURESCALE F
Reported component ID
5724Y6900
Reported release
B10
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2017-04-20
Closed date
2017-06-27
Last modified date
2022-03-29
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
DB2 PURESCALE F
Fixed component ID
5724Y6900
Applicable component levels
RB10 PSN
UP
Document Information
Modified date:
03 May 2022