IBM Support

IV95313: A MEMBER GOES DOWN WITH FODC_PANIC DUMPS WHEN EDUS RUSH TO RETRIEVE THE CONNECTIONS

APAR status

  • Closed as program error.

Error description

  • In certain situations, such as node failure recovery, the
    Connection Pool Manager can be locked out by the EDUs that are
    trying to retrieve connections from the connection pool.  This
    results in an FODC panic condition that brings the member down
    in a pureScale environment.
    
      In this case, a message showing that a link is being marked
    offline, such as the following, should appear in db2diag.log.
    
    2017-03-22-16.28.28.638890+540 I87508473E1910        LEVEL:
    Event
    PID     : 26942                TID : 139602274805504 PROC :
    db2sysc 1
    INSTANCE: db2inst1             NODE : 001            DB   :
    DBNAME
    APPHDL  : 1-4595               APPID:
    192.168.10.5.45447.170322080947
    AUTHID  : DB2INST1             HOSTNAME: hostname 1
    EDUID   : 285                  EDUNAME: db2agent (DBNAME) 1
    FUNCTION: DB2 UDB, Shared Data Structure Abstraction Layer for
    CF, SQLE_SINGLE_CA_HANDLE::sqleCaCeMarkAdapterOffline,
    probe:1857
    MESSAGE : Adapter Offline Request:
    SQLE_SINGLE_CA_HANDLE::sqleSingleCaCreateNewConnectionsForPool()
    :2710 has changed the LOCAL state of link #2 (MEMBER0001's
    ofa-v2-ens1,pscf0d2) in memory from [R:ONLINE,L:ONLINE] to
    [R:ONLINE,L:OFFLINE]
    DATA #1 : Codepath, 8 bytes
    5:20
    DATA #2 : Connection pool link adapter number,
    PD_TYPE_SAL_ADAPTER_NUMBER, 8 bytes
    2
    DATA #3 : PsToken_t, PD_TYPE_SD_PSTOKEN, 152 bytes
    Eye Catcher               = CATOKEN
    CF Server Info :
    - Unique Sequence Number = 647 (0x287)
    - Port Number            = 56001
    - Node Identifier        = 2
    - Instance Identifier    = 0
    - Netname                = pscf0d2
    Local Member Info :
    - Device Name            = ofa-v2-ens1
    Transport Type            = UDAPL (0x1)
    Cmd Connection Use Types  = NORMAL (0x0)
    DATA #4 : unsigned integer, 8 bytes
    4
    DATA #5 : unsigned integer, 8 bytes
    
      The db2CFConnPoolMgr EDU then hangs, even though it should be
    active and growing the connection pool.
    
    2017-03-22-16.28.32.978047+540 I88006320E996         LEVEL:
    Severe
    PID     : 26942                TID : 139687582754560 PROC :
    db2sysc 1
    INSTANCE: db2inst1             NODE : 001
    HOSTNAME: hostname 1
    EDUID   : 33                   EDUNAME: db2CFConnPoolMgr 1
    FUNCTION: DB2 UDB, Shared Data Structure Abstraction Layer for
    CF, SQLE_CA_CONN_ENTRY_DATA::sqleCaCeConnect, probe:790
    MESSAGE : CA RC= 2148073473
    DATA #1 : String, 17 bytes
    PsConnect failed.
    DATA #2 : Connection pool link adapter number,
    PD_TYPE_SAL_ADAPTER_NUMBER, 8 bytes
    3
    DATA #3 : PsToken_t, PD_TYPE_SD_PSTOKEN, 152 bytes
    Eye Catcher               = CATOKEN
    CF Server Info :
    - Unique Sequence Number = 647 (0x287)
    - Port Number            = 56001
    - Node Identifier        = 2
    - Instance Identifier    = 0
    - Netname                = pscf0d2
    Local Member Info :
    - Device Name            = ofa-v2-ens1d1
    Transport Type            = UDAPL (0x1)
    Cmd Connection Use Types  = NORMAL (0x0)
    DATA #4 : unsigned integer, 8 bytes
    10
    
      The next message from db2CFConnPoolMgr is logged about 4
    minutes later, as shown below.
    
    2017-03-22-16.32.29.415138+540 I128230221E1490       LEVEL:
    Severe
    PID     : 26942                TID : 139687582754560 PROC :
    db2sysc 1
    INSTANCE: db2inst1             NODE : 001
    HOSTNAME: hostname 1
    EDUID   : 33                   EDUNAME: db2CFConnPoolMgr 1
    FUNCTION: DB2 UDB, Shared Data Structure Abstraction Layer for
    CF,
    SQLE_SINGLE_CA_HANDLE::sqleSingleCaCreateNewConnectionsForPool,
    probe:2565
    MESSAGE : ZRC=0x87270023=-2027487197=SQLE_SAL_UNEXPECTED_ERROR
              "Unexpected SAL Error."
    DATA #1 : String, 76 bytes
    Error when trying to create CF connections. m_whichCa,
    numConnections, flags
    DATA #2 : SAL CF Index, PD_TYPE_SAL_CF_INDEX, 8 bytes
    1
    DATA #3 : SAL CF Node Number, PD_TYPE_SAL_CF_NODE_NUM, 2 bytes
    128
    DATA #4 : unsigned integer, 8 bytes
    1
    DATA #5 : Bitmask, 8 bytes
    0x0000000000000000
    DATA #6 : Codepath, 8 bytes
    6:10:19:22:28:53
    CALLSTCK: (Static functions may not be resolved correctly, as
    they are resolved to the nearest symbol)
      [0] 0x00007F0BA2164124
    _ZN21SQLE_SINGLE_CA_HANDLE39sqleSingleCaCreateNewConnectionsForP
    oolEmR12sqzDataChainI18SQLE_CA_CONN_ENTRY16sqzChainNodeBaseIS1_
    + 0x42E8
      [1] 0x00007F0BA2167357
    _ZN21SQLE_SINGLE_CA_HANDLE20sqleSingleCaGrowPoolEmm17SAL_ADAPTER
    _INDEX + 0x84F
      [2] 0x00007F0BA215FB1D SAL_DoAdapterHousekeeping + 0x587
      [3] 0x00007F0BA21FDBE1 _Z22sqleCFConnPoolMgrEntryPhj + 0x2F1
      [4] 0x00007F0BA3CB4D58 sqloEDUEntry + 0x578
      [5] 0x00007F0BAADF9DC5 /lib64/libpthread.so.0 + 0x7DC5
      [6] 0x00007F0B9B9F31CD clone + 0x6D
    
      During the db2CFConnPoolMgr hang, many messages like the one
    below are logged by db2agent EDUs that try, but fail, to create
    connections in the pool.
    
    2017-03-22-16.33.37.237165+540 I131586765E4602       LEVEL:
    Warning
    PID     : 26942                TID : 139593118639872 PROC :
    db2sysc 1
    INSTANCE: db2inst1             NODE : 001            DB   :
    DBNAME
    APPHDL  : 1-4544               APPID:
    192.168.10.5.45399.170322080834
    AUTHID  : DB2INST1             HOSTNAME: hostname 1
    EDUID   : 855                  EDUNAME: db2agent (DBNAME) 1
    FUNCTION: DB2 UDB, Shared Data Structure Abstraction Layer for
    CF,
    SQLE_SINGLE_CA_HANDLE::sqleSingleCaCreateNewConnectionsForPool,
    probe:2685
    MESSAGE : Failed to connect to CF using Maximum timeout,
    Marking link as
              offline.
    DATA #1 : unsigned integer, 8 bytes
    10
    DATA #2 : SQLE_CA_ADAPTER_STATE, PD_TYPE_SAL_ADAPTER_STATE, 304
    bytes
    AdapterState::szCFNetname = pscf0d2
    AdapterState::szMemberDeviceName = ofa-v2-ens1d1
    AdapterState::m_numConnectionsPerAdapter = 149
    AdapterState::m_connectTimeoutForLink = 10
    AdapterState::bLinkIsOnlineRsct: true
    AdapterState::bLinkIsOnlineLocal: false
    DATA #3 : PsToken_t, PD_TYPE_SD_PSTOKEN, 152 bytes
    Eye Catcher               = CATOKEN
    CF Server Info :
    - Unique Sequence Number = 647 (0x287)
    - Port Number            = 56001
    - Node Identifier        = 2
    - Instance Identifier    = 0
    - Netname                = pscf0d2
    Local Member Info :
    - Device Name            = ofa-v2-ens1d1
    Transport Type            = UDAPL (0x1)
    Cmd Connection Use Types  = NORMAL (0x0)
    CALLSTCK: (Static functions may not be resolved correctly, as
    they are resolved to the nearest symbol)
      [0] 0x00007F0BA2160D93
    _ZN21SQLE_SINGLE_CA_HANDLE39sqleSingleCaCreateNewConnectionsForP
    oolEmR12sqzDataChainI18SQLE_CA_CONN_ENTRY16sqzChainNodeBaseIS1_
    + 0xF57
      [1] 0x00007F0BA2167357
    _ZN21SQLE_SINGLE_CA_HANDLE20sqleSingleCaGrowPoolEmm17SAL_ADAPTER
    _INDEX + 0x84F
      [2] 0x00007F0BA2174455
    _ZN21SQLE_SINGLE_CA_HANDLE27sqleSingleCaSearchFreelistsER21FREEL
    IST_SEARCH_STATSRP18SQLE_CA_CONN_ENTRYRP29SQLE_CACP_LATCH_AND_F
    + 0x819
      [3] 0x00007F0BA2169C0C
    _ZN21SQLE_SINGLE_CA_HANDLE25sqleSingleCaGetConnectionEPP18SQLE_C
    A_CONN_ENTRYP10SAL_CA_KEYmm17SAL_ADAPTER_INDEXjm + 0x1E0
      [4] 0x00007F0BA20AA1B8
    _ZN17SAL_CA_CONNECTION17SAL_GetConnectionERKjP10SAL_CA_KEYmm17SA
    L_ADAPTER_INDEX + 0x3D8
      [5] 0x00007F0BA208362D
    _ZN14SAL_GLM_HANDLE17SAL_SetLockStateNEP20SAL_SetLockStateInfojm
    PjtjmP17SAL_CA_CONNECTIONb + 0xA6D
      [6] 0x00007F0BA2080293
    _ZN14SAL_GLM_HANDLE16SAL_SetLockStateEP20SAL_SetLockStateInfomPj
    tjmP17SAL_CA_CONNECTIONb + 0x63
      [7] 0x00007F0BA409A067
    _Z19sqlpLLMSetLockStateP9sqeBsuEduP18SAL_LOCK_STRUCTUREP20SAL_Se
    tLockStateInfoPbt + 0xC7
      [8] 0x00007F0BA4096707
    _Z30sqlpLLMInformGLMIfStateChangedP9sqeBsuEduRP8SQLP_LRBS2_P14SQ
    LP_LOCK_INFOP17SQLP_LLM_SSM_INFO + 0xA57
      [9] 0x00007F0BA40BDE49
    _Z21sqlplMakeNewRequestSDP9sqeBsuEduP14SQLP_LOCK_INFOP11SQLP_TEN
    TRYRP8SQLP_LRBS6_P15SQLP_LTRN_CHAINbbbbb + 0x1009
      [10] 0x00007F0BA3F374A2
    _Z7sqlplrqP9sqeBsuEduP14SQLP_LOCK_INFO + 0xE92
      [11] 0x00007F0BA5625B41
    /home/db2inst1/sqllib/lib64/libdb2e.so.1 + 0x86FBB41
    ...
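
    The symptom sequence described above (a link marked offline, a
    long silence from db2CFConnPoolMgr, and db2agent connection
    failures) can be looked for in a db2diag.log with a short
    script.  The following is a minimal sketch, not an IBM tool.
    The function names, the returned keys, and the 1-minute gap
    threshold are assumptions made for illustration; the matched
    strings are taken from the records quoted in this APAR.

```python
import re
from datetime import datetime, timedelta

# Start of a db2diag.log record, for example:
# "2017-03-22-16.28.32.978047+540 I88006320E996   LEVEL: Severe"
HEADER_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6})[+-]\d+\s+\S+\s+LEVEL:"
)


def split_records(text):
    """Return a list of (timestamp, record_text) pairs, in file order."""
    records = []
    current_ts, current_lines = None, []
    for line in text.splitlines():
        match = HEADER_RE.match(line.strip())
        if match:
            if current_ts is not None:
                records.append((current_ts, "\n".join(current_lines)))
            current_ts = datetime.strptime(
                match.group(1), "%Y-%m-%d-%H.%M.%S.%f"
            )
            current_lines = [line]
        elif current_ts is not None:
            current_lines.append(line)
    if current_ts is not None:
        records.append((current_ts, "\n".join(current_lines)))
    return records


def summarize(text, gap_threshold=timedelta(minutes=1)):
    """Count the symptom records and list long db2CFConnPoolMgr silences.

    gap_threshold is an arbitrary value chosen for this sketch.
    """
    records = split_records(text)
    offline_marks = sum(
        1 for _, body in records if "sqleCaCeMarkAdapterOffline" in body
    )
    mgr_times = [
        ts for ts, body in records
        if re.search(r"EDUNAME:\s*db2CFConnPoolMgr", body)
    ]
    # Consecutive pool manager records that are far apart in time.
    gaps = [
        (earlier, later)
        for earlier, later in zip(mgr_times, mgr_times[1:])
        if later - earlier >= gap_threshold
    ]
    agent_failures = sum(
        1 for _, body in records
        if re.search(r"EDUNAME:\s*db2agent", body)
        and "Failed to connect to CF" in body
    )
    return {
        "offline_marks": offline_marks,
        "pool_mgr_gaps": gaps,
        "agent_connect_failures": agent_failures,
    }
```

    To use it, read the db2diag.log into a string and pass it to
    summarize().  A nonzero offline_marks count together with a
    multi-minute entry in pool_mgr_gaps and many
    agent_connect_failures would resemble the pattern in this APAR,
    but it is not by itself proof of this defect.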
    

Local fix

  • Set the following DB2 registry variables and restart DB2.
        db2set DB2_SAL_INITIAL_TIMEOUT_FOR_CONNECT_SEC=10
        db2set DB2_SAL_CONNECT_MAX_TIMEOUT=20
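
    After the restart, it can be worth confirming that both
    variables are in effect.  The following is a minimal sketch
    that checks captured output of "db2set -all".  It assumes that
    each line of that output has the form "[i] NAME=VALUE" (a
    bracketed scope letter followed by the assignment); the
    function names are hypothetical and not part of DB2.

```python
import re

# The values from the local fix above.
EXPECTED = {
    "DB2_SAL_INITIAL_TIMEOUT_FOR_CONNECT_SEC": "10",
    "DB2_SAL_CONNECT_MAX_TIMEOUT": "20",
}

# Assumed line shape: optional "[x]" scope prefix, then NAME=VALUE.
LINE_RE = re.compile(r"^\s*(?:\[\w\]\s*)?([A-Za-z0-9_]+)=(.*)$")


def parse_db2set(output):
    """Parse registry variable lines into a {name: value} dict."""
    settings = {}
    for line in output.splitlines():
        match = LINE_RE.match(line)
        if match:
            settings[match.group(1)] = match.group(2).strip()
    return settings


def missing_settings(output, expected=EXPECTED):
    """Return {name: (expected, actual)} for variables that differ."""
    actual = parse_db2set(output)
    return {
        name: (value, actual.get(name))
        for name, value in expected.items()
        if actual.get(name) != value
    }
```

    An empty result from missing_settings() means that both
    variables were found with the values given in the local fix.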
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * pureScale                                                    *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error Description                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Upgrade to DB2 Version 11.1 Modification 2 FixPack 2 or      *
    * later                                                        *
    ****************************************************************
    

Problem conclusion

  • Fixed in DB2 Version 11.1 Modification 2 FixPack 2
    

Temporary fix

Comments

APAR Information

  • APAR number

    IV95313

  • Reported component name

    DB2 PURESCALE F

  • Reported component ID

    5724Y6900

  • Reported release

    B10

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2017-04-20

  • Closed date

    2017-06-27

  • Last modified date

    2017-06-27

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    DB2 PURESCALE F

  • Fixed component ID

    5724Y6900

Applicable component levels

  • RB10 PSN

       UP

Document Information

Modified date:
01 August 2020