IBM Support

IT31706: NETWORK OUTAGE WHILE DB2CFCONNPOOLMGR REBALANCES CF CONNECTION POOL MAY HANG CONNECTION POOL MANAGER.


APAR status

  • Closed as fixed if next.

Error description

  • If a network failure occurs while the db2CFConnPoolMgr EDU is
    rebalancing the CF connection pool, db2CFConnPoolMgr may be
    unable to reinitialize the network connections once the network
    becomes operational again. As a result, the member panics
    (shuts down) after 6 minutes because it cannot communicate with
    the CF.
    
    The panic appears in db2diag.log as follows.
    
    2020-01-15-04.06.32.716302+480 E265902469A818       LEVEL: Error
    PID     : 52690958             TID : 350754         PROC : db2sysc 0
    INSTANCE: Instname              NODE : 000           DB   : DBname
    APPHDL  : 0-7155               APPID: 172.18.0.18.59816.191231222529
    AUTHID  : authid              HOSTNAME: Hostname1
    EDUID   : 350754               EDUNAME: db2agent (DBname) 0
    FUNCTION: DB2 UDB, base sys utilities, sqle_panic, probe:30
    MESSAGE : ADM14005E  The following error occurred: "Panic".  First Occurrence
              Data Capture (FODC) has been invoked in the following mode:
              "Automatic".  Diagnostic information has been recorded in the
              directory named
              "/home/Instname/sqllib/db2dump/DIAG0000/FODC_Panic_2020-01-15-04.06.32.715684_52690958_350754_000/".
    
    
    After the panic message above, db2CFConnPoolMgr writes the
    following log entries, with call stacks, to db2diag.log.
    
    2020-01-15-04.06.33.000012+480 I266165751A2668      LEVEL:
    Severe
    PID     : 52690958             TID : 25237          PROC :
    db2sysc 0
    INSTANCE: Instname              NODE : 000
    HOSTNAME: Hostname1
    EDUID   : 25237                EDUNAME: db2CFConnPoolMgr 0
    FUNCTION: DB2 UDB, Shared Data Structure Abstraction Layer for
    CF, SQLE_SINGLE_CA_HANDLE::sqleSingleCaSearchFreelists,
    probe:4755
    MESSAGE :
    ZRC=0x822701CE=-2111372850=SQLE_SAL_CF_COMM_RETRY_TIMEOUT
              "Attempts to communicate with CF failed."
    DATA #1 : String, 88 bytes
    The CF communication has timed out, mostly like due to a missing
    departure notification.
    DATA #2 : class sqleSalEduRetryableApiInfo, PD_TYPE_SAL_EDUINFO,
    200 bytes
    m_CurrentLevel = 1
    mData[0]:
       m_Iteration = 133708763
           m_Probe = 4703
      m_FunctionId = 423100557 => DB2 UDB, Shared Data Structure
    Abstraction Layer for CF,
    SQLE_SINGLE_CA_HANDLE::sqleSingleCaSearchFreelists
    
    m_Timer:
       Timer State: STARTED
          m_StartTime = 1579032033 ( 2020-01-15-04.00.33 )
       m_RetryTimeout = 360
              m_Level = 1
              m_Probe = 4755
         m_FunctionId = 423100557 => DB2 UDB, Shared Data Structure
    Abstraction Layer for CF,
    SQLE_SINGLE_CA_HANDLE::sqleSingleCaSearchFreelists
    Wait For Primary Diagnostics:
       - m_WaitForPrimaryFunctionId: 0 => <0>, <0>, <0>
       - m_WaitForPrimaryProbe: 0
       - m_BeginNumWaitForPrimary: 0 - 0
       - m_EndNumWaitForPrimary: 0 - 0
       - m_waitForPrimarySampledKey: 0x0
    DATA #3 : String, 25 bytes
    Calling EDU ecfid, probe:
    DATA #4 : Function, 4 bytes
    DB2 UDB, Shared Data Structure Abstraction Layer for CF,
    SQLE_SINGLE_CA_HANDLE::sqleSingleCaSearchFreelists
    DATA #5 : unsigned integer, 8 bytes
    4755
    CALLSTCK: (Static functions may not be resolved correctly, as
    they are resolved to the nearest symbol)
      [0] 0x090000000DFFC7EC
    TrackSalApiIteration__26sqleSalEduRetryableApiInfoFCUiCUl +
    0x11C
      [1] 0x090000000E05BA2C
    TrackSalApiIteration__26sqleSalEduRetryableApiInfoFCUiCUl +
    0x18C
      [2] 0x090000000E04B694
    sqleSingleCaGetConnection__21SQLE_SINGLE_CA_HANDLEFCPP18SQLE_CA_
    CONN_ENTRYP10SAL_CA_KEYCUlT3C17SAL_ADAPTER_INDEXCUiT3 + 0x458
      [3] 0x090000000D159270
    sqleSingleCaRebalanceAdapters__21SQLE_SINGLE_CA_HANDLEFClCPl +
    0x16C
      [4] 0x090000000D159794 SAL_AdjustAdapterBalance + 0x198
      [5] 0x090000000D1598C4 SAL_AdjustAdapterBalance@glue108 + 0x40
      [6] 0x090000000D159974
    SAL_DoAdapterHousekeeping@OL@3068@AF283_139 + 0x30
      [7] 0x090000000D0FEE68 sqleCFConnPoolMgrEntry__FPUcUi + 0x2F0
      [8] 0x090000000CFF7780 sqleCFConnPoolMgrEntry__FPUcUi + 0x110
      [9] 0x090000000E67D5F4 sqloEDUEntry + 0x4D4
      [10] 0x0900000000546E10 _pthread_body + 0xF0
      [11] 0xFFFFFFFFFFFFFFFC ?unknown + 0xFFFFFFFF
    
    2020-01-15-04.06.34.493523+480 I266168420A1354      LEVEL: Event
    PID     : 52690958             TID : 25237          PROC :
    db2sysc 0
    INSTANCE: Instname              NODE : 000
    HOSTNAME: Hostname1
    EDUID   : 25237                EDUNAME: db2CFConnPoolMgr 0
    FUNCTION: DB2 UDB, Shared Data Structure Abstraction Layer for
    CF, SQLE_SINGLE_CA_HANDLE::sqleSingleCaSearchFreelists,
    probe:5111
    MESSAGE :
    ZRC=0x822701CE=-2111372850=SQLE_SAL_CF_COMM_RETRY_TIMEOUT
              "Attempts to communicate with CF failed."
    DATA #1 : String, 55 bytes
    Unable to find a free connection in the connection pool
    DATA #2 : Codepath, 8 bytes
    5:30
    DATA #3 : CF Connection Pool, Pass 1 list head set.,
    PD_TYPE_SAL_CONNPOOL_GRAB_STATS_PASS1_NON_NULL, 8 bytes
    4264333
    DATA #4 : CF Connection Pool, Pass 1 list head set.,
    PD_TYPE_SAL_CONNPOOL_GRAB_STATS_PASS1_COND_LATCH_SUCCESS, 8
    bytes
    4264283
    DATA #5 : CF Connection Pool, Pass 2 list head set.,
    PD_TYPE_SAL_CONNPOOL_GRAB_STATS_PASS2_NON_NULL, 8 bytes
    8528690
    DATA #6 : CF Connection Pool, Pass 2 list head set.,
    PD_TYPE_SAL_CONNPOOL_GRAB_STATS_PASS2_COND_LATCH_SUCCESS, 8
    bytes
    8528579
    DATA #7 : CF Connection Pool, Pass 2 list head set.,
    PD_TYPE_SAL_CONNPOOL_GRAB_STATS_NUM_YIELDS, 8 bytes
    0
    DATA #8 : String, 31 bytes
    Connection pool growth attempts
    DATA #9 : unsigned integer, 8 bytes
    0
    DATA #10: unsigned integer, 8 bytes
    1
    
    2020-01-15-04.06.34.496918+480 I266169775A1621      LEVEL:
    Severe
    PID     : 52690958             TID : 25237          PROC :
    db2sysc 0
    INSTANCE: Instname              NODE : 000
    HOSTNAME: Hostname1
    EDUID   : 25237                EDUNAME: db2CFConnPoolMgr 0
    FUNCTION: DB2 UDB, Shared Data Structure Abstraction Layer for
    CF, SQLE_SINGLE_CA_HANDLE::sqleSingleCaRebalanceAdapters,
    probe:4615
    MESSAGE :
    ZRC=0x822701CE=-2111372850=SQLE_SAL_CF_COMM_RETRY_TIMEOUT
              "Attempts to communicate with CF failed."
    DATA #1 : Connection pool link adapter number,
    PD_TYPE_SAL_ADAPTER_NUMBER, 8 bytes
    0
    DATA #2 : Connection pool link adapter number,
    PD_TYPE_SAL_ADAPTER_NUMBER, 8 bytes
    2
    DATA #3 : unsigned integer, 8 bytes
    33
    DATA #4 : unsigned integer, 8 bytes
    31
    DATA #5 : unsigned integer, 8 bytes
    2
    DATA #6 : unsigned integer, 8 bytes
    2
    DATA #7 : SAL CF Index, PD_TYPE_SAL_CF_INDEX, 8 bytes
    2
    DATA #8 : SAL CF Node Number, PD_TYPE_SAL_CF_NODE_NUM, 2 bytes
    129
    CALLSTCK: (Static functions may not be resolved correctly, as
    they are resolved to the nearest symbol)
      [0] 0x090000000D15938C
    sqleSingleCaRebalanceAdapters__21SQLE_SINGLE_CA_HANDLEFClCPl +
    0x288
      [1] 0x090000000D159794 SAL_AdjustAdapterBalance + 0x198
      [2] 0x090000000D1598C4 SAL_AdjustAdapterBalance@glue108 + 0x40
      [3] 0x090000000D159974
    SAL_DoAdapterHousekeeping@OL@3068@AF283_139 + 0x30
      [4] 0x090000000D0FEE68 sqleCFConnPoolMgrEntry__FPUcUi + 0x2F0
      [5] 0x090000000CFF7780 sqleCFConnPoolMgrEntry__FPUcUi + 0x110
      [6] 0x090000000E67D5F4 sqloEDUEntry + 0x4D4
      [7] 0x0900000000546E10 _pthread_body + 0xF0
      [8] 0xFFFFFFFFFFFFFFFC ?unknown + 0xFFFFFFFF
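
The 6-minute delay can be checked against the timer data in the first Severe entry: m_StartTime plus m_RetryTimeout (360 seconds) gives the moment the retry window expires. The following Python sketch does that arithmetic. The function name retry_deadline is hypothetical, and the sketch assumes that m_StartTime is a Unix epoch value and that the +480 suffix on the db2diag.log timestamps is the UTC offset in minutes.

```python
from datetime import datetime, timedelta, timezone


def retry_deadline(start_epoch: int, retry_timeout_s: int, utc_offset_min: int) -> datetime:
    """Return the local time at which the CF retry window expires.

    Assumption: start_epoch is seconds since the Unix epoch and
    utc_offset_min is the offset shown after the db2diag.log timestamp.
    """
    tz = timezone(timedelta(minutes=utc_offset_min))
    start = datetime.fromtimestamp(start_epoch, tz)
    return start + timedelta(seconds=retry_timeout_s)


# Values taken from the m_Timer section of the log entry above.
deadline = retry_deadline(1579032033, 360, 480)
print(deadline.strftime("%Y-%m-%d-%H.%M.%S"))  # 2020-01-15-04.06.33
```

The computed time agrees with the timestamp of the first entry logged by db2CFConnPoolMgr (2020-01-15-04.06.33.000012), which is consistent with the 360-second retry timeout being the 6 minutes mentioned in the description.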
    
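
To check whether a member has hit this signature, one option is to search db2diag.log for records that were written by db2CFConnPoolMgr and that contain SQLE_SAL_CF_COMM_RETRY_TIMEOUT. The Python sketch below is an illustration only: the function name find_pool_manager_timeouts is hypothetical, the sample text is an abbreviated excerpt of the entries above, and the sketch assumes each record begins with a timestamp at the start of a line. A match suggests, but does not prove, that this APAR applies.

```python
import re

# A db2diag.log record is assumed to start with a timestamp such as
# 2020-01-15-04.06.33.000012+480 at the beginning of a line.
RECORD_START = re.compile(
    r"^[ \t]*(\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}[+-]\d{3})[ \t]",
    re.MULTILINE,
)


def find_pool_manager_timeouts(text: str) -> list[str]:
    """Return timestamps of records from db2CFConnPoolMgr that report
    SQLE_SAL_CF_COMM_RETRY_TIMEOUT."""
    starts = list(RECORD_START.finditer(text))
    hits = []
    for i, match in enumerate(starts):
        # A record runs until the next record's timestamp (or end of text).
        end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
        record = text[match.start():end]
        if "db2CFConnPoolMgr" in record and "SQLE_SAL_CF_COMM_RETRY_TIMEOUT" in record:
            hits.append(match.group(1))
    return hits


# Abbreviated excerpt of the log entries shown in the error description.
sample = """\
2020-01-15-04.06.32.716302+480 E265902469A818       LEVEL: Error
EDUID   : 350754               EDUNAME: db2agent (DBname) 0
FUNCTION: DB2 UDB, base sys utilities, sqle_panic, probe:30

2020-01-15-04.06.33.000012+480 I266165751A2668      LEVEL: Severe
EDUID   : 25237                EDUNAME: db2CFConnPoolMgr 0
MESSAGE : ZRC=0x822701CE=-2111372850=SQLE_SAL_CF_COMM_RETRY_TIMEOUT
"""

print(find_pool_manager_timeouts(sample))  # ['2020-01-15-04.06.33.000012+480']
```

For a real log, the file contents would be read into a string and passed to the same function.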

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * pureScale                                                    *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error Description                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply SB                                                     *
    ****************************************************************
    

Problem conclusion

Temporary fix

Comments

APAR Information

  • APAR number

    IT31706

  • Reported component name

    DB2 FOR LUW

  • Reported component ID

    DB2FORLUW

  • Reported release

    B10

  • Status

    CLOSED FIN

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2020-01-30

  • Closed date

    2021-03-14

  • Last modified date

    2021-03-14

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    DB2 FOR LUW

  • Fixed component ID

    DB2FORLUW

Applicable component levels

  • Product: Db2 for Linux, UNIX and Windows (SSEPGG)
  • Version: 11.1
  • Platform: Platform Independent (PF025)
  • Line of Business: Data and AI (LOB10)
  • Business Unit: Cloud & Data Platform (BU053)

Document Information

Modified date:
15 March 2021