IBM Support

IT37434: DPF/BLU SYSTEMS TQ HANG IN RECEIVEFROMANYCONNECTION


APAR status

  • Closed as program error.

Error description

  • Issue:
    -----
    On DPF/BLU multi-node systems, the table queues (TQs) of a job
    or query may hang indefinitely on send/receive buffers after
    receiving an empty buffer.
    
    db2level on which the issue was found:
    --------------
    "DB2 v11.1.4.5", "s1911120100", "DYN1911120100AMD64",
    and Fix Pack "5".
    --------------
    
    Symptoms:
    ---------
    It can affect any SQL statement or job, which then goes into a
    hang state.
    In some cases, forcing the hanging application handle and
    resubmitting the same statement lets it run to completion as
    normal.
    
    The global application handle snapshot and the agent stacks
    show that all agents are either waiting in TQs or waiting for
    agents that are waiting in TQs.
    This happens on only some of the participating nodes: their
    subsections are in state "Waiting to receive on tablequeue",
    while most other nodes are in the "complete" state.
    
    The stacks look similar to the following:
    ...
                    (/head/home/db2inst1/sqllib/lib64/libdb2e.so.1)
    0x00002AAAB045891B _ZN11sqkfChannel13WaitRecvReadyEii + 0x075b
                    (/head/home/db2inst1/sqllib/lib64/libdb2e.so.1)
    0x00002AAAB0459802 _ZN11sqkfChannel13ReceiveBufferEPP10sqkfBufferi + 0x01c2
                    (/head/home/db2inst1/sqllib/lib64/libdb2e.so.1)
    0x00002AAAB049C482 _Z8sqlkqrcvP8SQLKQ_CBP8sqeAgentPiPP10sqkfBufferS3_PN7ibm_cde5query16RuntimeStatEntryES3_Pb + 0x04d2
                    (/head/home/db2inst1/sqllib/lib64/libdb2e.so.1)
    0x00002AAAAF28B8C9 _ZN7ibm_cde5query16TableQueueBuffer13receiveBufferERsPNS0_16RuntimeStatEntryEi + 0x0129
                    (/head/home/db2inst1/sqllib/lib64/libdb2e.so.1)
    0x00002AAAAF28C782 _ZN7ibm_cde5query16TableQueueReader24receiveFromAnyConnectionEi + 0x0122
                    (/head/home/db2inst1/sqllib/lib64/libdb2e.so.1)
    0x00002AAAAF2C8E71 _ZN7ibm_cde5query23TableQueueReadEvaluator9getNextWUEPNS0_16TableQueueReaderEP31sqlkq_prof_tq_recv_events_timesPh + 0x0361
                    (/head/home/db2inst1/sqllib/lib64/libdb2e.so.1)
    0x00002AAAAF2ACB60 _ZN7ibm_cde5query23TableQueueReadEvaluator26processInputsSynchronouslyEv + 0x0280
                    (/head/home/db2inst1/sqllib/lib64/libdb2e.so.1)
    0x00002AAAADFBE1C9 _ZN7ibm_cde5query9Evaluator8evaluateEbbRNS1_21EvaluatorRestartStateEPNS0_19OptPredicateTrackerE + 0x03a9
                    (/head/home/db2inst1/sqllib/lib64/libdb2e.so.1)
    0x00002AAAADECD97C _ZN7ibm_cde5query17EvaluationRoutine8evaluateEjP15sql_static_data + 0x030c
                    (/head/home/db2inst1/sqllib/lib64/libdb2e.so.1)
    0x00002AAAAE869655 _ZN7ibm_cde5query9Scheduler13evaluateChainEPNS0_17EvaluationRoutineERm + 0x0285
                    (/head/home/db2inst1/sqllib/lib64/libdb2e.so.1)
    0x00002AAAAE86CA33 _ZN7ibm_cde5query9Scheduler15runWorkerThreadEPvPi + 0x0363
    ...
    
    Or stacks similar to the following:
    
    .....
      0x00007F91D6E67D3F sqloWaitEDUWaitPost + 0x03bf
      0x00007F91D5837BB9 _ZN11sqkfChannel13WaitSendReadyEiiiP10sqkfBuffer + 0x0b39
      0x00007F91D5840C52 _ZN11sqkfChannel14isAnySendReadyEPhiiPi + 0x0382
      0x00007F91D58867C2 _Z8sqlkqsndP8SQLKQ_CBP8sqeAgentiPP10sqkfBufferitPN7ibm_cde5query16RuntimeStatEntryEPb + 0x1d92
      0x00007F91D40700AD _ZN7ibm_cde5query16TableQueueBuffer10sendBufferEsRmPNS0_16RuntimeStatEntryEb + 0x044d
      0x00007F91D406F439 _ZN7ibm_cde5query16TableQueueWriter4sendEb + 0x06a9
      0x00007F91D407060B _ZN7ibm_cde5query16TableQueueWriter6sendWUEb + 0x009b
      0x00007F91D4083B1B _ZN7ibm_cde5query24TableQueueWriteEvaluator20createAndWriteMiniWUEmRSt6vectorINS_8services18CountedPtrWithCopyIKNS_5types10VectorBaseILNS5_12NullIndParamE0ELNS5_6PolicyE0EEELNS3_16CountedPtrDeleteE2ELNS3_19CountedPtrThreadingE1EEENS3_9AllocatorISD_EEEmPNS0_16TableQueueWriterEsRj + 0x064b
    .....
    
    This agent is calling receiveFromAnyConnection (which sets the
    target connection number to SQLKQ_CONN_ANY), but further down,
    in FCM, it is blocked in ReceiveBuffer rather than
    ReceiveAnyBuffer.
    
    The TQ received an empty buffer and inadvertently pegged itself
    to a single connection (whichever connection the empty buffer
    came from).
    The APAR fix changes the loop in the TQ receiveBuffer() function
    to reset the input connection number when a valid buffer was not
    received (for comparison, see the similar row-store TQ function,
    sqlktrcv).
    
    Resolution:
    -----------
    Request a special build (SB), or upgrade to v11.5, where the
    issue is already fixed.
    

Local fix

  • N/A
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * all                                                          *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error Description                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Upgrade to 11.1.4.7                                          *
    ****************************************************************
    

Problem conclusion

  • Upgrade to 11.1.4.7
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT37434

  • Reported component name

    DB2 FOR LUW

  • Reported component ID

    DB2FORLUW

  • Reported release

    B10

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2021-06-28

  • Closed date

    2022-04-16

  • Last modified date

    2022-04-16

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    DB2 FOR LUW

  • Fixed component ID

    DB2FORLUW

Applicable component levels

  • RB10 PSN

       UP


Document Information

Modified date:
04 May 2022