A fix is available
APAR status
Closed as program error.
Error description
Issue:
------
On multi-node DPF/BLU systems, the table queues (TQs) of a query or job can block in a send or receive after an empty buffer is received, and then hang indefinitely.

db2level where the issue was found:
-----------------------------------
"DB2 v11.1.4.5", "s1911120100", "DYN1911120100AMD64", and Fix Pack "5".

Symptoms:
---------
Any SQL statement or job can be affected and enter a hung state. In some cases, forcing the hanging application handle and resubmitting the same statement lets it run to completion as normal.

The global application handle snapshot and the stacks of the related agents show that all agents are either waiting in TQs or waiting for other agents that are waiting in TQs. This happens on only some of the participating nodes: their subsections are in the state "Waiting to receive on tablequeue", while most other nodes are in the "complete" state.

The stacks look similar to the following:

...
  (/head/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00002AAAB045891B _ZN11sqkfChannel13WaitRecvReadyEii + 0x075b
  (/head/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00002AAAB0459802 _ZN11sqkfChannel13ReceiveBufferEPP10sqkfBufferi + 0x01c2
  (/head/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00002AAAB049C482 _Z8sqlkqrcvP8SQLKQ_CBP8sqeAgentPiPP10sqkfBufferS3_PN7ibm_cde5query16RuntimeStatEntryES3_Pb + 0x04d2
  (/head/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00002AAAAF28B8C9 _ZN7ibm_cde5query16TableQueueBuffer13receiveBufferERsPNS0_16RuntimeStatEntryEi + 0x0129
  (/head/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00002AAAAF28C782 _ZN7ibm_cde5query16TableQueueReader24receiveFromAnyConnectionEi + 0x0122
  (/head/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00002AAAAF2C8E71 _ZN7ibm_cde5query23TableQueueReadEvaluator9getNextWUEPNS0_16TableQueueReaderEP31sqlkq_prof_tq_recv_events_timesPh + 0x0361
  (/head/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00002AAAAF2ACB60 _ZN7ibm_cde5query23TableQueueReadEvaluator26processInputsSynchronouslyEv + 0x0280
  (/head/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00002AAAADFBE1C9 _ZN7ibm_cde5query9Evaluator8evaluateEbbRNS1_21EvaluatorRestartStateEPNS0_19OptPredicateTrackerE + 0x03a9
  (/head/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00002AAAADECD97C _ZN7ibm_cde5query17EvaluationRoutine8evaluateEjP15sql_static_data + 0x030c
  (/head/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00002AAAAE869655 _ZN7ibm_cde5query9Scheduler13evaluateChainEPNS0_17EvaluationRoutineERm + 0x0285
  (/head/home/db2inst1/sqllib/lib64/libdb2e.so.1)
0x00002AAAAE86CA33 _ZN7ibm_cde5query9Scheduler15runWorkerThreadEPvPi + 0x0363
...

Or similar stacks:

.....
0x00007F91D6E67D3F sqloWaitEDUWaitPost + 0x03bf
0x00007F91D5837BB9 _ZN11sqkfChannel13WaitSendReadyEiiiP10sqkfBuffer + 0x0b39
0x00007F91D5840C52 _ZN11sqkfChannel14isAnySendReadyEPhiiPi + 0x0382
0x00007F91D58867C2 _Z8sqlkqsndP8SQLKQ_CBP8sqeAgentiPP10sqkfBufferitPN7ibm_cde5query16RuntimeStatEntryEPb + 0x1d92
0x00007F91D40700AD _ZN7ibm_cde5query16TableQueueBuffer10sendBufferEsRmPNS0_16RuntimeStatEntryEb + 0x044d
0x00007F91D406F439 _ZN7ibm_cde5query16TableQueueWriter4sendEb + 0x06a9
0x00007F91D407060B _ZN7ibm_cde5query16TableQueueWriter6sendWUEb + 0x009b
0x00007F91D4083B1B _ZN7ibm_cde5query24TableQueueWriteEvaluator20createAndWriteMiniWUEmRSt6vectorINS_8services18CountedPtrWithCopyIKNS_5types10VectorBaseILNS5_12NullIndParamE0ELNS5_6PolicyE0EEELNS3_16CountedPtrDeleteE2ELNS3_19CountedPtrThreadingE1EEENS3_9AllocatorISD_EEEmPNS0_16TableQueueWriterEsRj + 0x064b
.....

In the first stack, the agent is doing a receiveFromAnyConnection (which sets the target connection number to receive from to SQLKQ_CONN_ANY), but down in FCM it is blocking on ReceiveBuffer rather than ReceiveAnyBuffer. The TQ received an empty buffer and inadvertently pegged itself to a single connection (whichever connection the empty buffer came from).

The fix for this APAR corrects the loop in the TQ receiveBuffer() function so that it resets the input connection number when a valid buffer was not received (for comparison, see the similar row-store TQ function, sqlktrcv).

Resolution:
-----------
Request a special build (SB), or upgrade to v11.5, where the issue is already fixed.
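The receive-loop behaviour described in this error description can be illustrated with a toy model. The sketch below is not Db2 source code: ToyChannel, receive_from_any, and the value chosen for CONN_ANY are hypothetical stand-ins, and a receive that would block forever is modelled as returning None. It only shows why keeping the connection number returned with an empty buffer can strand a "receive from any" loop, and why resetting it avoids that.

```python
from collections import deque

CONN_ANY = -1  # hypothetical stand-in for SQLKQ_CONN_ANY; the real value is not given here


class ToyChannel:
    """Hypothetical in-memory stand-in for the FCM channel (not a Db2 API)."""

    def __init__(self, queues):
        # queues maps a connection number to the buffers waiting on it
        self.queues = {conn: deque(bufs) for conn, bufs in queues.items()}

    def receive(self, conn):
        """Return (conn, buffer), or None where a real receive would block."""
        if conn == CONN_ANY:
            for c, q in self.queues.items():
                if q:
                    return c, q.popleft()
            return None
        q = self.queues[conn]
        return (conn, q.popleft()) if q else None


def receive_from_any(channel, reset_on_empty):
    """Loop until a non-empty buffer arrives; None means the loop would hang."""
    conn = CONN_ANY
    while True:
        result = channel.receive(conn)
        if result is None:
            return None  # modelled hang: nothing will ever arrive on this target
        conn, buf = result  # the receive reports which connection delivered
        if buf:
            return buf
        if reset_on_empty:
            conn = CONN_ANY  # the behaviour the fix is described as restoring
        # Without the reset, conn stays pegged to the connection that
        # delivered the empty buffer.


# Connection 0 delivers one empty buffer; connection 1 holds the real data.
buggy = receive_from_any(ToyChannel({0: [b""], 1: [b"rows"]}), reset_on_empty=False)
fixed = receive_from_any(ToyChannel({0: [b""], 1: [b"rows"]}), reset_on_empty=True)
```

In this model the run without the reset ends in the modelled hang even though data is waiting on connection 1, while the run with the reset returns that data.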
Local fix
N/a
Problem summary
****************************************************************
* USERS AFFECTED:                                              *
* all                                                          *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Upgrade to 11.1.4.7                                          *
****************************************************************
Problem conclusion
Upgrade to 11.1.4.7
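As a rough aid for checking an instance against this recommendation, the following sketch compares a version string in the form quoted in the error description ("DB2 v11.1.4.5") with 11.1.4.7. The helper names are hypothetical, and treating every 11.1 level below 11.1.4.7 as lacking the fix is an assumption: this APAR only confirms 11.1.4.5 as affected. Special builds are not detected.

```python
import re

FIXED_IN_11_1 = (11, 1, 4, 7)  # level named in the recommendation of this APAR


def parse_level(text):
    """Extract (version, release, mod, fix pack) from text such as 'DB2 v11.1.4.5'."""
    match = re.search(r"v(\d+)\.(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(group) for group in match.groups())


def lacks_fix(text):
    """True for an 11.1 level below 11.1.4.7 (assumption: all such levels are affected)."""
    level = parse_level(text)
    # Only the 11.1 stream is judged; other releases are out of scope for this sketch.
    return level[:2] == (11, 1) and level < FIXED_IN_11_1


reported = lacks_fix('"DB2 v11.1.4.5", "s1911120100"')
```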
Temporary fix
Comments
APAR Information
APAR number
IT37434
Reported component name
DB2 FOR LUW
Reported component ID
DB2FORLUW
Reported release
B10
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2021-06-28
Closed date
2022-04-16
Last modified date
2022-04-16
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
DB2 FOR LUW
Fixed component ID
DB2FORLUW
Applicable component levels
RB10 PSN
UP
Document Information
Modified date:
04 May 2022