Fixes are available
APAR status
Closed as program error.
Error description
The issue is a race condition caused by missing memory barriers when barrierWaitInterruptible() is used. The currently unused function barrierWait() may have a similar issue and will be fixed as well. The hang shows the following stacks:
================================================================================
=== 35184.248967.014.stack.txt: 248967 - db2agntcol (BLUDB) 14 [-]
0x000010001664FF80 ossWasteTime + 0x0050
0x0000100009A12B68 ibm_cde::query::NativeSortCB::paradisSort(ibm_cde::query::NSJob*, unsigned long) + 0x0f38
0x0000100009A1135C ibm_cde::query::NativeSortCB::sort(unsigned int) + 0x0d0c
0x000010000996479C ibm_cde::query::SortEvaluator::sortPartition(ibm_cde::query::SortPartition*) + 0x014c
0x00001000099628B0 ibm_cde::query::SortEvaluator::processInputsSynchronously() + 0x0590
0x0000100006CA685C ibm_cde::query::Evaluator::evaluate(bool, bool, ibm_cde::query::Evaluator::EvaluatorRestartState&, ibm_cde::query::OptPredicateTracker*) + 0x088c
0x0000100006BA279C ibm_cde::query::EvaluationRoutine::evaluate(unsigned int, sql_static_data*) + 0x03ac
0x0000100007AC3E88 ibm_cde::query::Scheduler::evaluateChain(ibm_cde::query::EvaluationRoutine*, unsigned long&, unsigned int) + 0x0418
0x0000100007AC0F18 ibm_cde::query::Scheduler::runWorkerThread(void*, int*) + 0x03b8
0x0000100007AC8CDC ibm_cde::query::cdeEntryPointImpl(sqeAgent*, void*, void*) + 0x00bc
0x0000100008CFBFDC cdeInterface::startCdeSubagent(sqeAgent*) + 0x00ec
0x000010000F214384 sqlriInvokeCde(sqlrr_cb*) + 0x0064
0x000010000F0487F0 sqlriSectInvoke(sqlrr_cb*, sqlri_opparm*) + 0x0410
================================================================================
=== 35184.226015.014.stack.txt: 226015 - db2agntcol (BLUDB) 14 [-]
=== 35184.226037.014.stack.txt: 226037 - db2agntcol (BLUDB) 14 [-]
=== 35184.248731.014.stack.txt: 248731 - db2agntcol (BLUDB) 14 [-]
=== 35184.249369.014.stack.txt: 249369 - db2agntcol (BLUDB) 14 [-]
0x00001000000942B8 __nanosleep + 0x0088
0x0000100009A107EC ibm_cde::query::NativeSortCB::sort(unsigned int) + 0x019c
0x000010000996479C ibm_cde::query::SortEvaluator::sortPartition(ibm_cde::query::SortPartition*) + 0x014c
0x00001000099628B0 ibm_cde::query::SortEvaluator::processInputsSynchronously() + 0x0590
0x0000100006CA685C ibm_cde::query::Evaluator::evaluate(bool, bool, ibm_cde::query::Evaluator::EvaluatorRestartState&, ibm_cde::query::OptPredicateTracker*) + 0x088c
0x0000100006BA279C ibm_cde::query::EvaluationRoutine::evaluate(unsigned int, sql_static_data*) + 0x03ac
0x0000100007AC3E88 ibm_cde::query::Scheduler::evaluateChain(ibm_cde::query::EvaluationRoutine*, unsigned long&, unsigned int) + 0x0418
0x0000100007AC0F18 ibm_cde::query::Scheduler::runWorkerThread(void*, int*) + 0x03b8
0x0000100007AC8CDC ibm_cde::query::cdeEntryPointImpl(sqeAgent*, void*, void*) + 0x00bc
0x0000100008CFBFDC cdeInterface::startCdeSubagent(sqeAgent*) + 0x00ec
0x000010000F214384 sqlriInvokeCde(sqlrr_cb*) + 0x0064
0x000010000F0487F0 sqlriSectInvoke(sqlrr_cb*, sqlri_opparm*) + 0x0410
These stacks are from POWER (PowerPC), where the code is heavily inlined. On other platforms, barrier-related functions may appear on the stack between paradisSort and ossWasteTime.
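The Db2 source for barrierWaitInterruptible() is not public, so the following is only a hedged illustration of what "memory barriers" mean for a spin-style thread barrier like the one the stacks suggest (threads waiting in ossWasteTime / __nanosleep). It is a generic sense-reversing spin barrier written with C++ acquire/release atomics; the names SpinBarrier and run_barrier_check are hypothetical and are not Db2 functions. It shows where ordering is required so that a thread leaving the barrier is guaranteed to see data written by the other threads before they arrived.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sense-reversing spin barrier. NOT the Db2 implementation;
// it only shows where acquire/release ordering (memory barriers) must go.
class SpinBarrier {
public:
    explicit SpinBarrier(std::size_t parties) : parties_(parties) {
        assert(parties > 0);
    }

    void wait() {
        // The phase flag cannot flip until this thread arrives, so a relaxed
        // read of the current phase is stable at this point.
        const bool nextSense = !sense_.load(std::memory_order_relaxed);

        // acq_rel: every arrival publishes the caller's earlier writes, and
        // the last arrival acquires the writes of all earlier arrivals.
        if (arrived_.fetch_add(1, std::memory_order_acq_rel) + 1 == parties_) {
            arrived_.store(0, std::memory_order_relaxed);        // reset for next phase
            sense_.store(nextSense, std::memory_order_release);  // open the barrier
        } else {
            // acquire pairs with the release store above. Without this
            // ordering the C++ memory model gives no guarantee that pre-barrier
            // writes are visible here; weakly ordered CPUs such as POWER can
            // expose that in practice.
            while (sense_.load(std::memory_order_acquire) != nextSense) {
                std::this_thread::yield();
            }
        }
    }

private:
    const std::size_t parties_;
    std::atomic<std::size_t> arrived_{0};
    std::atomic<bool> sense_{false};
};

// Each round, every thread writes its own slot, waits at the barrier, then
// checks that it sees every other thread's write for that round.
// Returns the number of stale reads observed (0 means correct ordering).
int run_barrier_check(std::size_t threads, int rounds) {
    SpinBarrier barrier(threads);
    std::vector<int> slots(threads, -1);  // plain, non-atomic shared data
    std::atomic<int> mismatches{0};
    std::vector<std::thread> pool;

    for (std::size_t id = 0; id < threads; ++id) {
        pool.emplace_back([&, id] {
            for (int r = 0; r < rounds; ++r) {
                slots[id] = r;            // write before the barrier
                barrier.wait();
                for (int v : slots) {     // read after the barrier
                    if (v != r) {
                        mismatches.fetch_add(1, std::memory_order_relaxed);
                    }
                }
                barrier.wait();           // keep next round's writes after all reads
            }
        });
    }
    for (auto& t : pool) {
        t.join();
    }
    return mismatches.load();
}
```

For example, run_barrier_check(4, 200) is expected to return 0, because every read after barrier.wait() is ordered after the matching pre-barrier writes. The second wait() in each round is a design choice that prevents the next round's writes from racing with the current round's reads.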
Local fix
There are two possible workarounds:
1) Resubmit the query under a different system workload; this may work around the problem.
2) Set the following registry variable:
db2set DB2_REDUCED_OPTIMIZATION=COL_NO_OLAP
or pass it to the query in an embedded optimization guideline:
/* <OPTGUIDELINES> <REGISTRY> <OPTION NAME='DB2_REDUCED_OPTIMIZATION' VALUE='COL_NO_OLAP'/> </REGISTRY> </OPTGUIDELINES> */
Problem summary
****************************************************************
* USERS AFFECTED:
* All
****************************************************************
* PROBLEM DESCRIPTION:
* See Error Description
****************************************************************
* RECOMMENDATION:
* None
****************************************************************
Problem conclusion
The fix will be included in DB2 11.1 Mod Pack 4 Fix Pack 6 (m4fp6).
Temporary fix
Comments
APAR Information
APAR number
IT33322
Reported component name
DB2 FOR LUW
Reported component ID
DB2FORLUW
Reported release
B10
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2020-06-25
Closed date
2021-01-28
Last modified date
2021-01-28
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
DB2 FOR LUW
Fixed component ID
DB2FORLUW
Applicable component levels
RB10 PSN
UP
Document Information
Modified date:
04 May 2022