IBM Support

IT33322: CDE QUERY APPEARS TO HANG: IT DOES NOT COMPLETE IN HOURS WHEN IT NORMALLY COMPLETES IN MINUTES. THE CAUSE IS A HANG IN CDE PARADISE SORT

APAR status

  • Closed as program error.

Error description

  • The issue is a race condition caused by a lack of memory
    barriers when barrierWaitInterruptible() is used.  The unused
    function barrierWait() may also have a similar issue, and that
    will be fixed as well.
    
    The hang includes the following stacks:
    
    ================================================================================
    === 35184.248967.014.stack.txt: 248967 - db2agntcol (BLUDB) 14 [-]
    0x000010001664FF80 ossWasteTime + 0x0050
    0x0000100009A12B68 ibm_cde::query::NativeSortCB::paradisSort(ibm_cde::query::NSJob*, unsigned long) + 0x0f38
    0x0000100009A1135C ibm_cde::query::NativeSortCB::sort(unsigned int) + 0x0d0c
    0x000010000996479C ibm_cde::query::SortEvaluator::sortPartition(ibm_cde::query::SortPartition*) + 0x014c
    0x00001000099628B0 ibm_cde::query::SortEvaluator::processInputsSynchronously() + 0x0590
    0x0000100006CA685C ibm_cde::query::Evaluator::evaluate(bool, bool, ibm_cde::query::Evaluator::EvaluatorRestartState&, ibm_cde::query::OptPredicateTracker*) + 0x088c
    0x0000100006BA279C ibm_cde::query::EvaluationRoutine::evaluate(unsigned int, sql_static_data*) + 0x03ac
    0x0000100007AC3E88 ibm_cde::query::Scheduler::evaluateChain(ibm_cde::query::EvaluationRoutine*, unsigned long&, unsigned int) + 0x0418
    0x0000100007AC0F18 ibm_cde::query::Scheduler::runWorkerThread(void*, int*) + 0x03b8
    0x0000100007AC8CDC ibm_cde::query::cdeEntryPointImpl(sqeAgent*, void*, void*) + 0x00bc
    0x0000100008CFBFDC cdeInterface::startCdeSubagent(sqeAgent*) + 0x00ec
    0x000010000F214384 sqlriInvokeCde(sqlrr_cb*) + 0x0064
    0x000010000F0487F0 sqlriSectInvoke(sqlrr_cb*, sqlri_opparm*) + 0x0410
    
    
    ================================================================================
    === 35184.226015.014.stack.txt: 226015 - db2agntcol (BLUDB) 14 [-]
    === 35184.226037.014.stack.txt: 226037 - db2agntcol (BLUDB) 14 [-]
    === 35184.248731.014.stack.txt: 248731 - db2agntcol (BLUDB) 14 [-]
    === 35184.249369.014.stack.txt: 249369 - db2agntcol (BLUDB) 14 [-]
    0x00001000000942B8 __nanosleep + 0x0088
    0x0000100009A107EC ibm_cde::query::NativeSortCB::sort(unsigned int) + 0x019c
    0x000010000996479C ibm_cde::query::SortEvaluator::sortPartition(ibm_cde::query::SortPartition*) + 0x014c
    0x00001000099628B0 ibm_cde::query::SortEvaluator::processInputsSynchronously() + 0x0590
    0x0000100006CA685C ibm_cde::query::Evaluator::evaluate(bool, bool, ibm_cde::query::Evaluator::EvaluatorRestartState&, ibm_cde::query::OptPredicateTracker*) + 0x088c
    0x0000100006BA279C ibm_cde::query::EvaluationRoutine::evaluate(unsigned int, sql_static_data*) + 0x03ac
    0x0000100007AC3E88 ibm_cde::query::Scheduler::evaluateChain(ibm_cde::query::EvaluationRoutine*, unsigned long&, unsigned int) + 0x0418
    0x0000100007AC0F18 ibm_cde::query::Scheduler::runWorkerThread(void*, int*) + 0x03b8
    0x0000100007AC8CDC ibm_cde::query::cdeEntryPointImpl(sqeAgent*, void*, void*) + 0x00bc
    0x0000100008CFBFDC cdeInterface::startCdeSubagent(sqeAgent*) + 0x00ec
    0x000010000F214384 sqlriInvokeCde(sqlrr_cb*) + 0x0064
    0x000010000F0487F0 sqlriSectInvoke(sqlrr_cb*, sqlri_opparm*) + 0x0410
    
    These stacks were collected on PowerPC, where the code is heavily
    inlined. On other platforms, some barrier-related functions may
    appear on the stack between paradisSort and ossWasteTime.
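
To illustrate the class of problem described above (threads meeting at a barrier without the memory ordering they need), here is a minimal C++ sketch of a spin barrier that uses explicit acquire/release ordering. It is not the Db2 code: SpinBarrier, run_barrier_demo, and every other name are hypothetical, and the actual barrierWaitInterruptible() implementation is not shown in this APAR. On weakly ordered hardware such as POWER, dropping the acquire/release ordering from a barrier like this can let a waiter miss or misread the release signal.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical sense-reversing spin barrier (illustration only, not Db2 code).
class SpinBarrier {
public:
    explicit SpinBarrier(unsigned parties) : parties_(parties) {}

    void wait() {
        // The phase flag cannot flip again until this thread has arrived,
        // so a relaxed read of the current phase is sufficient here.
        const bool my_sense = !sense_.load(std::memory_order_relaxed);

        // acq_rel: publish this thread's earlier writes and, for the last
        // arriver, observe the writes of every thread that arrived before it.
        if (waiting_.fetch_add(1, std::memory_order_acq_rel) + 1 == parties_) {
            waiting_.store(0, std::memory_order_relaxed);
            // release: the counter reset and all pre-barrier writes become
            // visible before any waiter can see the phase flip.
            sense_.store(my_sense, std::memory_order_release);
        } else {
            // acquire: pairs with the release store above.
            while (sense_.load(std::memory_order_acquire) != my_sense) {
                std::this_thread::yield();
            }
        }
    }

private:
    const unsigned parties_;
    std::atomic<unsigned> waiting_{0};
    std::atomic<bool> sense_{false};
};

// Each thread writes its own slot with a plain store, waits at the barrier,
// then reads every slot. Returns true if no thread ever saw a stale value.
bool run_barrier_demo(unsigned threads, unsigned rounds) {
    assert(threads > 0);
    SpinBarrier barrier(threads);
    std::vector<long> slots(threads, 0);
    std::atomic<unsigned> failures{0};
    const long n = static_cast<long>(threads);

    auto worker = [&](unsigned id) {
        for (unsigned r = 0; r < rounds; ++r) {
            slots[id] = static_cast<long>(r) * n + static_cast<long>(id);
            barrier.wait();  // all writes for round r are now visible
            long sum = 0;
            for (long v : slots) sum += v;
            const long expected = static_cast<long>(r) * n * n + n * (n - 1) / 2;
            if (sum != expected) failures.fetch_add(1, std::memory_order_relaxed);
            barrier.wait();  // nobody overwrites a slot while others still read
        }
    };

    std::vector<std::thread> pool;
    for (unsigned id = 0; id < threads; ++id) pool.emplace_back(worker, id);
    for (auto& t : pool) t.join();
    return failures.load() == 0;
}
```

Calling run_barrier_demo(4, 200) from a test harness exercises the barrier with four threads. The sketch yields while spinning; a production barrier would typically also need a way to interrupt waiters, which is outside the scope of this illustration.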
    

Local fix

  • There are two possible workarounds:
    
    1) Resubmit the query. It may complete when the system workload
    is different.
    
    2) Set the following registry variable:
    
    db2set DB2_REDUCED_OPTIMIZATION=COL_NO_OLAP
    
    or pass the setting to the query in embedded optimization guidelines:
    
    /* <OPTGUIDELINES>
    <REGISTRY>
    <OPTION NAME='DB2_REDUCED_OPTIMIZATION' VALUE='COL_NO_OLAP'/>
    </REGISTRY>
    </OPTGUIDELINES>*/
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * All                                                          *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error Description                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * None                                                         *
    ****************************************************************
    

Problem conclusion

  • The fix will be included in DB2 11.1 Fix Pack m4fp6
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT33322

  • Reported component name

    DB2 FOR LUW

  • Reported component ID

    DB2FORLUW

  • Reported release

    B10

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2020-06-25

  • Closed date

    2021-01-28

  • Last modified date

    2021-01-28

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    DB2 FOR LUW

  • Fixed component ID

    DB2FORLUW

Applicable component levels

  • Line of Business

    Data and AI (LOB10)

  • Business Unit

    Cloud & Data Platform (BU053)

  • Product

    Db2 for Linux, UNIX and Windows (SSEPGG)

  • Platform

    Platform Independent (PF025)

  • Version

    11.1

Document Information

Modified date:
29 January 2021