IBM Support

LI72660: DB2AGENT SIGSEGV FOLLOWING ERROR WHILE OBTAINING A TABLE QUEUE LATCH

APAR status

  • Closed as program error.

Error description

  • Following a failure to obtain a table queue latch during SMP
    or MPP runtime query processing, the DB2 instance can shut
    down after a db2agent receives a SIGSEGV.
    
    The fix handles the error more gracefully and avoids abending
    the instance following a failure to obtain the latch.
    .
    When the db2agent fails to obtain a table queue latch during
    query processing, one of the following messages is written to
    the db2diag.log:
    .
    2007-06-03-04.07.52.012541-300 I84614033E527      LEVEL: Error
    PID     : 2720                 TID  : 183041065728PROC :
    db2agntp (SAMPLE) 42
    INSTANCE: db2inst1             NODE : 042         DB   : SAMPLE
    APPHDL  : 0-65                 APPID: APPID123
    AUTHID  : DB2INST1
    FUNCTION: DB2 UDB, table Q services, sqlktopn, probe:20
    MESSAGE : DIA0001E An internal error occurred. Report the
    following error code:
    "Line=00531, rc1=0x82170001, rc2=0x00000004,
    rc3=0x2e1b20080, msg=TQ already open".
    .
    OR:
    .
    2007-06-03-04.07.52.014940-300 I84618003E504      LEVEL: Error
    PID     : 3695                 TID  : 183041065728PROC :
    db2agntp (SAMPLE) 41
    INSTANCE: db2inst1             NODE : 041       DB:SAMPLE
    APPHDL  : 0-65                 APPID: APPID123
    AUTHID  : DB2INST1
    FUNCTION: DB2 UDB, table Q services, sqlktclo, probe:30
    MESSAGE : DIA0001E An internal error occurred. Report the
    following error code:
    "Line=00389, rc1=0x82170001, rc2=0x00000000,
    rc3=(nil),msg=sqloxltc(tq_latch)".
    .
    Following this initial error while obtaining a table queue
    latch, the db2 processes can abend with a SIGSEGV because the
    above error is handled improperly. The following stacks have
    been encountered for the db2 agents that receive the SIGSEGV:
    .
    - Stack 1 :
     ======================
    
     ossDumpStackTrace + 0x0080
     OSSTrapFile + 0x00aa
     sqlo_trce + 0x033b
     sqloEDUCodeTrapHandler + 0x0074
     determine_cpumask_size + 0x0070
     sqlri_hsjnClose + 0x0c3c
     sqlricjp + 0x0e97
     sqlricls_complex + 0x0ff7
     sqlricls + 0x0068
     sqlra_close_section + 0x0147
     sqlracal_finalcmt_rb + 0x022b
     sqlracal + 0x0311
     sqlrr_cleanup_tran_before_DPS + 0x0469
     sqlrr_tran_router + 0x0734
     sqlrr_subagent_router + 0x0866
     sqleSubRequestRouter + 0x073c
     sqleProcessSubRequest + 0x01aa
     sqleRunAgent + 0x0400
     sqloCreateEDU + 0x0279
     sqloTermEDUServices + 0x057b
     sqloInitEDUServices + 0x0360
     sqloRunInstance + 0x0081
     DBGTerm() + 0x2018
     DB2main + 0x07df
     main + 0x002f
    
    - Stack 2 :
    ======================
    
     ossDumpStackTrace + 0x0080
     OSSTrapFile + 0x00aa
     sqlo_trce + 0x033b
     sqloEDUCodeTrapHandler + 0x0074
     determine_cpumask_size + 0x0070
     sqlri_hsjnItrInit + 0x0000
     sqlrihsjn + 0x055e
     sqlriExecThread + 0x004a
     sqlriSectInvoke + 0x0083
     sqlrr_dss_router + 0x0550
     sqlrr_subagent_router + 0x07eb
     sqleSubRequestRouter + 0x073c
     sqleProcessSubRequest + 0x01aa
     sqleRunAgent + 0x0400
     sqloCreateEDU + 0x0279
     sqloTermEDUServices + 0x057b
     sqloInitEDUServices + 0x0360
     DBGTerm() + 0x2018
     sqloRunInstance + 0x0081
     DB2main + 0x07df
     main + 0x002f
    .
    This APAR addresses the error handling
    following the failure to obtain a table queue latch, so that the
    db2 agent does not sigsegv after the initial error, but instead
    returns a -902 and marks the database as damaged, which allows
    more diagnostic data to be dumped.
    
    This APAR also introduces a new capability to optionally
    enable more diagnostic data collection in case of
    recurrence.
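
To check whether an instance has logged the initial table queue latch error described above, the db2diag.log can be scanned for the two FUNCTION signatures quoted in this APAR. The following is a minimal sketch, not an IBM tool: the helper name `find_tq_latch_errors` is hypothetical, and it assumes that each record begins with a timestamp line in the format shown in the messages above.

```python
import re

# Assumed start of a db2diag.log record: timestamp plus timezone offset, e.g.
# "2007-06-03-04.07.52.012541-300 I84614033E527      LEVEL: Error"
RECORD_START = re.compile(
    r"^[ \t]*(\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}[+-]\d{3})\s",
    re.MULTILINE,
)

# FUNCTION signatures taken from the two messages quoted in this APAR.
SIGNATURE = re.compile(
    r"table Q services,\s*(sqlktopn,\s*probe:20|sqlktclo,\s*probe:30)"
)


def find_tq_latch_errors(log_text):
    """Return (timestamp, function) pairs for records matching this APAR.

    Hypothetical helper for illustration only.
    """
    starts = [m.start() for m in RECORD_START.finditer(log_text)]
    hits = []
    for i, begin in enumerate(starts):
        # A record runs until the next timestamp line or the end of the text.
        end = starts[i + 1] if i + 1 < len(starts) else len(log_text)
        record = log_text[begin:end]
        sig = SIGNATURE.search(record)
        if sig and "DIA0001E" in record:
            timestamp = RECORD_START.match(record).group(1)
            function = sig.group(1).split(",")[0]
            hits.append((timestamp, function))
    return hits
```

Pass the function the contents of the db2diag.log as a single string. The location of that file depends on the instance's diagnostic path configuration, so it is deliberately not hard-coded here. A match only indicates that the initial latch error was logged; it does not confirm that a subsequent SIGSEGV occurred.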
    

Local fix

Problem summary

  • See the error description.
    

Problem conclusion

  • First fixed in DB2 UDB Version 9.5, FixPak 1
    

Temporary fix

Comments

APAR Information

  • APAR number

    LI72660

  • Reported component name

    DB2 UDE ESE LIN

  • Reported component ID

    5765F4104

  • Reported release

    950

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2007-11-06

  • Closed date

    2008-05-16

  • Last modified date

    2008-05-16

  • APAR is sysrouted FROM one or more of the following:

    LI72500

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    DB2 UDE ESE LIN

  • Fixed component ID

    5765F4104

Applicable component levels

  • R910 PSY

       UP

  • R950 PSY

       UP

Document Information

Modified date:
16 May 2008