IBM Support

IY92795: TIMING ISSUE DURING CONNECT PROCESSING THAT CAN RESULT IN AN INSTANCE HANG

APAR status

  • Closed as program error.

Error description

  • There exists an internal shared structure called a socket pair
    that is used when an application connects to the database.
    The connection listener process acquires one of these socket
    pairs, and when an agent is dispatched for the connection, the
    agent process takes the socket pair and frees it.
    
    As a result, the count of socket pairs in use increments when
    the connection listener receives a connection and decrements
    again shortly afterward, when the agent is assigned.
    There is an internal limit of 32 socket pairs. Under normal
    conditions DB2 never needs that many, because each pair is
    acquired and freed quickly: the agent processes are expected to
    get a share of CPU time comparable to the listener's, so the get
    and free operations on this structure stay roughly balanced.
    
    However, a timing window has been observed: if a connection
    listener receives 32 consecutive agent dispatch requests (each
    one acquiring a socket pair) and the operating system gives CPU
    time slices only to the listener process during that interval,
    the agent processes cannot free their socket pairs in time, and
    the connection listener hits the limit of 32 socket pairs.
    
    If this happens, a stack traceback of the connection listener
    (if generated manually by the support team) looks like this:
    
    msgrcv + 0x98
    sqloCSemP + 0xC8
    GetSharedSocketPair + 0x30
    sqleSendInbound + 0x60
    sqleInitAgentCB + 0x2A8
    sqleGetAgentFromPool + 0x45C
    sqleGetAgent + 0x1C4
    sqlcctcpconnmgr_child + 0xDD0
    sqloCreateEDU + 0x194
    
    The listener blocks while waiting for a socket pair, but none
    are left. It also holds a latch that prevents other agents from
    being dispatched, which results in an instance hang.
    
    This timing issue has only been seen when a CPU bottleneck
    affects the timing (for example, CPU contention or CPU spikes).
    Connection concentrator environments also appear to be
    vulnerable to this timing issue.
    
    This APAR will help to reduce the chances of hitting this timing
    window.
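The get/free balance and the timing window described above can be modeled with a counting semaphore. The following Python sketch is a hypothetical illustration, not DB2 source: the class `SocketPairPool` and the functions `interleaved` and `starved` are invented names, and only the limit of 32 comes from this APAR. It shows that when each agent frees its pair promptly, the pool never holds more than one pair at a time, while a listener that never yields to the agents stops at the 32nd pair and its next request has to wait. The sketch uses a timeout only so that the blocked request can be observed instead of hanging the demo, and it does not model the latch.

```python
import threading
from typing import Optional

# Limit taken from this APAR; everything else here is a hypothetical model.
MAX_SOCKET_PAIRS = 32


class SocketPairPool:
    """Hypothetical model of the shared socket-pair structure (not DB2 code)."""

    def __init__(self, limit: int = MAX_SOCKET_PAIRS) -> None:
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self.in_use = 0

    def get(self, timeout: Optional[float] = None) -> bool:
        """Listener side: wait for a free pair; False if the timeout expires."""
        if not self._slots.acquire(timeout=timeout):
            return False
        with self._lock:
            self.in_use += 1
        return True

    def free(self) -> None:
        """Agent side: return the pair once the agent owns the connection."""
        with self._lock:
            self.in_use -= 1
        self._slots.release()


def interleaved(pool: SocketPairPool, connections: int) -> int:
    """Normal case: each agent frees its pair right after the listener gets it.

    Returns the peak number of pairs in use at once.
    """
    peak = 0
    for _ in range(connections):
        if not pool.get(timeout=1.0):
            raise RuntimeError("pool unexpectedly exhausted")
        peak = max(peak, pool.in_use)
        pool.free()
    return peak


def starved(pool: SocketPairPool, connections: int, timeout: float = 0.1) -> int:
    """Timing window: the listener gets every CPU slice and no agent frees a pair.

    Returns how many dispatches succeeded before a get() had to wait.
    """
    dispatched = 0
    for _ in range(connections):
        if not pool.get(timeout=timeout):
            break  # the modeled listener would block here
        dispatched += 1
    return dispatched


if __name__ == "__main__":
    print(interleaved(SocketPairPool(), 100))  # 1
    print(starved(SocketPairPool(), 40, timeout=0.05))  # 32
```

The sketch makes the starvation absolute to expose the limit; in practice the outcome depends on how the operating system schedules the listener and agent processes, which is consistent with the issue being reported only under CPU contention or CPU spikes.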
    

Local fix

  • Tune the system to reduce CPU contention, which lowers the
    chance of hitting this rare timing issue.
    This problem does not affect DB2 9.5, because this design was
    reworked in that release.
    

Problem summary

  • As above.
    

Problem conclusion

  • First fixed in DB2 UDB Version 8.1, FixPak 17 (s080813).
    

Temporary fix

Comments

APAR Information

  • APAR number

    IY92795

  • Reported component name

    DB2 CEE AIX

  • Reported component ID

    5765F3000

  • Reported release

    820

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2006-12-13

  • Closed date

    2009-02-03

  • Last modified date

    2009-02-03

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    IZ18649

Fix information

  • Fixed component name

    DB2 CEE AIX

  • Fixed component ID

    5765F3000

Applicable component levels

  • R950 PSY

       UP

  • R810 PSN

       UP

  • R820 PSN

       UP

  • R910 PSN

       UP

Document Information

Modified date:
08 January 2022