IBM Support

IC61883: TPC DATABASE DEADLOCKS OCCUR DURING STATUS PROPAGATION SUCH AS ROLLUP POST PROCESSING

APAR status

  • Closed as program error.

Error description

  • PMR NO:  87548,227,000
    COMPID:  5608TC300          AFFECTED RELEASE(s):  3.3.1 - 4.1
    ________________________________________________________________
    ABSTRACT: TPC DATABASE DEADLOCKS OCCUR DURING STATUS PROPAGATION
              SUCH AS ROLLUP POST PROCESSING
    ..
    DETAIL OF PROBLEM (include example(s) of error(s)):  This
    problem appears to manifest in this production environment by
    failing to process probe results at the server end:
        STA0113I: Probe started
        STA0249I: Sending results to server
        GEN0324E: Failed to send request <4:4> to host <hostname>
        GEN0008E: Cannot read from host <hostname>
                    java.net.SocketTimeoutException: Read timed out
        STA0112E: Server did not accept probe results
        STA0108I: Probe aborted
    The root cause is deadlocks caused by the
    com.ibm.tpc.monitor.eventcorrelator.PropagateStatus.propagate()
    method.  Specifically, the deadlocks seem to occur when the host
    propagated status (PROPAGATE_STATUS_HOST) is updated.  Refer to
    the development escalation for the specific SQL code block.
    That SQL effectively takes an exclusive lock on every row in
    T_RES_HOST.  Deadlocks occur when any other select or update
    arrives while this propagate query is running.  The severity of
    the problem can depend on the number of agents in the
    environment and the number of select and/or update requests.
    Currently the deadlocks are seen when the data server tries to
    update the table during agent registration requests and probes,
    and when it queries data from the T_RES_HOST table during rollup
    post processing.  TPC needs a better way to handle status and
    prevent these deadlocks.
    RECREATE STEPS: See development escalation record.
    
    ________________________________________________________________
    DB2 Version used for Server:  DB2 v8.2 (n/a)
    The defect is against component:  TPC Server
    Server/Manager build/release (TPC): 3.3.1.90, 3.3.2, 4.1 builds
    Agent build/release (TPC): n/a
    Server/Manager (OS):  Windows 2003 (not OS specific)
    Agent (OS):   n/a
    ________________________________________________________________
    Problem as described by customer:  Probe results not accepted by
    server.
    Initial customer impact (low/med/high):  med
    Local Fix:  Contact L2 for examination and possible hotfix.
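
The description above can be illustrated with a generic lock-ordering sketch. This is only an analogy, not TPC or DB2 code: the APAR does not publish the actual SQL or lock order, and the names `simulate`, `host_rows` and `other_rows` are hypothetical. Two workers each hold one lock and wait for the other's; with a timeout, both attempts fail, which is roughly how a blocked request can surface to an agent as a read timeout.

```python
import threading


def simulate(timeout=0.2):
    """Run two workers that take the same two locks in opposite order.

    Returns a dict mapping each worker name to "ok" or "timed out".
    """
    host_rows = threading.Lock()    # hypothetical stand-in for locks on T_RES_HOST rows
    other_rows = threading.Lock()   # hypothetical stand-in for locks another request holds
    holding = threading.Barrier(2)   # reached once both workers hold their first lock
    finished = threading.Barrier(2)  # reached once both workers have made their attempt
    result = {}

    def worker(name, first, second):
        with first:
            holding.wait()
            # The other worker holds `second` and is waiting for `first`.
            acquired = second.acquire(timeout=timeout)
            result[name] = "ok" if acquired else "timed out"
            if acquired:
                second.release()
            # Keep `first` held until both attempts are over, so the
            # outcome does not depend on thread scheduling.
            finished.wait()

    threads = [
        threading.Thread(target=worker, args=("propagate", host_rows, other_rows)),
        threading.Thread(target=worker, args=("probe", other_rows, host_rows)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


if __name__ == "__main__":
    print(simulate())
```

Without the timeout, neither worker would ever proceed. A database normally breaks such a cycle by rolling back one participant or by timing out the lock wait, so the caller sees a failed or very slow request rather than a permanent hang.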
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED: All TPC users using TPC version 3.3.1 and    *
    *                 higher.  The problem usually occurs in       *
    *                 environments that contain a large number of  *
    *                 agents (hundreds).                           *
    ****************************************************************
    * PROBLEM DESCRIPTION: Customers with a large number of        *
    *                      agents (hundreds) may see jobs fail     *
    *                      with timeout exceptions. This will      *
    *                      prevent job results from being stored   *
    *                      on the server.  The issue occurs due    *
    *                      to SQL and thread deadlocking in the    *
    *                      Data Server.                            *
    ****************************************************************
    * RECOMMENDATION: Apply fixing level when available. This      *
    *                 problem is currently projected to be fixed   *
    *                 in 3.3.1, 3.3.2, 4.1 and later versions.     *
    *                 Note that this is subject to change at the   *
    *                 discretion of IBM.                           *
    ****************************************************************
    Customers will see errors similar to the following in the
    agent logs:
    
    
      1/1/09 8:34:00 AM AGT0145I: Retrieving job definition from server
      1/1/09 9:06:45 AM AGT0152I: Job definition retrieved
      1/1/09 9:06:45 AM STA0113I: Probe started
      1/1/09 9:06:47 AM STA0249I: Sending results to server
      1/1/09 1:06:48 PM GEN0324E: Failed to send request <4:4> to host
                        server.host.com.
      1/1/09 1:06:48 PM GEN0008E: Cannot read from host
                        <server.host.com.>
                        java.net.SocketTimeoutException: Read timed out
      1/1/09 1:06:48 PM STA0112E: Server did not accept probe results
      1/1/09 1:06:48 PM STA0108I: Probe aborted
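
To check whether an agent log shows this signature, the message IDs in the excerpt above can be searched for mechanically. The following is a minimal sketch, not an IBM tool: the function name `find_rejected_probes` is hypothetical, and it assumes every log entry starts with a timestamp in the same format as the excerpt, followed by the message ID. It pairs each STA0249I (results sent) with the next STA0112E (results not accepted) and reports how long the agent waited.

```python
import re
from datetime import datetime

# Assumed entry layout, taken from the excerpt: "<m/d/yy h:mm:ss AM|PM> <MSGID>: <text>"
ENTRY = re.compile(
    r"^\s*(\d{1,2}/\d{1,2}/\d{2} \d{1,2}:\d{2}:\d{2} [AP]M) ([A-Z]{3}\d{4}[A-Z]): (.*)$"
)


def find_rejected_probes(lines):
    """Return (sent_at, rejected_at) datetime pairs for probes the server did not accept."""
    hits = []
    sent_at = None
    for line in lines:
        match = ENTRY.match(line)
        if match is None:
            continue  # wrapped continuation line or unrelated text
        stamp = datetime.strptime(match.group(1), "%m/%d/%y %I:%M:%S %p")
        msg_id = match.group(2)
        if msg_id == "STA0113I":
            sent_at = None            # a new probe started; forget older state
        elif msg_id == "STA0249I":
            sent_at = stamp           # results are being sent to the server
        elif msg_id == "STA0112E" and sent_at is not None:
            hits.append((sent_at, stamp))
            sent_at = None
    return hits


SAMPLE = """\
1/1/09 9:06:45 AM STA0113I: Probe started
1/1/09 9:06:47 AM STA0249I: Sending results to server
1/1/09 1:06:48 PM GEN0324E: Failed to send request <4:4> to host
                  server.host.com.
1/1/09 1:06:48 PM STA0112E: Server did not accept probe results
1/1/09 1:06:48 PM STA0108I: Probe aborted
""".splitlines()

if __name__ == "__main__":
    for sent, rejected in find_rejected_probes(SAMPLE):
        print(f"results sent {sent}, rejected {rejected}, waited {rejected - sent}")
```

On the sample taken from this APAR, the wait between sending the results and the rejection is 4 hours and 1 second, which is consistent with a long read timeout rather than an immediate refusal by the server.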
    

Problem conclusion

  • The code was fixed so that the deadlocking no longer occurs on
    the server.  The fix for this APAR is targeted for the following
    maintenance packages:
    
        | fix pack | 3.3.1.x - target October 2009
        | fix pack | 3.3.2.x - target not set
        | fix pack | 4.1.0.x - fix pack 3 - target February 2010
    
    http://www-01.ibm.com/support/docview.wss?&uid=swg21320822
    

Temporary fix

Comments

APAR Information

  • APAR number

    IC61883

  • Reported component name

    TPC FOR DATA

  • Reported component ID

    5608TC300

  • Reported release

    33W

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2009-07-06

  • Closed date

    2009-09-02

  • Last modified date

    2009-09-02

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    TPC FOR DATA

  • Fixed component ID

    5608TC300

Applicable component levels

  • R33A PSY

       UP

  • R33L PSY

       UP

  • R33W PSY

       UP

  • R41A PSY

       UP

  • R41L PSY

       UP

  • R41W PSY

       UP

Document Information

Modified date:
02 September 2009