APAR status
Closed as program error.
Error description
PMR NO: 87548,227,000
COMPID: 5608TC300
AFFECTED RELEASE(s): 3.3.1 - 4.1
________________________________________________________________
ABSTRACT: TPC DATABASE DEADLOCKS OCCUR DURING STATUS PROPAGATION SUCH AS ROLLUP POST PROCESSING

DETAIL OF PROBLEM (include example(s) of error(s)):
In this production environment the problem appears as a failure to process probe results at the server end:

STA0113I: Probe started
STA0249I: Sending results to server
GEN0324E: Failed to send request <4:4> to host <hostname>
GEN0008E: Cannot read from host <hostname> java.net.SocketTimeoutException: Read timed out
STA0112E: Server did not accept probe results
STA0108I: Probe aborted

The root cause is deadlocks caused by the com.ibm.tpc.monitor.eventcorrelator.PropagateStatus.propagate() method. Specifically, the deadlocks seem to occur when the host propagated status (PROPAGATE_STATUS_HOST) is updated. Refer to the development escalation for the specific SQL code block.

The SQL in question effectively takes an exclusive lock on every row in T_RES_HOST. Deadlocks occur when any other select or update arrives while this propagate query is running. The severity of the problem can depend on the number of agents in the environment and the number of select and/or update requests. Currently the deadlocks are seen when the data server tries to update the table during agent registration requests and probes, and when it queries the T_RES_HOST table during rollup post processing. TPC needs a better way to handle status and prevent these deadlocks.

RECREATE STEPS: See development escalation record.
________________________________________________________________
DB2 Version used for Server: DB2 v8.2 (n/a)
The defect is against component: TPC Server
Server/Manager build/release (TPC): 3.3.1.90, 3.3.2, 4.1 builds
Agent build/release (TPC): n/a
Server/Manager (OS): Windows 2003 (not OS specific)
Agent (OS): n/a
________________________________________________________________
Problem as described by customer: Probe results not accepted by server.
Initial customer impact (low/med/high): med
Local Fix: Contact L2 for examination and possible hotfix.
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: All TPC users using TPC version 3.3.1 and    *
*                 higher. The problem usually occurs in        *
*                 environments that contain a large number of  *
*                 agents (hundreds).                           *
****************************************************************
* PROBLEM DESCRIPTION: Customers with a large number of       *
*                      agents (hundreds) may see jobs fail     *
*                      with timeout exceptions. This will      *
*                      prevent job results from being stored   *
*                      on the server. The issue occurs due     *
*                      to SQL and thread deadlocking in the    *
*                      Data Server.                            *
****************************************************************
* RECOMMENDATION: Apply fixing level when available. This      *
*                 problem is currently projected to be fixed   *
*                 in 3.3.1, 3.3.2, 4.1 and later versions.     *
*                 Note that this is subject to change at the   *
*                 discretion of IBM.                           *
****************************************************************
Customers will see errors similar to the following in the agent logs:

1/1/09 8:34:00 AM AGT0145I: Retrieving job definition from server
1/1/09 9:06:45 AM AGT0152I: Job definition retrieved
1/1/09 9:06:45 AM STA0113I: Probe started
1/1/09 9:06:47 AM STA0249I: Sending results to server
1/1/09 1:06:48 PM GEN0324E: Failed to send request <4:4> to host server.host.com.
1/1/09 1:06:48 PM GEN0008E: Cannot read from host <server.host.com.> java.net.SocketTimeoutException: Read timed out
1/1/09 1:06:48 PM STA0112E: Server did not accept probe results
1/1/09 1:06:48 PM STA0108I: Probe aborted
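To check whether an agent log shows this symptom, the message sequence above can be searched for mechanically. The following is a minimal sketch in Python, not an IBM-supplied tool. It assumes one message per line in the timestamp format shown in the sample, and the function name find_rejected_probes and the SAMPLE_LOG text are illustrative only. It pairs each STA0249I (results sent) with the next STA0112E (results not accepted) and reports how long the agent waited.

```python
import re
from datetime import datetime

# Timestamp and message ID at the start of an agent log line, e.g.
# "1/1/09 9:06:47 AM STA0249I: Sending results to server".
# The layout is assumed from the sample in this APAR.
LINE_RE = re.compile(
    r"^(\d{1,2}/\d{1,2}/\d{2} \d{1,2}:\d{2}:\d{2} [AP]M) ([A-Z]{3}\d{4}[A-Z]):"
)


def find_rejected_probes(lines):
    """Return (sent_at, rejected_at) pairs for probes the server did not accept."""
    rejected = []
    sent_at = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # continuation lines or lines in another format
        stamp = datetime.strptime(match.group(1), "%m/%d/%y %I:%M:%S %p")
        code = match.group(2)
        if code == "STA0249I":  # Sending results to server
            sent_at = stamp
        elif code == "STA0112E" and sent_at is not None:  # results not accepted
            rejected.append((sent_at, stamp))
            sent_at = None
    return rejected


# Sample taken from the agent log excerpt in this APAR.
SAMPLE_LOG = """\
1/1/09 9:06:45 AM STA0113I: Probe started
1/1/09 9:06:47 AM STA0249I: Sending results to server
1/1/09 1:06:48 PM GEN0324E: Failed to send request <4:4> to host server.host.com.
1/1/09 1:06:48 PM STA0112E: Server did not accept probe results
1/1/09 1:06:48 PM STA0108I: Probe aborted
"""

if __name__ == "__main__":
    for sent, failed in find_rejected_probes(SAMPLE_LOG.splitlines()):
        print(f"results sent {sent}, rejected {failed}, waited {failed - sent}")
```

A long gap between the two messages, ending in a read timeout, is consistent with the server-side deadlock described here, but it does not prove it. Other causes of a read timeout would produce the same sequence.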
Problem conclusion
The deadlocking was fixed in the code so that it no longer occurs on the server.

The fix for this APAR is targeted for the following maintenance packages:

| fix pack | 3.3.1.x - target October 2009
| fix pack | 3.3.2.x - target not set
| fix pack | 4.1.0.x - fix pack 3 - target February 2010

http://www-01.ibm.com/support/docview.wss?&uid=swg21320822
Temporary fix
Comments
APAR Information
APAR number
IC61883
Reported component name
TPC FOR DATA
Reported component ID
5608TC300
Reported release
33W
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2009-07-06
Closed date
2009-09-02
Last modified date
2009-09-02
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
TPC FOR DATA
Fixed component ID
5608TC300
Applicable component levels
R33A PSY
UP
R33L PSY
UP
R33W PSY
UP
R41A PSY
UP
R41L PSY
UP
R41W PSY
UP
Document Information
Modified date:
02 September 2009