Fixes are available
APAR status
Closed as program error.
Error description
Customer was running Compute Grid 8.0 in their production environment. The jobs in this environment are triggered via WSGrid, and they observed that several jobs had active WSGrid sessions that were not reflected as jobs in the JMC. Only one job was in the executing state in the JMC, but its job log indicated that it should have been in a different state. Once they cycled the endpoint application server that this job had run on, the normal flow of jobs through the environment via WSGrid resumed.
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: All users of WebSphere Extended Deployment   *
*                 Compute Grid Version 8.                      *
****************************************************************
* PROBLEM DESCRIPTION: Job log streaming for jobs submitted    *
*                      via WSGrid (e.g. via an external        *
*                      scheduler) appears to stop due to a     *
*                      hang or a slowdown on an endpoint       *
*                      server executing already-submitted      *
*                      (via WSGrid) jobs.                      *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************

The problem can happen when an endpoint server executing jobs dispatched via the WSGrid interface experiences a slowdown, e.g. because it is thrashing on its way to running out of memory. If an endpoint slows down enough, it might not respond to the scheduler's requests for job log and status updates, which the scheduler relays back to the WSGrid client (e.g. the external scheduler). The scheduler threads making these requests may then hang for a long time waiting for the endpoint's response. Because the scheduler uses a single thread pool to stream output from all endpoints, this can lead to a situation where no output is received over the WSGrid interface at all, since all the relevant scheduler threads are hung waiting for output from a single bad server. Jobs submitted to the other (good) endpoints should still execute normally in this scenario, but their output is not sent back to the WSGrid client.
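The starvation described above can be illustrated with a small Python model. This is a hypothetical sketch, not Compute Grid's scheduler code (which this APAR does not show): `poll_endpoint` and `run_demo` are illustrative names, a "bad" endpoint is modeled as a call that blocks with no timeout, and a shared pool of two worker threads stands in for the scheduler's single streaming thread pool.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


def run_demo(pool_size: int = 2, wait_s: float = 0.5):
    """Hypothetical model: one shared pool polls every endpoint for job output."""
    release = threading.Event()  # set only at the end, so "bad" calls hang until then

    def poll_endpoint(endpoint: str, job: str) -> str:
        if endpoint == "bad":
            release.wait()  # no timeout: the worker thread is stuck here
        return f"{job}@{endpoint}: log chunk"

    # The two polls of the bad endpoint are queued first and occupy every worker.
    jobs = [("bad", "job1"), ("bad", "job2"), ("good", "job3"), ("good", "job4")]
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [pool.submit(poll_endpoint, ep, job) for ep, job in jobs]
        done, not_done = wait(futures, timeout=wait_s)
        delivered = sorted(f.result() for f in done)
        stalled = len(not_done)
        release.set()  # unblock the hung calls so the pool can shut down cleanly
    return delivered, stalled
```

Calling `run_demo()` returns `([], 4)`: the output for `job3` and `job4` on the good endpoint is never delivered during the wait window, because both pool threads are blocked on the bad endpoint.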
Problem conclusion
The scheduler threads that stream output from the endpoint server for WSGrid-submitted jobs back to the WSGrid client will now time out rather than hang indefinitely. A single bad endpoint can therefore still slow down output streaming, but only in proportion to the number of jobs on that endpoint relative to the total number of jobs managed by the scheduler, rather than preventing the streaming of all WSGrid output. The fix for this APAR is currently targeted for inclusion in fix pack 8.0.0.3. Please refer to the Recommended Updates page for delivery information: http://www.ibm.com/support/docview.wss?uid=swg27022998
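The effect of the fix can be sketched with the same kind of hypothetical Python model, assuming each poll of an endpoint is bounded by a timeout. The names `poll_endpoint`, `run_demo`, and `poll_timeout_s` are illustrative only and do not correspond to any documented Compute Grid setting; the APAR does not state the actual timeout value or mechanism.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


def run_demo(pool_size: int = 2, poll_timeout_s: float = 0.1, wait_s: float = 2.0):
    """Hypothetical model: each endpoint poll gives up after poll_timeout_s."""
    hung = threading.Event()  # never set: the bad endpoint never answers

    def poll_endpoint(endpoint: str, job: str) -> str:
        if endpoint == "bad":
            # Bounded wait: the worker thread is released after the timeout
            # instead of hanging indefinitely on the unresponsive endpoint.
            if not hung.wait(timeout=poll_timeout_s):
                return f"{job}@{endpoint}: timed out"
        return f"{job}@{endpoint}: log chunk"

    jobs = [("bad", "job1"), ("bad", "job2"), ("good", "job3"), ("good", "job4")]
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [pool.submit(poll_endpoint, ep, job) for ep, job in jobs]
        done, not_done = wait(futures, timeout=wait_s)
    return sorted(f.result() for f in done), len(not_done)
```

Here every poll completes: the bad endpoint's polls report a timeout, and the good endpoint's output is delivered after a delay of roughly one `poll_timeout_s`. This mirrors the conclusion above, where a bad endpoint slows streaming in proportion to its share of jobs instead of blocking all WSGrid output.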
Temporary fix
Comments
APAR Information
APAR number
PM74855
Reported component name
WXD COMPUTE GRI
Reported component ID
5725C9301
Reported release
800
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2012-10-11
Closed date
2013-01-02
Last modified date
2013-04-19
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
WXD COMPUTE GRI
Fixed component ID
5725C9301
Applicable component levels
R800 PSY
UP
Document Information
Modified date:
29 October 2021