Fixes are available
9.0.5.0: WebSphere Application Server traditional Version 9.0.5 Refresh Pack
9.0.5.1: WebSphere Application Server traditional Version 9.0.5 Fix Pack 1
9.0.5.2: WebSphere Application Server traditional Version 9.0.5 Fix Pack 2
8.5.5.17: WebSphere Application Server V8.5.5 Fix Pack 17
9.0.5.3: WebSphere Application Server traditional Version 9.0.5 Fix Pack 3
8.5.5.20: WebSphere Application Server V8.5.5.20
8.5.5.18: WebSphere Application Server V8.5.5 Fix Pack 18
8.5.5.19: WebSphere Application Server V8.5.5 Fix Pack 19
8.5.5.16: WebSphere Application Server V8.5.5 Fix Pack 16
8.5.5.21: WebSphere Application Server V8.5.5.21
APAR status
Closed as program error.
Error description
Even with com.ibm.ws.batch.parallel.MAXIMUM_CONCURRENT_SUBJOBS set to a value of 1, a WebSphere Application Server Batch top-level job was observed running more than one sub job concurrently.
Local fix
Setting the WebSphere variable GRID_ENDPOINT_PJM_JOB_STATUS_WAIT_INTERVAL to a higher value on the endpoint servers may reduce the likelihood of hitting the window in which the code inadvertently allows additional sub jobs to run.
Problem summary
USERS AFFECTED: All users of IBM WebSphere Application Server Java Batch
PROBLEM DESCRIPTION: The value of the com.ibm.ws.batch.parallel.MAXIMUM_CONCURRENT_SUBJOBS property specifies the maximum number of sub jobs that can run concurrently. However, a timing issue can occur where this can be exceeded.
RECOMMENDATION:
The top-level job handles the submission and completion processing of sub jobs. Only the number of sub jobs specified by the com.ibm.ws.batch.parallel.MAXIMUM_CONCURRENT_SUBJOBS property should be running at once. However, a timing issue exists such that this limit could be exceeded.
There is a fallback check in place that is intended to handle the case where the top-level job does not receive a notification that a sub job has completed. Periodically, if the top-level job sees that it has not received a notification in some time, it queries the sub job status. If it sees that the sub job is complete, it releases a job from the queue to start running. If this check and release happens at the same moment that the sub job completes through the normal channel of communication, both paths release a queued job, and an extra sub job is submitted.
Problem conclusion
A code update has been made to synchronize the handling of missed notifications, such that an extra job beyond the specified concurrent sub jobs limit can no longer be submitted. The fix for this APAR is currently targeted for inclusion in fix packs 9.0.5.0 and 8.5.5.16. Please refer to the Recommended Updates page for delivery information: http://www.ibm.com/support/docview.wss?rs=180&uid=swg27004980
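The general idea of synchronizing the two completion paths can be illustrated with a minimal sketch. This is not the WebSphere code: the class SubJobDispatcher and all of its method names are hypothetical and were chosen only for illustration. The sketch assumes that both the normal completion notification and the fallback status poll go through one lock, and that a queued sub job is released only by the path that actually removes the finished sub job from the running set.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical illustration only; names are not taken from WebSphere.
class SubJobDispatcher {
    private final int maxConcurrent;                       // stands in for MAXIMUM_CONCURRENT_SUBJOBS
    private final Queue<String> waiting = new ArrayDeque<>();
    private final Set<String> running = new HashSet<>();
    private int peakRunning = 0;                           // highest concurrency observed

    SubJobDispatcher(int maxConcurrent, Iterable<String> subJobIds) {
        this.maxConcurrent = maxConcurrent;
        for (String id : subJobIds) {
            waiting.add(id);
        }
    }

    // Start as many queued sub jobs as the limit allows.
    synchronized void start() {
        fillSlots();
    }

    // Normal path: the sub job reported its own completion.
    synchronized boolean onCompletionNotification(String id) {
        return complete(id);
    }

    // Fallback path: a periodic status query found the sub job complete.
    synchronized boolean onStatusPollFoundComplete(String id) {
        return complete(id);
    }

    // Called only while holding the lock. Returns false when the other
    // path already handled this sub job, so no second release happens.
    private boolean complete(String id) {
        if (!running.remove(id)) {
            return false;
        }
        fillSlots();
        return true;
    }

    // Move queued sub jobs into the running set up to the limit.
    private void fillSlots() {
        while (running.size() < maxConcurrent && !waiting.isEmpty()) {
            running.add(waiting.poll());
        }
        peakRunning = Math.max(peakRunning, running.size());
    }

    synchronized int runningCount() {
        return running.size();
    }

    synchronized int peakRunning() {
        return peakRunning;
    }

    public static void main(String[] args) {
        SubJobDispatcher d = new SubJobDispatcher(1, Arrays.asList("sub1", "sub2", "sub3"));
        d.start();
        // Both paths report the same completion; only the first one releases a job.
        boolean first = d.onCompletionNotification("sub1");
        boolean second = d.onStatusPollFoundComplete("sub1");
        System.out.println(first + " " + second + " peak=" + d.peakRunning());
    }
}
```

Because release is tied to removing the sub job from the running set under the lock, a duplicate report of the same completion becomes a no-op regardless of which path arrives first.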
Temporary fix
Comments
APAR Information
APAR number
PH08548
Reported component name
WEBSPHERE FOR Z
Reported component ID
5655I3500
Reported release
850
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2019-02-13
Closed date
2019-05-23
Last modified date
2019-05-23
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
WEBSPHERE FOR Z
Fixed component ID
5655I3500
Applicable component levels
R850 PSY
UP
Document Information
Modified date:
28 April 2022