PM91934: JOB STUCK IN SUBMITTED STATE - WAITING FOR AN ACTIVE ENDPOINT

Fixes are available

APAR status

Closed as program error.

Error description

The batch job is stuck in submitted state waiting for an
active endpoint.  The joblog shows:

CWLRB5684I: [05/28/13 15:09:38:331 CEST] Job
SimpleCIEar:00000 is queued for execution

CWLRB5586I: [05/28/13 15:09:38:686 CEST] CWLRS6021I: List of
eligible endpoints to execute the job: Node07/BBOS001

However, the job is never dispatched to the eligible endpoint.
The scheduler job log shows that the eligible endpoint missed a
heartbeat before the job was submitted and never registered
again.

Trace: 2013/05/28 09:49:00.916 02 t=9BB7F0 c=UNK key=S2 tag=
(13007004)
  SourceId:
com.ibm.ws.gridcontainer.proxy.endpoint.impl.EndpointManagerImpl
  ExtendedMessage: BBOO0222I: No heart beat received from
  EP:Node07/BBOS001 since 1369726984973 which is greater
  than the tolerance interval of 300000
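
The arithmetic behind this message can be checked with a short
Python sketch. It is only an illustration, not the scheduler's
code. It assumes that the "since" value is a Unix epoch timestamp
in milliseconds, that the tolerance interval is also in
milliseconds, and that the trace timestamp is in CEST (UTC+2),
as in the job log. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

TOLERANCE_MS = 300000                # tolerance interval from the trace
LAST_HEARTBEAT_MS = 1369726984973    # "since" value from the trace

# Assumption: the trace timestamp 2013/05/28 09:49:00.916 is CEST (UTC+2).
CEST = timezone(timedelta(hours=2))
trace_time = datetime(2013, 5, 28, 9, 49, 0, 916000, tzinfo=CEST)


def elapsed_ms(last_heartbeat_ms, now):
    """Return whole milliseconds between the last heartbeat and `now`."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    last = epoch + timedelta(milliseconds=last_heartbeat_ms)
    # Integer division of timedeltas avoids floating-point rounding.
    return (now - last) // timedelta(milliseconds=1)


def heartbeat_missed(last_heartbeat_ms, now, tolerance_ms=TOLERANCE_MS):
    """True when the gap since the last heartbeat exceeds the tolerance."""
    return elapsed_ms(last_heartbeat_ms, now) > tolerance_ms


gap = elapsed_ms(LAST_HEARTBEAT_MS, trace_time)
missed = heartbeat_missed(LAST_HEARTBEAT_MS, trace_time)
print(gap, missed)
```

Under those assumptions the last heartbeat falls at 09:43:04.973
CEST, and the gap at trace time is 355943 ms, which exceeds the
300000 ms tolerance.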

The endpoint logs show that multiple servants are enabled.
When servants are terminated, the endpoint can miss its
heartbeat:

Trace: 2013/05/28 09:43:31.406 02 t=9E3AE8 c=UNK key=P8 tag=
(13007004)
  SourceId: com.ibm.ws.runtime.WsServerImpl
  ExtendedMessage: BBOO0222I: WSVR0024I: Server SERVANT PROCESS
  BBOS001S stopped

Trace: 2013/05/28 09:43:31.306 02 t=9E3AE8 c=UNK key=P8 tag=
(13007004)
  SourceId: com.ibm.ws.runtime.WsServerImpl
  ExtendedMessage: BBOO0222I: WSVR0024I: Server SERVANT PROCESS
  BBOS001S stopped

Local fix

Problem summary

****************************************************************
* USERS AFFECTED:  All WebSphere Compute Grid 8 users on       *
*                  z/OS using multiple dynamic servant         *
*                  regions.                                    *
****************************************************************
* PROBLEM DESCRIPTION: When the WLM reduces the number of      *
*                      servant regions on a GEE endpoint       *
*                      server, the endpoint becomes            *
*                      unavailable.                            *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
When the WLM reduces the number of servant regions on a GEE
endpoint server, the endpoint becomes unavailable.

Problem conclusion

A fix was added to check if other servant regions are still up
each time a servant region is brought down. The fix for this
APAR is currently targeted for inclusion in fixpack 8.0.0.4.
Please refer to the Recommended Updates page for delivery
information:
http://www.ibm.com/support/docview.wss?uid=swg27022998
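
The idea behind the fix can be sketched in Python as follows.
This is a minimal illustration and not the product code: the
class, its method names, and the servant identifiers are all
hypothetical. It shows only the rule described above, namely
that each time a servant region stops, the endpoint is checked
for other servant regions that are still up.

```python
class EndpointRegistry:
    """Hypothetical model: an endpoint is available while a servant runs."""

    def __init__(self):
        # Endpoint name -> set of identifiers of running servant regions.
        self._servants = {}

    def servant_started(self, endpoint, servant_id):
        self._servants.setdefault(endpoint, set()).add(servant_id)

    def servant_stopped(self, endpoint, servant_id):
        # On every servant stop, check whether other servants are still
        # up before treating the endpoint as unavailable.
        self._servants.get(endpoint, set()).discard(servant_id)
        return self.is_available(endpoint)

    def is_available(self, endpoint):
        return bool(self._servants.get(endpoint))


registry = EndpointRegistry()
# Hypothetical servant identifiers for the endpoint named in the logs.
registry.servant_started("Node07/BBOS001", "servant-1")
registry.servant_started("Node07/BBOS001", "servant-2")

still_up = registry.servant_stopped("Node07/BBOS001", "servant-1")
all_down = registry.servant_stopped("Node07/BBOS001", "servant-2")
print(still_up, all_down)
```

In this sketch the endpoint stays available after the first
servant stops and becomes unavailable only when the last one
stops.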

Temporary fix

Comments

APAR Information

APAR number
PM91934
Reported component name
WXD Z COMP GRID
Reported component ID
5655V6201
Reported release
800
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2013-06-26
Closed date
2013-12-04
Last modified date
2013-12-04

APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:

Fix information

Fixed component name
WXD Z COMP GRID
Fixed component ID
5655V6201

Applicable component levels

R800 PSY
UP


Document Information

Modified date:
27 April 2022
