IBM Support

PM91934: JOB STUCK IN SUBMITTED STATE - WAITING FOR AN ACTIVE ENDPOINT

APAR status

  • Closed as program error.

Error description

  • The batch job is stuck in the submitted state, waiting for an
    active endpoint.  The job log shows:
    
    CWLRB5684I: [05/28/13 15:09:38:331 CEST] Job
    SimpleCIEar:00000 is queued for execution
    
    CWLRB5586I: [05/28/13 15:09:38:686 CEST] CWLRS6021I: List of
    eligible endpoints to execute the job: Node07/BBOS001
    
    However, the job is never dispatched to the eligible
    endpoint.  The scheduler job log shows that the eligible
    endpoint missed a heartbeat before the job was submitted and
    never registered again.
    
    Trace: 2013/05/28 09:49:00.916 02 t=9BB7F0 c=UNK key=S2 tag=
    (13007004)
      SourceId:
    com.ibm.ws.gridcontainer.proxy.endpoint.impl.EndpointManagerImpl
      ExtendedMessage: BBOO0222I: No heart beat received from
      EP:Node07/BBOS001 since 1369726984973 which is greater
      than the tolerance interval of 300000
    
    The endpoint logs show that multiple servants are enabled.
    When servants are terminated, the endpoint can miss its
    heartbeat, as seen here:
    
    Trace: 2013/05/28 09:43:31.406 02 t=9E3AE8 c=UNK key=P8 tag=
    (13007004)
      SourceId: com.ibm.ws.runtime.WsServerImpl
      ExtendedMessage: BBOO0222I: WSVR0024I: Server SERVANT PROCESS
      BBOS001S stopped
    
    Trace: 2013/05/28 09:43:31.306 02 t=9E3AE8 c=UNK key=P8 tag=
    (13007004)
      SourceId: com.ibm.ws.runtime.WsServerImpl
      ExtendedMessage: BBOO0222I: WSVR0024I: Server SERVANT PROCESS
      BBOS001S stopped
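
    The heartbeat message above can be checked by hand. The following is a
    minimal sketch, not product code: it assumes the "since" value is a Unix
    epoch timestamp in milliseconds and that the trace timestamps are in CEST
    (UTC+2), as in the job log. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: trace timestamps are local CEST (UTC+2), matching the job log.
CEST = timezone(timedelta(hours=2))
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

TOLERANCE_MS = 300000              # tolerance interval from the trace message
LAST_HEARTBEAT_MS = 1369726984973  # "since" value from the trace message


def epoch_ms_to_local(epoch_ms, tz=CEST):
    """Convert epoch milliseconds to a timezone-aware datetime (exact integer math)."""
    return (EPOCH + timedelta(milliseconds=epoch_ms)).astimezone(tz)


def heartbeat_gap_ms(last_heartbeat_ms, observed_at):
    """Milliseconds elapsed between the last heartbeat and the observation time."""
    delta = observed_at - epoch_ms_to_local(last_heartbeat_ms)
    return delta // timedelta(milliseconds=1)


# Time of the "No heart beat received" trace entry: 2013/05/28 09:49:00.916
observed = datetime(2013, 5, 28, 9, 49, 0, 916000, tzinfo=CEST)

last_seen = epoch_ms_to_local(LAST_HEARTBEAT_MS)
gap = heartbeat_gap_ms(LAST_HEARTBEAT_MS, observed)

print("last heartbeat:", last_seen.strftime("%Y-%m-%d %H:%M:%S"))
print("gap (ms):", gap, "exceeds tolerance:", gap > TOLERANCE_MS)
```

    Under these assumptions the last heartbeat falls at 09:43:04 CEST, about
    half a minute before the servant-stopped entries at 09:43:31, and the gap
    at the time of the trace entry is above the 300000 ms tolerance.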
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:  All WebSphere Compute Grid 8 users on       *
    *                  z/OS using multiple dynamic servant         *
    *                  regions.                                    *
    ****************************************************************
    * PROBLEM DESCRIPTION: When the WLM reduces the number of      *
    *                      servant regions on a GEE endpoint       *
    *                      server, the endpoint becomes            *
    *                      unavailable.                            *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    When the WLM reduces the number of servant regions on a GEE
    endpoint server, the endpoint becomes unavailable.
    

Problem conclusion

  • A fix was added to check if other servant regions are still up
    each time a servant region is brought down. The fix for this
    APAR is currently targeted for inclusion in fixpack 8.0.0.4.
    Please refer to the Recommended Updates page for delivery
    information:
    http://www.ibm.com/support/docview.wss?uid=swg27022998
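
    The idea behind the fix can be illustrated with a small model. This is
    only a sketch of the described behavior, not the actual Compute Grid
    implementation: the class, method names, and servant identifiers are
    hypothetical. It shows an endpoint staying available as long as at least
    one servant region is still up.

```python
class EndpointTracker:
    """Hypothetical model of how a scheduler might track one endpoint."""

    def __init__(self, name):
        self.name = name
        self.servants = set()   # identifiers of servant regions currently up
        self.available = False

    def servant_started(self, servant_id):
        """Record a new servant region; the endpoint is now usable."""
        self.servants.add(servant_id)
        self.available = True

    def servant_stopped(self, servant_id):
        """Record a stopped servant region.

        Before marking the endpoint unavailable, check whether other
        servant regions are still up, as the problem conclusion describes.
        """
        self.servants.discard(servant_id)
        self.available = bool(self.servants)


endpoint = EndpointTracker("Node07/BBOS001")
endpoint.servant_started("servant-1")  # hypothetical servant identifiers
endpoint.servant_started("servant-2")

endpoint.servant_stopped("servant-1")  # WLM scales down by one servant
still_up_after_scale_down = endpoint.available

endpoint.servant_stopped("servant-2")  # last servant goes away
up_after_last_stop = endpoint.available

print(still_up_after_scale_down, up_after_last_stop)
```

    In this model, a scale-down from two servants to one leaves the endpoint
    eligible for dispatch; only the loss of the last servant makes it
    unavailable.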
    

Temporary fix

Comments

APAR Information

  • APAR number

    PM91934

  • Reported component name

    WXD Z COMP GRID

  • Reported component ID

    5655V6201

  • Reported release

    800

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2013-06-26

  • Closed date

    2013-12-04

  • Last modified date

    2013-12-04

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    WXD Z COMP GRID

  • Fixed component ID

    5655V6201

Applicable component levels

  • R800 PSY

       UP

Document Information

Modified date:
27 April 2022