IBM Support

PH36899: IMPROVEMENTS TO WSGRID TAKEOVER

APAR status

  • Closed as new function.

Error description

  • Several improvements to WSGRID takeover
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:  All users of IBM WebSphere Application      *
    *                  Server                                      *
    *                  for z/OS Java Batch                         *
    ****************************************************************
    * PROBLEM DESCRIPTION: WSGRID Java Batch jobs on z/OS are not  *
    *                      taken over if the owning scheduler      *
    *                      becomes unavailable                     *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    When a Java Batch job is submitted on z/OS using the WSGRID
    native client, status is communicated back to the client from
    the Job Scheduler through MQ messages. These messages are issued
    by an MDB that monitors the status of the job through that same
    Job Scheduler. If that Job Scheduler becomes unavailable for any
    reason, the client detects this by periodically checking whether
    a specified lock name (ENQ name) is still held. If the ENQ is no
    longer held, the client concludes that the Job Scheduler is down,
    and the WSGRID proxy job is ended. By default on z/OS, when this
    happens, the Java Batch job itself on the batch endpoint server
    is also stopped, which keeps the two statuses in sync.
    If the WebSphere custom property
    com.ibm.websphere.batch.policy.EndJobWhenSchedulerEnds is set to
    false, the Java Batch job continues to run, and another active
    Job Scheduler takes over receiving status updates from the job.
    However, an external scheduler that submits WSGRID proxy jobs
    monitors the status of the proxy job, so it sees the job as
    having ended abnormally. Although the job may complete
    successfully on the batch endpoint, additional steps are required
    to determine the job's actual status and surface it to any
    external scheduling tools.
    If the WSGRID job could remain up and continue to receive status
    messages, additional steps to get the jobs back in sync can be
    avoided.
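The detection step described above can be modeled roughly as a polling loop. This is a minimal Python sketch, not the WSGRID implementation: the real client is native z/OS code, and `enq_is_held`, `poll_interval_s`, and `max_polls` are hypothetical names standing in for the ENQ test and timing details, which this APAR does not specify.

```python
import time
from typing import Callable, Optional


def monitor_scheduler(enq_is_held: Callable[[], bool],
                      poll_interval_s: float = 1.0,
                      max_polls: Optional[int] = None,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll the Job Scheduler's ENQ until it disappears (hypothetical model).

    Returns "SCHEDULER_DOWN" once the ENQ is no longer held, or
    "STILL_RUNNING" if max_polls checks all found the ENQ held.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        if not enq_is_held():
            # Pre-fix behavior: the proxy job ends here and, by default,
            # the batch job on the endpoint server is stopped as well.
            return "SCHEDULER_DOWN"
        polls += 1
        sleep(poll_interval_s)  # wait before the next periodic check
    return "STILL_RUNNING"
```

Injecting `sleep` and `enq_is_held` keeps the loop testable without a real z/OS ENQ; for example, a fake that reports the lock as held twice and then released makes the function return `"SCHEDULER_DOWN"` on the third check.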
    

Problem conclusion

  • A code update has been made in both the WebSphere for z/OS
    WSGRID native client and the Java Batch runtime to allow the
    takeover of status messaging back to the WSGRID native client.
    
    Because the original Job Scheduler, and therefore the MDB that
    was monitoring the job, is no longer available, messaging back to
    the client after a takeover is done through a new code path. This
    new code path does not stream job logs back to the client; a
    message stating this is issued instead. The job logs still exist
    on the file system and can be retrieved by other means.
    
    New properties on the WSGRID native client side have been added:
    
    endJobWhenSchedulerEnds=<true/false> (default true): This
    property controls whether the WSGRID proxy job should end if it
    is determined the Job Scheduler is no longer active.
    
    takeover-timeout=<timeout_in_milliseconds>: Specifies how long
    the proxy job should wait to receive a message that another
    scheduler has taken over the job before it times out and ends
    abnormally, as it would have before this change.
    
    useTakeoverReturnCodes=<true/false> (default false): Indicates
    whether the proxy job should use special return codes for
    completed, restartable or failed jobs after a takeover.  This
    allows for differentiation between, for example, a job that
    completed and a job that was taken over and then completed.
    
    On the Java Batch Job Scheduler side, a new Scheduler custom
    property was added:
    com.ibm.websphere.batch.allow.wsgrid.takeover=<true/false>
    (default false): Indicates that if a job initially submitted by
    WSGRID is taken over by another active Job Scheduler, that Job
    Scheduler should also take over sending status messages back to
    the WSGRID client.
    
    
    The WebSphere for z/OS documentation for WSGRID and custom
    properties will be updated with this new information.
    
    The fix for this APAR is targeted for inclusion in fix packs
    9.0.5.12 and 8.5.5.22. For more information, see 'Recommended
    Updates for WebSphere Application Server':
    https://www.ibm.com/support/pages/node/715553
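The client-side properties and the Scheduler custom property described above can be tied together in a small model of how a proxy job might react once its Job Scheduler's ENQ disappears. This is a hedged Python sketch under stated assumptions, not IBM's code: the key=value parsing mimics a simple properties file, `takeover-timeout` has no default here because the APAR states none, the two callables (`wait_for_takeover`, `wait_for_final_status`) are hypothetical stand-ins for MQ messaging, and the numeric return codes are placeholders, since the real takeover return codes are not listed in this APAR.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Union


def _parse_bool(value: str) -> bool:
    v = value.strip().lower()
    if v not in ("true", "false"):
        raise ValueError(f"expected true/false, got {value!r}")
    return v == "true"


@dataclass
class TakeoverSettings:
    end_job_when_scheduler_ends: bool = True   # documented default: true
    takeover_timeout_ms: Optional[int] = None  # no default stated in the APAR
    use_takeover_return_codes: bool = False    # documented default: false


def parse_takeover_settings(text: str) -> TakeoverSettings:
    """Read the three takeover properties from key=value lines."""
    settings = TakeoverSettings()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and malformed lines
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "endJobWhenSchedulerEnds":
            settings.end_job_when_scheduler_ends = _parse_bool(value)
        elif key == "takeover-timeout":
            settings.takeover_timeout_ms = int(value)
        elif key == "useTakeoverReturnCodes":
            settings.use_takeover_return_codes = _parse_bool(value)
        # Any other WSGRID property is ignored by this sketch.
    return settings


class Outcome(Enum):
    COMPLETED = "completed"
    RESTARTABLE = "restartable"
    FAILED = "failed"


# HYPOTHETICAL placeholder codes: the APAR does not publish the real values.
NORMAL_RC = {Outcome.COMPLETED: 0, Outcome.RESTARTABLE: 1, Outcome.FAILED: 2}
TAKEOVER_RC = {Outcome.COMPLETED: 10, Outcome.RESTARTABLE: 11, Outcome.FAILED: 12}
ABEND = "ABEND"  # stands in for "ending abnormally as it would have before"


def on_scheduler_lost(settings: TakeoverSettings,
                      wait_for_takeover: Callable[[Optional[int]], bool],
                      wait_for_final_status: Callable[[], Outcome]) -> Union[int, str]:
    """Decide what the proxy job does once its Job Scheduler's ENQ is gone."""
    if settings.end_job_when_scheduler_ends:
        return ABEND  # default: end the proxy job immediately
    # Takeover messages arrive only if the new scheduler has
    # allow.wsgrid.takeover enabled; otherwise this wait should time out.
    if not wait_for_takeover(settings.takeover_timeout_ms):
        return ABEND  # no takeover message before takeover-timeout
    outcome = wait_for_final_status()  # status now comes from the new scheduler
    table = TAKEOVER_RC if settings.use_takeover_return_codes else NORMAL_RC
    return table[outcome]
```

For example, parsing `endJobWhenSchedulerEnds=false`, `takeover-timeout=60000` (an illustrative value), and `useTakeoverReturnCodes=true`, then calling `on_scheduler_lost` with a takeover that arrives in time, yields the placeholder takeover code for the final outcome rather than `ABEND`. Separating the two lookup tables mirrors the stated purpose of `useTakeoverReturnCodes`: telling a plain completion apart from a completion after takeover.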
    

Temporary fix

Comments

APAR Information

  • APAR number

    PH36899

  • Reported component name

    WEBSPHERE FOR Z

  • Reported component ID

    5655I3500

  • Reported release

    900

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2021-05-03

  • Closed date

    2022-03-23

  • Last modified date

    2022-03-23

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    WEBSPHERE FOR Z

  • Fixed component ID

    5655I3500

Applicable component levels

WebSphere Application Server for z/OS (SS7K4U), version 900, Platform Independent

Document Information

Modified date:
24 March 2022