IBM Support

PI57100: Remote partition wrongly ends in COMPLETED state when job is stopped, wrongly bypassing partition execution on restart.

APAR status

  • Closed as program error.

Error description

  • When a job is stopped while one of its partitioned steps is
    executing, a remote partition (one running on a different
    server from the one executing the top-level job) may wrongly
    end with a BatchStatus of COMPLETED rather than the correct
    status of STOPPED. Although the partitioned step as a whole
    may re-execute on a restart of the job, the individual
    partition will not re-execute, because it is detected as
    already complete. The business logic will not execute for
    this partition, and the partition analyzer will not receive
    the call to analyzeStatus() that it would have received on
    completion. Note that, depending on circumstances such as
    the reason the job is being stopped, any number of the
    partitions in a given partitioned step, from none to all,
    may hit this problem while the rest do not.
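
The effect on restart can be illustrated with a minimal sketch. This is not the Liberty batch runtime code: the class, the enum (a subset of the BatchStatus values), and the method name are hypothetical, and the restart rule is assumed to be simply "re-run every partition not recorded as COMPLETED".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Liberty implementation.
final class PartitionRestartSketch {

    // Local stand-in for a subset of the BatchStatus values.
    enum Status { STARTED, STOPPED, FAILED, COMPLETED }

    // Assumed restart rule: re-run every partition not recorded as COMPLETED.
    static List<Integer> partitionsToRerun(Map<Integer, Status> recorded) {
        List<Integer> rerun = new ArrayList<>();
        for (Map.Entry<Integer, Status> entry : recorded.entrySet()) {
            if (entry.getValue() != Status.COMPLETED) {
                rerun.add(entry.getKey());
            }
        }
        return rerun;
    }

    public static void main(String[] args) {
        Map<Integer, Status> recorded = new LinkedHashMap<>();
        recorded.put(0, Status.STOPPED);
        recorded.put(1, Status.COMPLETED); // stopped mid-run, but wrongly recorded
        recorded.put(2, Status.STOPPED);

        // Partition 1 is never re-run, so its remaining work is lost.
        System.out.println(partitionsToRerun(recorded)); // prints [0, 2]
    }
}
```

In this sketch, partition 1 was interrupted by the stop but carries the wrong status, so the restart treats it as finished.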
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:  All users of IBM WebSphere Application      *
    *                  Server Liberty Profile- Batch               *
    ****************************************************************
    * PROBLEM DESCRIPTION: When a job is stopped some partitions   *
    *                      are wrongly bypassing execution on      *
    *                      restart.                                *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    When a job with a partitioned step is stopped in the middle of
    running that partitioned step:
    
    Case 1.) A remote partition (one running on a different server
    from the one executing the top-level job) may wrongly end with
    a BatchStatus of COMPLETED rather than the correct status of
    STOPPED. Although the partitioned step as a whole may re-run
    on a restart of the job, the individual partition will not
    re-run, because it is detected as already complete. The
    business logic will not run for this partition, and the
    partition analyzer will not receive the call to analyzeStatus()
    that it would have received on completion. Note that, depending
    on circumstances such as the reason the job is being stopped,
    any number of the partitions in a given partitioned step, from
    none to all, may hit this problem while the rest do not.
    
    Case 2.) If, before the job is stopped, at least one partition
    in the step has a BatchStatus and another partition in that
    step does not (because it has not been started yet), a restart
    of the job does not behave correctly. On restart, only the
    partitions that have a BatchStatus other than COMPLETED are
    run; the partitions that have no BatchStatus are skipped. This
    happens because a partition's information is persisted in the
    database only once it reaches a BatchStatus of STARTED. Note
    that if none of the partitions have a BatchStatus, the job
    restarts properly and all the partitions run as desired.
    

Problem conclusion

  • Case 1 has been fixed by setting the step-level status to
    STOPPING (ultimately STOPPED) so that a partition is not
    wrongly left with a BatchStatus of COMPLETED. This ensures
    that if a STOP command is issued against a top-level job
    running remotely, the remote partitions are set to STOPPED
    as well.
    
    Case 2 has been fixed by querying the database only for
    partitions that have a BatchStatus of COMPLETED, instead of
    looking for the partitions that do not. The completed
    partitions are then removed from the full list of partitions
    (the list that would be used if none of the partitions had
    been started yet). The resulting list is used to perform the
    restart, which ensures that every partition that should run
    does run.
    
    The fix for this APAR is currently targeted for inclusion in fix
    pack 8.5.5.9.  Please refer to the Recommended Updates page for
    delivery information:
    http://www.ibm.com/support/docview.wss?rs=180&uid=swg27004980
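
The Case 2 change described above amounts to a set difference, which can be sketched as follows. This is an illustration under stated assumptions, not the Liberty source: the class and method names are hypothetical, the persisted partition rows are modeled as a map from partition number to status string, and partitions are assumed to be numbered from 0.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; not the Liberty implementation.
final class CompletedPartitionFilterSketch {

    // Pre-fix behavior as described: only persisted rows are considered,
    // so partitions that never reached STARTED (no row) are skipped.
    static List<Integer> beforeFix(Map<Integer, String> persisted) {
        List<Integer> toRun = new ArrayList<>();
        for (Map.Entry<Integer, String> row : persisted.entrySet()) {
            if (!"COMPLETED".equals(row.getValue())) {
                toRun.add(row.getKey());
            }
        }
        return toRun;
    }

    // Post-fix behavior as described: collect only the COMPLETED
    // partitions, then remove them from the full list of partitions.
    static List<Integer> afterFix(int partitionCount, Map<Integer, String> persisted) {
        Set<Integer> completed = new HashSet<>();
        for (Map.Entry<Integer, String> row : persisted.entrySet()) {
            if ("COMPLETED".equals(row.getValue())) {
                completed.add(row.getKey());
            }
        }
        List<Integer> toRun = new ArrayList<>();
        for (int partition = 0; partition < partitionCount; partition++) {
            if (!completed.contains(partition)) {
                toRun.add(partition);
            }
        }
        return toRun;
    }

    public static void main(String[] args) {
        // Four partitions; 2 and 3 had not started when the job was stopped,
        // so they have no persisted row.
        Map<Integer, String> persisted = new TreeMap<>();
        persisted.put(0, "COMPLETED");
        persisted.put(1, "STOPPED");

        System.out.println(beforeFix(persisted));   // prints [1]
        System.out.println(afterFix(4, persisted)); // prints [1, 2, 3]
    }
}
```

Starting from the full partition list means a partition with no persisted row is included by default, which is why the never-started partitions are no longer skipped.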
    

Temporary fix

Comments

APAR Information

  • APAR number

    PI57100

  • Reported component name

    WAS LIBERTY COR

  • Reported component ID

    5725L2900

  • Reported release

    855

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2016-02-11

  • Closed date

    2016-02-12

  • Last modified date

    2016-02-12

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    WAS LIBERTY COR

  • Fixed component ID

    5725L2900

Applicable component levels

  • R855 PSY

       UP

Document Information

Modified date:
06 September 2021