IBM Support

JR59083: PX HADOOP JOBS FAILED WHEN A CONTAINER EXPIRES BEFORE ALL CONTAINERS ARE ALLOCATED

APAR status

  • Closed as program error.

Error description

  • Yarn jobs can fail when some of the job's containers have been
    allocated while other container assignments are still waiting on
    Yarn resources. If it takes a long time for Yarn to allocate the
    remaining containers, the already allocated containers can hit a
    timeout set by the Yarn resource manager, because those
    containers have not yet been set to "running".
    
    In the DataStage log, users may see a message like:
    Message Id: IIS-DSEE-TFPM-00493
    
    In the YARN logs, there may be messages like:
    AM00074 SEVERE ApplicationMaster: Expiry of containers
    occurred, failing the job and releasing all allocated
    containers. (runOSH)
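
The timing condition described above can be illustrated with a minimal sketch. It is not the product's implementation. It assumes, based only on the description, that no container is set to "running" until the last container has been allocated. The function name, its parameters, and the sample values are hypothetical.

```python
def first_expired_container(allocation_times_s, expiry_timeout_s):
    """Return the index of the first container that would expire, or None.

    allocation_times_s: seconds after job submission at which each
        container was allocated (hypothetical input).
    expiry_timeout_s: how long an allocated container may wait before
        it is set to "running" (hypothetical input).
    Assumption: containers start running only once all are allocated.
    """
    if not allocation_times_s:
        return None
    all_allocated_at = max(allocation_times_s)
    # Collect (expiry time, index) for containers whose timeout elapses
    # before the last container has been allocated.
    expired = [
        (allocated_at + expiry_timeout_s, index)
        for index, allocated_at in enumerate(allocation_times_s)
        if allocated_at + expiry_timeout_s < all_allocated_at
    ]
    if not expired:
        return None
    return min(expired)[1]
```

For example, with a hypothetical 600-second timeout, containers allocated at 0, 5, and 700 seconds would lose the first container before the third one arrives, whereas allocations at 0, 5, and 300 seconds would not trigger an expiry.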
    

Local fix

  • N/A
    

Problem summary

  • Title:
    Big Integrate jobs which request a large number of Hadoop
    containers can fail when a container expires before all
    containers are allocated
    Description:
    Yarn jobs can fail when some of the job's containers have been
    allocated while other container assignments are still waiting on
    Yarn resources. If it takes a long time for Yarn to allocate the
    remaining containers, the already allocated containers can hit a
    timeout set by the Yarn resource manager, because those
    containers have not yet been set to "running".
    In the DataStage log, users may see a message like:
    Message Id: IIS-DSEE-TFPM-00493
    
    In the YARN logs, there may be messages like:
    AM00074 SEVERE ApplicationMaster: Expiry of containers
    occurred, failing the job and releasing all allocated
    containers. (runOSH)
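
To check collected log text for the two messages quoted above, a small sketch along the following lines may help. The function name and the signature labels are hypothetical. The patterns use only the message fragments shown in this document, and a match is a hint rather than proof that this APAR applies, since the DataStage message ID may also appear for other failures.

```python
import re

# Message fragments taken from the log excerpts quoted in this document.
# \s+ tolerates line wrapping inside the YARN message.
SIGNATURES = {
    "datastage": re.compile(r"IIS-DSEE-TFPM-00493"),
    "yarn": re.compile(
        r"AM00074\s+SEVERE\s+ApplicationMaster:\s+Expiry\s+of\s+containers"
    ),
}


def find_signatures(log_text):
    """Return the sorted labels of the signatures found in log_text."""
    return sorted(
        label for label, pattern in SIGNATURES.items()
        if pattern.search(log_text)
    )
```

Seeing both labels returned for the same job run is a stronger indication of this problem than either one alone.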
    

Problem conclusion

  • Patches are available to fix the problem
    

Temporary fix

Comments

APAR Information

  • APAR number

    JR59083

  • Reported component name

    WIS DATASTAGE

  • Reported component ID

    5724Q36DS

  • Reported release

    B50

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2018-02-01

  • Closed date

    2018-04-03

  • Last modified date

    2018-04-03

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Modules/Macros

  • SERVER
    

Fix information

  • Fixed component name

    WIS DATASTAGE

  • Fixed component ID

    5724Q36DS

Applicable component levels

  • RB70 PSY UP

Document Information

Modified date:
02 September 2021