IBM Support

JR60144: HADOOP EDGE NODE CAN CRASH WHEN A YARN CLIENT IS WAITING FOR RESOURCES WITH A LARGE NUMBER OF JOBS RUNNING CONCURRENTLY

APAR status

  • Closed as suggestion for future release.

Error description

  • If the PX YARN client needs to be started as part of job
    execution on a busy system (150 concurrent job starts), a
    situation can arise that takes down the edge node. There are
    several scenarios. For example, if the conductor gives up on
    running a job, say because APT_YARN_MSG_TIMEOUT is hit, the
    job and the conductor die, but the PX YARN client does not
    give up waiting for the conductor to get an Application
    Master, and the node hangs.
    Users may see log messages like:
    main_program: Fatal Error: Failed to read Application Master
    connection data from YARN Client. Check the health of
    YARN/Hadoop. Socket returned end of file. Look into YARN
    Client's logs for more information at
    /opt/IBM/cis_dev/InformationServer/Server/PXEngine/logs/yarn_logs/yarn_client.dsadm3a.out*
    

Local fix

  • Reduce the number of concurrent jobs
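
    One way to apply this workaround from a scheduling script is to
    cap how many job starts are in flight at once. The following is
    a minimal sketch only, not an IBM-supplied tool: run_throttled,
    launch, and max_concurrent are hypothetical names, and launch
    stands for whatever callable your site uses to start a single
    job. A safe limit for a given edge node is not stated in this
    APAR and has to be found by testing.

```python
from concurrent.futures import ThreadPoolExecutor


def run_throttled(jobs, launch, max_concurrent):
    """Call launch(job) for every job, with at most max_concurrent running at once.

    launch is a caller-supplied (hypothetical) callable that starts one job
    and blocks until it finishes. Results are returned in input order.
    """
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    # The pool size is the throttle: no more than max_concurrent
    # launch() calls can be active at the same time.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(launch, jobs))


if __name__ == "__main__":
    import time

    def fake_launch(job_name):
        # Stand-in for a real job start; it only sleeps briefly.
        time.sleep(0.05)
        return job_name + ": done"

    names = ["job%d" % i for i in range(10)]
    for line in run_throttled(names, fake_launch, max_concurrent=3):
        print(line)
```

    Because the pool size bounds the number of simultaneous launch
    calls, a burst of submissions is queued by the script rather
    than all starting at the same moment.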
    

Problem summary

Problem conclusion

Temporary fix

Comments

  • This issue has been fixed in InfoSphere Information Server
    11.7.1
    

APAR Information

  • APAR number

    JR60144

  • Reported component name

    WIS DATASTAGE

  • Reported component ID

    5724Q36DS

  • Reported release

    B70

  • Status

    CLOSED SUG

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2018-10-15

  • Closed date

    2019-03-12

  • Last modified date

    2019-03-12

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

Applicable component levels

Document Information

Modified date:
02 September 2021