IBM Support

JR61640: UNATTENDED CLUSTER FAILURE MIGHT CAUSE SUBSEQUENT CONNECTION FAILURES FROM PX CONDUCTOR NODE TO PXYARN CLIENT.

APAR status

  • Closed as fixed if next.

Error description

  • A Hadoop cluster failure can leave bad data in temporary files,
    which blocks YARN client connections even after subsequent
    reboots.
    Users see connection failures from the PX conductor node to
    the PXYarn client when the PXYarn client starts.

    Either of two files can cause this problem:

    /tmp/yarn_client_port.out
    or
    /tmp/yarn_client_port.$DSUSER.out

    Job logs show an error message like:
    ----
    Message Id: IIS-DSEE-TFPM-00464
    Message: main_program: Fatal Error: Could not get the address
    info to connect to the Application Master: Success.
    ----
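
To check whether a job log shows this symptom, the log text can be scanned for the message ID and message text quoted above. The following is a minimal sketch, not an IBM-provided tool: the function name `find_symptom_lines` is hypothetical, and it assumes the job log has already been exported to plain text.

```python
# Message ID quoted in this APAR's error description.
MESSAGE_ID = "IIS-DSEE-TFPM-00464"
# Distinctive fragment of the message text quoted in this APAR.
MESSAGE_FRAGMENT = "Could not get the address"


def find_symptom_lines(log_text):
    """Return (line_number, line) pairs that mention the message ID or text.

    Hypothetical helper: it only does substring matching on exported log
    text and does not call any DataStage API.
    """
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if MESSAGE_ID in line or MESSAGE_FRAGMENT in line:
            hits.append((number, line.strip()))
    return hits
```

A match only indicates that the job hit this error message; the presence of one of the temporary files listed above still has to be confirmed on the conductor node.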
    

Local fix

  • On the conductor node, remove the offending temporary file:
    ----
    /tmp/yarn_client_port.out
    ----
    or:
    ----
    /tmp/yarn_client_port.$DSUSER.out
    ----
    and then restart the YARN client.
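
The file removal in the local fix can be sketched as a small script. This is an illustration under stated assumptions, not an IBM-supplied procedure: the function names are hypothetical, the `DSUSER` environment-variable fallback is an assumption about how `$DSUSER` is resolved on a given system, and restarting the YARN client is left to the administrator because this record does not name the restart command.

```python
import os
from pathlib import Path


def candidate_port_files(ds_user=None, tmp_dir="/tmp"):
    """Build the two file paths named in this APAR.

    ds_user stands in for $DSUSER. Falling back to the DSUSER environment
    variable is an assumption, not documented behavior.
    """
    tmp = Path(tmp_dir)
    paths = [tmp / "yarn_client_port.out"]
    user = ds_user or os.environ.get("DSUSER")
    if user:
        paths.append(tmp / f"yarn_client_port.{user}.out")
    return paths


def remove_stale_port_files(ds_user=None, tmp_dir="/tmp", dry_run=True):
    """Return the candidate files that exist; delete them unless dry_run."""
    found = [p for p in candidate_port_files(ds_user, tmp_dir) if p.is_file()]
    for path in found:
        if not dry_run:
            path.unlink()
    return found


if __name__ == "__main__":
    # Dry run by default: only report what would be removed.
    for path in remove_stale_port_files(dry_run=True):
        print(f"would remove {path}")
```

Running with `dry_run=True` first lists the files without deleting anything; the YARN client still needs to be restarted after the files are removed.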
    

Problem summary

  • A Hadoop cluster failure can leave bad data in temporary files,
    which blocks YARN client connections even after subsequent
    reboots.
    Users see connection failures from the PX conductor node to
    the PXYarn client when the PXYarn client starts.
    Either of two files can cause this problem:
    /tmp/yarn_client_port.out
    or
    /tmp/yarn_client_port.$DSUSER.out
    

Problem conclusion

Temporary fix

  • On the conductor node, remove the offending temporary file:
    /tmp/yarn_client_port.out
    or:
    /tmp/yarn_client_port.$DSUSER.out

    and then restart the YARN client.
    

Comments

APAR Information

  • APAR number

    JR61640

  • Reported component name

    WIS DATASTAGE

  • Reported component ID

    5724Q36DS

  • Reported release

    B71

  • Status

    CLOSED FIN

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2019-10-24

  • Closed date

    2019-11-25

  • Last modified date

    2019-11-25

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

Applicable component levels

  • RB71 PSY

       UP

Product: InfoSphere DataStage
Version: 11.7
Platform: Platform Independent
Business Unit: IBM Software w/o TPS
Line of Business: Data and AI

Document Information

Modified date:
15 October 2021