IBM Support

JR60633: Parallel engine jobs fail intermittently with "Temporary failure in name resolution" during job startup.

APAR status

  • Closed as program error.

Error description

  • PX uses getaddrinfo() to set up connections among the processes
    in a job. Even on a stable network, getaddrinfo() can
    occasionally return the error
    "Temporary failure in name resolution"
    because of dropped network packets. The fix for this issue makes
    PX more tolerant of this case by having PX retry the call after
    a defined interval.
    

Local fix

Problem summary

  • Patches are available which fix the issue.
    

Problem conclusion

  • The fix for this issue adds two new PX environment variables,
    APT_ADDRINFO_RETRY and APT_ADDRINFO_RETRY_INTERVAL, which let
    users define the number of getaddrinfo() retries and the
    interval between retries. The interval is in seconds; for
    example, 3 means sleep 3 seconds, then retry the call.
    APT_ADDRINFO_RETRY
        If set to a positive number, the player retries the
        getaddrinfo() call up to that number of times when the
        call's return code is EAI_AGAIN, that is, the
        "Temporary failure in name resolution" error from the
        system. If the error persists after the defined number of
        retries, the job aborts and logs a new message that reports
        the number of retries, the interval, and the system error.
    APT_ADDRINFO_RETRY_INTERVAL
        Set to a positive number n if sleep(n) is desired between
        retries (n is in seconds). This environment variable takes
        effect only when APT_ADDRINFO_RETRY is defined.
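
The retry behavior described above can be illustrated with a minimal Python sketch. This is not the PX implementation, which is not shown in this APAR. The function name resolve_with_retry, the helper _positive_int, the injectable environ, resolver and sleep parameters, and the wording of the final error message are all hypothetical. The sketch also assumes that the retry count means additional attempts after the first failed call.

```python
import os
import socket
import time


def _positive_int(name, environ):
    """Return the variable as a positive int, or 0 if unset or invalid."""
    try:
        value = int(environ.get(name, ""))
    except ValueError:
        return 0
    return value if value > 0 else 0


def resolve_with_retry(host, port, environ=os.environ,
                       resolver=socket.getaddrinfo, sleep=time.sleep):
    """Call getaddrinfo(), retrying only on EAI_AGAIN (hypothetical sketch)."""
    retries = _positive_int("APT_ADDRINFO_RETRY", environ)
    # The interval is honored only when the retry count is defined.
    interval = 0
    if retries:
        interval = _positive_int("APT_ADDRINFO_RETRY_INTERVAL", environ)

    attempt = 0
    while True:
        try:
            return resolver(host, port)
        except socket.gaierror as err:
            # Any other resolver error, or retries not enabled: fail at once.
            if err.errno != socket.EAI_AGAIN or retries == 0:
                raise
            if attempt >= retries:
                # Report retry count, interval, and the system error.
                raise RuntimeError(
                    f"getaddrinfo() failed after {retries} retries "
                    f"at {interval}s intervals: {err}"
                ) from err
            attempt += 1
            if interval:
                sleep(interval)
```

With APT_ADDRINFO_RETRY=3 and APT_ADDRINFO_RETRY_INTERVAL=2 in the environment, a call such as resolve_with_retry("localhost", 80) would, under these assumptions, make up to four getaddrinfo() attempts in total, sleeping 2 seconds between them, before giving up.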
    

Temporary fix

Comments

APAR Information

  • APAR number

    JR60633

  • Reported component name

    WIS DATASTAGE

  • Reported component ID

    5724Q36DS

  • Reported release

    910

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2019-02-04

  • Closed date

    2019-02-21

  • Last modified date

    2019-02-21

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Modules/Macros

  • None
    PX
    

Fix information

  • Fixed component name

    WIS DATASTAGE

  • Fixed component ID

    5724Q36DS

Applicable component levels

  • R912 PSY

       UP

Document Information

Modified date:
17 October 2021