IBM Support

JR56417: Parallel jobs on Windows fail with APT_IOPort error 10054 when job is using TCP socket communication.

APAR status

  • Closed as program error.

Error description

  • On Windows, parallel jobs fail with the error:
    IIS-DSEE-TFOR-0001
    Fatal Error: APT_IOPort::readBlkVirt: read for block header,
    partition 0, [fd 5], returned -1 with errno 10,054 (Unknown
    Error)
    
    The failure occurs in stages such as Join or Merge, or in any
    processing stage that takes two or more inputs, when:
    - The downstream stage is waiting on Input 1 for processing.
    - The upstream stage on Input 2 finishes sending all of its data.
    After processing the current buffer on Input 1, the downstream
    stage scans the other sockets, sees that the socket for Input 2
    has been closed, and throws the error. The socket for the
    completed input is not kept open long enough for the other side
    to read it.
    
    The failure is most likely to occur when the two inputs handle
    very different numbers of records.
    This failure was introduced into the parallel engine in IS 9.1
    and does not affect older releases.
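
To check whether a job log contains this failure, the error signature quoted above can be matched mechanically. The following is a minimal sketch, not an IBM-provided tool: the function name find_reset_errors and the sample text are illustrative assumptions. It relies on two facts: Winsock error 10054 is WSAECONNRESET (connection reset by peer), and the engine prints the errno with a thousands separator ("10,054").

```python
import re

# Winsock error code 10054 is WSAECONNRESET ("connection reset by peer").
WSAECONNRESET = 10054

# Matches the message quoted in this APAR. The errno may contain a
# thousands separator ("10,054"), and the message may wrap across lines,
# so whitespace between the three parts is matched loosely.
_PATTERN = re.compile(
    r"APT_IOPort::readBlkVirt: read for block header,\s*"
    r"partition (?P<partition>\d+), \[fd (?P<fd>\d+)\],\s*"
    r"returned -1 with errno (?P<errno>[\d,]+)"
)


def find_reset_errors(log_text):
    """Return (partition, fd) pairs for every errno-10054 read failure.

    Hypothetical helper for illustration only.
    """
    hits = []
    for match in _PATTERN.finditer(log_text):
        code = int(match.group("errno").replace(",", ""))
        if code == WSAECONNRESET:
            hits.append((int(match.group("partition")), int(match.group("fd"))))
    return hits


sample = (
    "Fatal Error: APT_IOPort::readBlkVirt: read for block header,\n"
    "partition 0, [fd 5], returned -1 with errno 10,054 (Unknown\n"
    "Error)"
)
print(find_reset_errors(sample))
```

A non-empty result on a Windows engine tier running IS 9.1 or later suggests, but does not prove, that the job hit the condition described in this APAR.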
    

Local fix

  • This failure only happens when the parallel engine uses TCP
    sockets for interprocess communication. Users can switch to
    shared memory interprocess communication by defining the
    environment variable:
     APT_NO_IOCOMM_OPTIMIZATION
    
    Unfortunately, shared memory communication is much slower than
    socket communication.
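
As a rough illustration of the workaround, the sketch below builds an environment mapping with the variable defined, for use by a wrapper script that launches the job. It is written under assumptions: the APAR only says the variable must be defined, so the value "1" is an arbitrary placeholder, and the helper name with_shared_memory_workaround is hypothetical. In practice the variable is more likely to be defined through the product's own project or job environment settings.

```python
import os

VAR_NAME = "APT_NO_IOCOMM_OPTIMIZATION"


def with_shared_memory_workaround(env=None, value="1"):
    """Return a copy of env (default: the current process environment)
    with the workaround variable defined.

    The value "1" is an assumption; the APAR does not specify one.
    """
    base = dict(os.environ if env is None else env)
    base[VAR_NAME] = value
    return base


job_env = with_shared_memory_workaround({"PATH": "/usr/bin"})
print(VAR_NAME in job_env)
```

The returned mapping can be passed as the env argument of subprocess.run when starting whatever command launches the job. Working on a copy keeps the calling process's own environment unchanged, which makes it easy to apply the workaround to only the affected jobs and avoid the performance cost elsewhere.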
    

Problem summary

  • ERROR DESCRIPTION:
    On Windows, parallel jobs fail with the error:
    IIS-DSEE-TFOR-0001
    Fatal Error: APT_IOPort::readBlkVirt: read for block header,
    partition 0, [fd 5], returned -1 with errno 10,054 (Unknown
    Error)
    The failure occurs in stages such as Join or Merge, or in any
    processing stage that takes two or more inputs, when:
    - The downstream stage is waiting on Input 1 for processing.
    - The upstream stage on Input 2 finishes sending all of its data.
    After processing the current buffer on Input 1, the downstream
    stage scans the other sockets, sees that the socket for Input 2
    has been closed, and throws the error. The socket for the
    completed input is not kept open long enough for the other side
    to read it.
    The failure is most likely to occur when the two inputs handle
    very different numbers of records.
    This failure was introduced into the parallel engine in IS 9.1
    and does not affect older releases.
    LOCAL FIX:
    This failure only happens when the parallel engine uses TCP
    sockets for interprocess communication. Users can switch to
    shared memory interprocess communication by defining the
    environment variable:
     APT_NO_IOCOMM_OPTIMIZATION
    Unfortunately, shared memory communication is much slower than
    socket communication.
    

Problem conclusion

  • Patches are available for this APAR. These patches fix the
    problem by keeping the internal communication socket open long
    enough to allow jobs to shut down properly. This fix is enabled
    by setting the environment variable APT_DEFER_OUTCUR_FD_CLOSE.
    As of July 19, 2017, a revised patch for Information Server 11.5
    makes this variable unnecessary: the socket behavior is fixed
    by default. The fixed behavior will be the default for future
    versions. The variable APT_DEFER_OUTCUR_FD_CLOSE is deprecated,
    and setting it should have no effect.
    The old behavior (not keeping the sockets open long enough)
    can be restored by setting the new environment variable
    APT_OUTCUR_FD_CLOSE. We do not recommend setting this variable.
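
The interaction between the two variables and the two patch generations can be summarized as a small decision function. This is a sketch of the rules as stated above, not of the engine's actual implementation. The function name, the revised_patch flag, and the labels "deferred" and "immediate" are illustrative assumptions, and "set" is taken to mean "defined with any value".

```python
def socket_close_behavior(env, revised_patch):
    """Return "deferred" (fixed behavior) or "immediate" (old behavior).

    env is a mapping of environment variables. revised_patch is a
    hypothetical flag: True if the revised patch (or a later version
    with the fix enabled by default) is installed, False if only the
    original patch is installed.
    """
    if revised_patch:
        # Fixed by default. APT_DEFER_OUTCUR_FD_CLOSE is deprecated and
        # ignored; only APT_OUTCUR_FD_CLOSE restores the old behavior.
        return "immediate" if "APT_OUTCUR_FD_CLOSE" in env else "deferred"
    # Original patch: the fix is opt-in through APT_DEFER_OUTCUR_FD_CLOSE.
    return "deferred" if "APT_DEFER_OUTCUR_FD_CLOSE" in env else "immediate"


print(socket_close_behavior({"APT_DEFER_OUTCUR_FD_CLOSE": "1"}, revised_patch=False))
print(socket_close_behavior({}, revised_patch=True))
print(socket_close_behavior({"APT_OUTCUR_FD_CLOSE": "1"}, revised_patch=True))
```

Under these assumptions, the first two calls describe configurations with the fix active, and the third describes the configuration this APAR recommends against.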
    

Temporary fix

Comments

APAR Information

  • APAR number

    JR56417

  • Reported component name

    INFO SRVR PLATF

  • Reported component ID

    5724Q3612

  • Reported release

    B31

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2016-08-01

  • Closed date

    2016-08-09

  • Last modified date

    2018-07-27

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    INFO SRVR PLATF

  • Fixed component ID

    5724Q3612

Applicable component levels

  • R912 PSY

       UP

  • RB30 PSY

       UP

  • RB31 PSY

       UP

  • RB50 PSY

       UP

Document Information

Modified date:
15 October 2021