IBM Support

IT30729: IBM SPECTRUM PROTECT PLUS REPLICATION FAILS WITH "TRANSFER FAILED: [ERRNO 32] BROKEN PIPE" DUE TO A NETWORK TIME-OUT


APAR status

  • Closed as program error.

Error description

  • Error Description:
    
    IBM Spectrum Protect Plus replication fails with "TRANSFER
    FAILED: [ERRNO 32] BROKEN PIPE" due to a network time-out.
    
    This defect is similar to APAR IT29759, except
    that the replication job was not cancelled.
    
    
    In the replication job log you will see errors similar to the
    following:
    
    [2019-10-16 17:20:24,165] ERROR pid-27501 vsnap.repld Session
    41: worker failed: Transfer failed: Disconnected from partner
    ssrebrvsnp901: [Errno 32] Broken pipe
    
    [2019-10-16 17:20:24,175] INFO pid-27501
    vsnap.replication.session Session 41: status = FAILED
    
    [2019-10-16 17:20:24,180] INFO pid-27501
    vsnap.replication.config Relationship
    fb256ebdbfa36c3d68b2ee6672861711: last sync status = FAILED
    
    On the affected vSnap system, you may see "zfs recv" processes
    still running because they were not cleaned up after a network
    time-out.
    
    
    
    IBM Spectrum Protect Plus versions affected: 10.1.4
    
    Customer/L2 Diagnostics (If Applicable)
    
    Initial Impact:                    HIGH
    
    Additional Keywords:  zfs recv time-out Sppsup-1168 timeout
    
    TS002806393
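
As a diagnostic aid, the log signature quoted in the error description can be searched for with a short script. The following Python sketch is not part of the product. It assumes that each log entry occupies a single line in the actual log file (the excerpts above are wrapped for display), and the function name find_broken_pipe_failures is illustrative only.

```python
import re

# Pattern for the ERROR entry quoted in this APAR.
# Assumption: in the real log file the entry is one line, not wrapped.
BROKEN_PIPE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+ERROR\s+pid-(?P<pid>\d+)\s+vsnap\.repld\s+"
    r"Session\s+(?P<session>\d+): worker failed: Transfer failed: "
    r"Disconnected from partner\s+(?P<partner>\S+): \[Errno 32\] Broken pipe"
)


def find_broken_pipe_failures(log_text):
    """Return one dict per session that failed with [Errno 32] Broken pipe."""
    failures = []
    for line in log_text.splitlines():
        match = BROKEN_PIPE.match(line.strip())
        if match:
            # Keys: timestamp, pid, session, partner (all strings).
            failures.append(match.groupdict())
    return failures
```

Each returned entry carries the session number and the partner vSnap host name, which identifies the replication partner to check for leftover "zfs recv" processes.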
    

Local fix

  • Ensure that no replication jobs are running.
    
    Run "sudo pkill -f 'zfs recv'" on both the source and target
    replication vSnaps.
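
Before running the pkill command, it can be useful to see which processes it would match. The following Python sketch is not IBM-provided. It assumes that the pgrep utility from procps is available on the vSnap host, and the function names are illustrative. pgrep -f matches against the full command line in the same way as pkill -f, so the listed processes are the ones the local fix would terminate.

```python
import subprocess


def parse_pgrep_output(output):
    """Turn `pgrep -a` output ("PID command line") into (pid, command) tuples."""
    processes = []
    for line in output.splitlines():
        pid, _, command = line.strip().partition(" ")
        if pid.isdigit():
            processes.append((int(pid), command))
    return processes


def list_zfs_recv_processes():
    """List processes whose command line contains 'zfs recv'."""
    # pgrep exits with status 1 when nothing matches, so check=False is deliberate.
    result = subprocess.run(
        ["pgrep", "-a", "-f", "zfs recv"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_pgrep_output(result.stdout)
```

Calling list_zfs_recv_processes() on each vSnap while no replication job is running returns an empty list when nothing is left over.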
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * IBM Spectrum Protect Plus level 10.1.4.                      *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error Description                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply fixing level when available. This problem is currently *
    * projected to be fixed in IBM Spectrum Protect Plus level     *
    * 10.1.4.277 and 10.1.5. Note that this is subject to change   *
    * at the discretion of IBM.                                    *
    ****************************************************************
    

Problem conclusion

  • When a vSnap replication session failed, under certain
    conditions the data transfer pipe was not gracefully closed.
    This resulted in some hung processes being left behind on the
    replication target vSnap. During subsequent replication
    attempts, this would result in a "broken pipe" error.
    
    This problem was previously seen in APAR IT29759. Under that
    APAR, fixes were made to ensure that the transfer pipes were
    closed gracefully. However, those fixes were incomplete because
    they only addressed replication cancellation and certain
    failure conditions.
    
    There are other failure conditions, particularly network
    disconnections, where the previous fixes did not take effect. As
    a result the "broken pipe" errors were still seen.
    
    These remaining problems have been resolved under the current
    APAR. At the start of each replication session, the primary
    vSnap now checks the target vSnap and looks for any leftover
    processes with open pipes that may have been left behind by
    previous failed sessions. These leftover processes are then
    automatically terminated before proceeding with the new
    replication session.
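
The order of operations described above can be sketched as follows. This is an illustrative model only, not the vSnap implementation: the function name and the three callables are hypothetical stand-ins for the product's internal logic.

```python
def start_replication_session(list_leftovers, terminate, run_transfer):
    """Clean up leftovers from earlier failed sessions, then start the transfer.

    All three arguments are hypothetical callables:
      list_leftovers() -> iterable of leftover process IDs on the target vSnap
      terminate(pid)   -> stops one leftover process
      run_transfer()   -> performs the new replication session
    """
    cleaned = []
    for pid in list_leftovers():
        # Leftover processes are stopped before any new data is sent.
        terminate(pid)
        cleaned.append(pid)
    # The new session starts only after the cleanup loop has finished.
    return cleaned, run_transfer()
```

The point of the sketch is the ordering: the check runs at the start of every session, so a leftover process from any earlier failure is removed regardless of how that failure occurred.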
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT30729

  • Reported component name

    SP PLUS

  • Reported component ID

    5737SPLUS

  • Reported release

    A14

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2019-10-29

  • Closed date

    2019-11-13

  • Last modified date

    2019-12-04

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    SP PLUS

  • Fixed component ID

    5737SPLUS

Applicable component levels

  • Business unit

    IBM Infrastructure w/TPS (BU058)

  • Product

    IBM Spectrum Protect Plus (SSNQFQ)

  • Platform

    Platform Independent (PF025)

  • Version

    A14

  • Line of business

    Storage (LOB26)

Document Information

Modified date:
30 January 2024