IBM Support

IT31282: COPY/ARCHIVE TO SPECTRUM PROTECT SERVER OR CLOUD SERVER FAILS


APAR status

  • Closed as program error.

Error description

  • A copy or archive job to a Spectrum Protect server or cloud
    object storage server fails with the following error:
    
    ERROR,,ddmmyy,hh-mm-ss,2,CTGGA0309,Copy failed for snapshot (ID:
    39) from source [server: aaa.bbb.ccc.ddd volume:
    spp_1037_2166_16d67d06ff1__group0_95_ snapshot:
    spp_1037_2187_2_16e0e592756] to target [server: eee.fff.ggg.hhh
    volume: a5737436dc434289af6ee3672128ab26]. Error: TransferError:
    Transfer failed: Stalled
    
    This error can occur when the offload speed between the source
    vSnap host and the target Spectrum Protect server is around
    15 MB/s. At that speed vSnap offloads about one 16 MB object per
    second, but the throttling logic does not act efficiently and
    allows several 16 MB objects to be added to the cache within the
    same one-second period. The cache fills up immediately, and the
    code stops filling it to allow time for the offload. After this
    "cache filled up/transfer paused to empty the cache" sequence
    occurs more than 20 times (a hard-coded value), the offload is
    aborted because the write requests to the device are considered
    too slow.
    
    IBM Spectrum Protect Plus Versions Affected:
    IBM Spectrum Protect Plus 10.1.x
    
    Initial Impact: Medium
    
    Additional Keywords: SPP, SPPlus, TS002622033
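
The sequence described above can be illustrated with a small simulation. This is a minimal sketch and not vSnap code: the cache size of 4 objects, the function name simulate, and the rule that a paused producer resumes only once the cache is empty are assumptions made for illustration. Only the limit of 20 pause cycles and the rate of about one object uploaded per second come from the description above.

```python
from dataclasses import dataclass

MAX_PAUSES = 20  # from the APAR text: abort after more than 20 pause cycles


@dataclass
class Result:
    uploaded: int   # objects uploaded before the run ended
    pauses: int     # number of "cache filled up / paused" cycles
    stalled: bool   # True if the run was aborted as stalled


def simulate(total_objects, cache_slots, adds_per_second, uploads_per_second=1):
    """Step a toy cache one second at a time (all sizes are in objects)."""
    cached = 0
    pending = total_objects
    uploaded = 0
    pauses = 0
    paused = False
    while uploaded < total_objects:
        # Assumption: a paused producer resumes only when the cache is empty.
        if paused and cached == 0:
            paused = False
        if not paused:
            added = min(adds_per_second, cache_slots - cached, pending)
            cached += added
            pending -= added
            # The cache is full and there is still data left: count a pause.
            if cached == cache_slots and pending > 0:
                paused = True
                pauses += 1
                if pauses > MAX_PAUSES:
                    return Result(uploaded, pauses, True)
        # The uploader drains the cache at the (slow) link rate.
        sent = min(uploads_per_second, cached)
        cached -= sent
        uploaded += sent
    return Result(uploaded, pauses, False)


# Burst admission: several objects enter the cache within the same second.
print(simulate(total_objects=200, cache_slots=4, adds_per_second=4))
# Result(uploaded=80, pauses=21, stalled=True)

# Paced admission: at most one object per second, matching the upload rate.
print(simulate(total_objects=200, cache_slots=4, adds_per_second=1))
# Result(uploaded=200, pauses=0, stalled=False)
```

The paced case is included only as a contrast. It is not a description of how the throttling logic was actually corrected.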
    

Local fix

  • There are two possible workarounds:
    1. If no replication from the source vSnap to another vSnap is
    done, set the following to slow down the offload speed and avoid
    filling up the cache:
        - vsnap system pref set --name cloudOffloadRate --value 67108864
       If this vSnap does replicate to another vSnap (and the network
    connection to that vSnap is faster than the connection to the SP
    server), this setting will also slow down the replication speed.
       If results are not as expected, reset cloudOffloadRate to its
    default setting:
        - vsnap system pref clear --name cloudOffloadRate
    2. Adjust the caching heuristics by running the following on the
    vSnap. This does not affect replication but may not be as
    effective.
        - vsnap system pref set --name cloudThrottleObjectsRatio --value 0.5
        - vsnap system pref set --name cloudThrottleObjectsPoll --value 15
       These parameters can also be reset to their default values
    using the following commands:
        - vsnap system pref clear --name cloudThrottleObjectsRatio
        - vsnap system pref clear --name cloudThrottleObjectsPoll
        - vsnap system pref clear --name cloudThrottleObjectsRatio
        - vsnap system pref clear --name cloudThrottleObjectsPoll
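
To avoid typing errors when applying or reverting a workaround, the command lines can be generated from a small table of preference names and values. The following is a minimal sketch: the helper names (pref_set, pref_clear, apply_commands, revert_commands) are hypothetical, and the script only prints the commands listed above for review. It does not run them.

```python
# Preference names and values exactly as listed in the workarounds above.
WORKAROUND_1 = {"cloudOffloadRate": 67108864}
WORKAROUND_2 = {
    "cloudThrottleObjectsRatio": 0.5,
    "cloudThrottleObjectsPoll": 15,
}


def pref_set(name, value):
    """Build the 'set' command line for one preference."""
    return f"vsnap system pref set --name {name} --value {value}"


def pref_clear(name):
    """Build the 'clear' command line that restores the default."""
    return f"vsnap system pref clear --name {name}"


def apply_commands(prefs):
    """Commands that apply every preference of one workaround."""
    return [pref_set(name, value) for name, value in prefs.items()]


def revert_commands(prefs):
    """Commands that reset every preference of one workaround."""
    return [pref_clear(name) for name in prefs]


if __name__ == "__main__":
    # Print the commands for workaround 2, then the matching reset commands.
    for line in apply_commands(WORKAROUND_2) + revert_commands(WORKAROUND_2):
        print(line)
```

Keeping the set and clear commands derived from the same table makes it harder to change a preference and then forget to reset it.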
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * IBM Spectrum Protect Plus level 10.1.4 and 10.1.5.           *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error Description                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply fixing level when available. This problem is currently *
    * projected to be fixed in IBM Spectrum Protect Plus levels    *
    * 10.1.5 patch1 and 10.1.6. Note that this is subject to       *
    * change at the discretion of IBM.                             *
    ****************************************************************
    

Problem conclusion

  • When data is copied from vSnap to a Spectrum Protect (SP)
    repository server, it is first written to a local cache area on
    the vSnap server and then uploaded to SP. If the network link
    between vSnap and SP is slow, the local cache area can fill up
    quickly. In this case, vSnap throttles writes into the cache
    until some data has been uploaded successfully and the cache
    usage drops again.
    
    Due to a bug in the throttling logic, the transfer can remain
    stuck for an extended period which causes vSnap to think the
    transfer has been interrupted. This causes job failures.
    
    The problem has been resolved by fixing the throttling logic in
    vSnap to ensure it continues the data transfer in a correct
    manner.
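
The general pattern described above, a writer that fills a bounded local cache while an uploader drains it, can be sketched with a bounded queue. This is an illustration of the pattern only, under the assumption that a full cache simply blocks the writer until space is free. It is not the vSnap implementation, and the names transfer and cache_slots are hypothetical.

```python
import queue
import threading


def transfer(objects, cache_slots):
    """Copy objects through a bounded cache and return them in upload order."""
    cache = queue.Queue(maxsize=cache_slots)  # the local cache area
    uploaded = []

    def uploader():
        # Drain the cache until the end-of-data marker (None) arrives.
        while True:
            item = cache.get()
            if item is None:
                break
            uploaded.append(item)  # stands in for the upload to the server

    worker = threading.Thread(target=uploader)
    worker.start()
    for obj in objects:
        # put() blocks while the cache is full, which throttles the writer
        # until the uploader has freed at least one slot.
        cache.put(obj)
    cache.put(None)  # signal that no more data will be written
    worker.join()
    return uploaded


if __name__ == "__main__":
    result = transfer(list(range(50)), cache_slots=4)
    print(len(result), "objects uploaded in order:", result == list(range(50)))
```

In this sketch the writer waits for free space instead of counting pauses, so a slow uploader delays the transfer but does not cause it to be aborted.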
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT31282

  • Reported component name

    SP PLUS

  • Reported component ID

    5737SPLUS

  • Reported release

    A14

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2020-01-07

  • Closed date

    2020-02-12

  • Last modified date

    2020-04-09

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    SP PLUS

  • Fixed component ID

    5737SPLUS

Applicable component levels

[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SSNQFQ","label":"IBM Spectrum Protect Plus"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"A14","Line of Business":{"code":"LOB26","label":"Storage"}}]

Document Information

Modified date:
30 January 2024