IBM Support

IT37571: CTGGA0309 COPY FAILED FOR SNAPSHOT WITH 'TRANSFERERROR' WHILE CONNECTIVITY TO CLOUD IS GOOD

APAR status

  • Closed as program error.

Error description

  • The IBM Spectrum Protect Plus copy to cloud job can fail after
    a few minutes with a TransferError even when connectivity to
    the cloud server or repository server is good.
    The following example shows the problem in the job log for a
    copy to an IBM Spectrum Protect server:
    
    SUMMARY,<timestamp>,,CTGGA2398,Starting job for policy
                         <SLAName> (ID:<SLAId>). id -> <JobId>.
                         IBM Spectrum Protect Plus version
                         10.1.8-4083.
    ...
       INFO,<timestamp>,2,CTGGA3115,Copying 1/1 snapshot to object
                          storage for volume <vSnapHost>:
                          <VolumeName>
       INFO,<timestamp>,2,CTGGA3118,Copying snapshot (ID: 7647)
                          from source [server: <vSnapHost>
                          volume: <VolumeName>  snapshot:
                          <SnapshotName>] to target
                          [server: https://<ObjectAgentHost>:<port>
                          volume: <TargetVolume>].
       INFO,<timestamp>,2,CTGGA1913,Created sessionId <SessionId>
                          for <VolumeName>
    ...
      ERROR,<timestamp>,2,CTGGA0309,Copy failed for snapshot
                          (ID: 7647) from source [server:
                          <vSnapHost>  volume: <VolumeName>
                          snapshot: <SnapshotName>] to target
                          [server: https://<ObjectAgentHost>:<port>
                          volume: <TargetVolume>].
                          Error: TransferError: Transfer failed:
                          The data transfer was cancelled because
                          the upload of data to the target cloud
                          server or repository server did not make
                          progress. Ensure that the vSnap server
                          has adequate connectivity to the cloud
                          server or repository server.
                          If the network link is slow try setting a
                          lower rate limit for copy operations to
                          object storage under Advanced Options for
                          the vSnap server.
      ERROR,<timestamp>,2,CTGGA0310,Skipping remaining snapshots
                          for volume <vSnapHost>:<VolumeName> due
                          to unrecoverable error for vSnap session
                          <SessionId>
    
    The vSnap replication log shows:
    
    [<timestamp>] INFO pid-12345 vsnap.common.model Session
                       <CloudCopySessionId> message = Starting data
                       transfer
    ...
    [<timestamp>] INFO pid-1234 vsnap.common.mvr Reading from pipe:
                       zfs send -c -i vpool1/fn123@snap4567 vpool1/
                       fn1234@snap4567 2>/tmp/<CloudCopySessionId>
    [<timestamp>] INFO pid-1234 vsnap.common.mvr   Writing to pipe:
                       zfs recv -u -F opool<PoolId>/fnoffload@snap
                       <SnapId>
    ...
    [<timestamp>] INFO pid-1234 vsnap.cloud.mover  Waiting because
                       the number of objects in temp dir (496) has
                       grown beyond the limit (462)
    [<timestamp>] INFO pid-1234 vsnap.common.mvr  Aborting because
                       the operation is stalled with message:
                       The data transfer was cancelled because the
                       upload of data to the target cloud server or
                       repository server did not make progress.
                       Ensure that the vSnap server has adequate
                       connectivity to the cloud server or
                       repository server.
                       If the network link is slow, try setting a
                       lower rate limit for copy operations to
                       object storage under Advanced Options for
                       the vSnap server.
    
    This happens because objects left over from previous failed
    offloads, which should have been cleaned up, are incorrectly
    counted as active elements pending offload.
    The unexpected extra workload then causes the vSnap server to
    wrongly conclude that connectivity is slow.
    
    IBM Spectrum Protect Plus Versions Affected:
    IBM Spectrum Protect Plus 10.1.x
    
    Additional Keywords: SPP, SPPLUS, TS005867485, offload, stalled
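To check whether a saved log matches the pattern described above, the two messages quoted in the replication log excerpt can be counted with grep. This is only a rough sketch: the function name check_stall_signature is hypothetical, the log file path is an assumption that must be supplied by the caller (this APAR does not state where the log is stored), and the search strings assume each message appears on a single line in the real log rather than wrapped as in the excerpt.

```shell
#!/bin/sh
# Hypothetical helper, not part of the product.
# Usage: check_stall_signature /path/to/saved/vsnap/log   (path is an assumption)
check_stall_signature() {
    log_file="$1"
    # Count lines where the mover waits because the temp dir object count grew too large.
    waiting=$(grep -c 'objects in temp dir' "$log_file")
    # Count lines where the transfer was aborted as stalled.
    stalled=$(grep -c 'operation is stalled' "$log_file")
    echo "waiting=$waiting stalled=$stalled"
    # Succeed only when both messages are present in the file.
    [ "$waiting" -gt 0 ] && [ "$stalled" -gt 0 ]
}
```

A zero exit status only indicates that both messages occur in the file. It does not by itself prove that leftover objects are the cause, because a slow link can produce the same abort message.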
    

Local fix

  • Perform these steps right after rebooting the vSnap server,
    while no offload is active:
    1. List the orphaned elements on the vSnap server:
        sudo find /opt/vsnap-data -type d -name objects -print
    2. Delete them:
        sudo find /opt/vsnap-data -type d -name objects -exec rm -rf {} \;
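The two find commands of the local fix could be wrapped so that listing is the default and deletion has to be requested explicitly. The following is a minimal sketch, not an IBM-provided script: the function name cleanup_objects and its list/delete argument are hypothetical, and the delete branch adds -prune (not present in the command above) so that find does not try to descend into a directory it has just removed.

```shell
#!/bin/sh
# Hypothetical wrapper around the local fix; run as root on the vSnap server,
# right after a reboot and while no offload is active.
# Usage: cleanup_objects ROOT [list|delete]   (mode defaults to "list")
cleanup_objects() {
    root="$1"
    mode="${2:-list}"
    case "$mode" in
        list)
            # Step 1 of the local fix: show the leftover "objects" directories.
            find "$root" -type d -name objects -print
            ;;
        delete)
            # Step 2 of the local fix: remove them. -prune is an addition that
            # keeps find from descending into the directory being deleted.
            find "$root" -type d -name objects -prune -exec rm -rf {} \;
            ;;
        *)
            echo "usage: cleanup_objects ROOT [list|delete]" >&2
            return 2
            ;;
    esac
}
```

For the directory named in the local fix, this would be called as cleanup_objects /opt/vsnap-data to review the list first, and then as cleanup_objects /opt/vsnap-data delete.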
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * IBM Spectrum Protect Plus level 10.1.7 and 10.1.8            *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error Description                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply fixing level when available. This problem is currently *
    * projected to be fixed in IBM Spectrum Protect Plus level     *
    * 10.1.9. Note that this is subject to change at the           *
    * discretion of IBM.                                           *
    ****************************************************************
    

Problem conclusion

  • Implemented code fixes to ensure cached objects left over from
    previous cloud copy sessions are cleaned up before starting a
    new session.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT37571

  • Reported component name

    SP PLUS

  • Reported component ID

    5737SPLUS

  • Reported release

    A18

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2021-07-12

  • Closed date

    2021-09-29

  • Last modified date

    2021-09-29

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Modules/Macros

  • vSnap    Offload
    

Fix information

  • Fixed component name

    SP PLUS

  • Fixed component ID

    5737SPLUS

Document Information

Modified date:
31 January 2024