IBM Support

IT37227: VSNAP COPY TO CLOUD JOB APPARENT HANG

APAR status

  • Closed as program error.

Error description

  • A copy to cloud job may appear to hang in IBM Spectrum Protect
    Plus.
    Canceling the copy job as an administrator also has no effect.
    Only a reboot of the vSnap host aborts the job.
    
    The issue was seen for a copy to an IBM Spectrum Protect server
    repository.
    
    In the job log, the copy starts:
    
    SUMMARY,<timestamp>,CTGGA2399,Starting job for policy
                                  SPPOFFLOAD with job name
                                  <SLAName> (ID:<SLAId>). id ->
                                  <JobId>. IBM Spectrum Protect
                                  Plus version 10.1.8-4082.
    ...
       INFO,<timestamp>,CTGGA1913,Created sessionId <xxxxxx> for
                                  <vSnapVolumeName>
    ...
       INFO,<timestamp>,CTGGA3118,Copying snapshot (ID:
                                  <SnapshotId>) from source [
                                  server: <vSnapHost>  volume:
                                  <vSnapVolumeName>  snapshot:
                                  <SnapshotName>] to target [
                                  server: https://
                                  <ObjectAgentHost>:9000 volume:
                                  <TargetVolumeName>].
    
    After that, no progress is seen: the transfer message stays stuck
    at the same value and is repeated indefinitely every 5 minutes:
    
      INFO,<timestamp>,CTGGA0365,Snapshot <SnapshotName>(Id:
                                 <SnapshotId>) volume <vSnapHost>:
                                 <vSnapVolumeName> has transferred
                                 <aa> GB (Last status: Transferred
                                 <aa> of <bb> GB; <cc>% complete;
                                 Average throughput <dd> MB/s)
    
    If the administrator attempts to cancel the job, the job log
    reports that the command was received, but nothing further
    happens:
    
      INFO,<timestamp>,CTGGA0360,Aborting replication data transfer
    
    If the administrator then reboots the vSnap host, the job
    completes and reports an error:
    
      ERROR,<timestamp>,,Unable to monitor status. Errornull
      ERROR,<timestamp>,,Failed to replicate <SnapshotId>. Error
                         null e=java.lang.NullPointerException
    
    On the vSnap server, running "ps aux | grep recv" shows one or
    more 'zfs recv' processes whose state is listed as "D"
    (uninterruptible sleep), which means the process is blocked
    waiting for I/O to complete.
    
    When a copy job has multiple transfer sessions, the above may
    be seen for only some of the transfers while the others
    complete successfully.
    In that case, the copy job ends in status PARTIAL.
    
    This issue occurs when the network connectivity between the
    vSnap and the cloud/repository has failed, or when the
    cloud/repository server cannot respond to vSnap requests.
    
    IBM Spectrum Protect Plus Versions Affected:
    IBM Spectrum Protect Plus 10.1.5 and later
    
    Initial Impact: Medium
    
    Additional Keywords: SPP, SPPLUS, TS005700623, hang, offload,
                         IT32806
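
The stalled-progress symptom described above (the same CTGGA0365 value repeated every 5 minutes) can be checked for mechanically. The following is a minimal sketch, not an IBM-provided tool. It assumes that each CTGGA0365 message occupies a single line in the exported job log and that the transferred amount is a plain decimal number; the function name find_stalled_transfers and the min_repeats threshold are hypothetical.

```python
import re
from collections import defaultdict

# Assumed single-line form of the CTGGA0365 progress message from the job log.
PROGRESS_RE = re.compile(
    r"CTGGA0365,Snapshot\s+(?P<name>.+?)\(Id:\s*(?P<id>[^)]+?)\s*\).*?"
    r"has transferred\s+(?P<gb>[\d.]+)\s*GB"
)


def find_stalled_transfers(log_lines, min_repeats=3):
    """Return {snapshot_id: gb} for transfers whose last `min_repeats`
    progress messages all report the same transferred amount."""
    history = defaultdict(list)
    for line in log_lines:
        match = PROGRESS_RE.search(line)
        if match:
            history[match.group("id")].append(float(match.group("gb")))

    stalled = {}
    for snap_id, values in history.items():
        tail = values[-min_repeats:]
        # Stalled only if enough samples exist and none of them differ.
        if len(tail) == min_repeats and len(set(tail)) == 1:
            stalled[snap_id] = tail[0]
    return stalled
```

With the default threshold, a transfer is flagged after three identical progress messages, which corresponds to roughly 10 minutes without progress at the 5-minute reporting interval. A slow but healthy transfer can also repeat a rounded value, so a flagged snapshot is a hint to inspect the vSnap server, not proof of a hang.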
    

Local fix

  • Reboot the vSnap host
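
Before rebooting, it may help to confirm that the vSnap host really shows the condition from the error description, namely 'zfs recv' processes in state "D". The sketch below is illustrative only and assumes a Linux host with the procps "ps" command; the function names parse_hung_recv and list_hung_recv are hypothetical.

```python
import subprocess


def parse_hung_recv(ps_output):
    """Return (pid, stat, command) tuples for 'zfs recv' processes
    whose state starts with 'D' (uninterruptible sleep)."""
    hung = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)  # pid, state, full command line
        if len(parts) < 3:
            continue
        pid, stat, command = parts
        if stat.startswith("D") and "zfs recv" in command:
            hung.append((int(pid), stat, command))
    return hung


def list_hung_recv():
    """Run ps on the local host and report hung 'zfs recv' processes."""
    # 'pid=,stat=,args=' prints the three columns without a header line.
    result = subprocess.run(
        ["ps", "-eo", "pid=,stat=,args="],
        capture_output=True, text=True, check=True,
    )
    return parse_hung_recv(result.stdout)
```

Calling list_hung_recv() on the vSnap server returns an empty list when no 'zfs recv' process is blocked. A process can pass through state "D" briefly during normal I/O, so a match is only meaningful if it persists across repeated checks.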
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * IBM Spectrum Protect Plus 10.1.8                             *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error Description                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply fixing level when available. This problem is currently *
    * projected to be fixed in IBM Spectrum Protect Plus level     *
    * 10.1.9. Note that this is subject to change at the           *
    * discretion of IBM.                                           *
    ****************************************************************
    

Problem conclusion

  • Implemented code fixes to correctly discard in-flight I/O
    operations when forcibly exporting a cloud pool. This allows the
    cloud copy operation to gracefully abort without causing the
    vSnap server to hang or crash and without leaving behind a stuck
    cloud pool.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT37227

  • Reported component name

    SP PLUS

  • Reported component ID

    5737SPLUS

  • Reported release

    A18

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2021-06-11

  • Closed date

    2021-11-09

  • Last modified date

    2021-11-09

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Modules/Macros

  • vSnap    Offload
    

Fix information

  • Fixed component name

    SP PLUS

  • Fixed component ID

    5737SPLUS

Applicable component levels

  • Business Unit: IBM Infrastructure w/TPS (BU058)

  • Product: IBM Spectrum Protect Plus (SSNQFQ)

  • Platform: Platform Independent (PF025)

  • Version: A18

  • Line of Business: Storage (LOB26)

Document Information

Modified date:
31 January 2024