IBM Support

IT36087: REPLICATION JOB FAILS WITH ERROR: CTGGA0583 UNABLE TO DETERMINE VALUE IN GETSNAPSHOTS

APAR status

  • Closed as program error.

Error description

  • A replication job in IBM Spectrum Protect Plus can fail with the
    following error shown in the job log:
    
    CTGGA0583: Exception occurred in post processing for
    replication. Backup Error. Unable to determine value in
    getSnapshots
    
    The error is more likely to occur on large replication jobs for
    SLAs that contain a large number of protected resources (virtual
    machines, databases, etc.). The error can occur after the job
    has already been running for several hours.
    
    Further examination of the Virgo log associated with the job
    shows that the cause of the failure is an error returned from
    the vSnap server while trying to collect snapshot information:
    
      VSnap Call GET https://<vsnap>:8900/api/volume/<id>/snapshot
      time Taken 300160 ms
      reason :
      org.springframework.web.client.HttpServerErrorException: 500 INTERNAL SERVER ERROR
      Status: 500
      {"error":{"message":"Failed to collect snapshot information","type":"SnapshotInfoError"}}
    
    The vSnap logs show that the failure is caused by a timeout of
    a "zfs list" command:

      ERROR pid-xxxxx vsnap.linux.system    Timed out (300 seconds) waiting for command to complete: zfs list -t snapshot -o name,guid
    
    The problem occurs when the vSnap server is under heavy I/O load
    during a large replication job. IBM Spectrum Protect Plus (SPP)
    makes repeated API calls to the vSnap server to collect snapshot
    information, and for each call the vSnap server queries snapshot
    information from the storage pool. When the pool is under heavy
    I/O load and contains a large number of snapshots, collecting
    the snapshot properties can take longer than the 300-second
    limit, which causes the timeout.
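
    To check how close a vSnap server is to this limit, the query
    can be timed against the same 300-second cutoff. The following
    is a minimal, hypothetical Python sketch, not vSnap code: the
    helper name timed_run is invented for illustration, and it
    assumes shell access to the vSnap server with the ZFS
    command-line tools available.

```python
import subprocess
import time
from typing import Optional, Sequence

# Limit taken from the vSnap log message "Timed out (300 seconds)".
VSNAP_COMMAND_TIMEOUT_S = 300.0


def timed_run(cmd: Sequence[str],
              timeout_s: float = VSNAP_COMMAND_TIMEOUT_S) -> Optional[float]:
    """Run cmd and return its wall-clock duration in seconds.

    Returns None if cmd did not finish within timeout_s; subprocess.run
    kills the child process in that case. Output is discarded, and a
    non-zero exit status raises subprocess.CalledProcessError.
    """
    start = time.monotonic()
    try:
        subprocess.run(
            list(cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
            check=True,
        )
    except subprocess.TimeoutExpired:
        return None
    return time.monotonic() - start
```

    For example, timed_run(["zfs", "list", "-t", "snapshot", "-o",
    "name,guid"]) returns None under conditions similar to those that
    produced the logged timeout. Running the query itself adds load
    to the pool, so avoid doing so while a large replication job is
    active.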
    

Local fix

  • Modify SLA schedules to avoid running multiple large jobs at
    the same time, if possible.
  • Modify the advanced options for the vSnap server and lower the
    value of the 'Concurrent stream limit for replication' option,
    for example from the default value of 5 to 3.
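
    Lowering the stream limit caps how many replication streams
    read from the pool at the same time. The generic Python sketch
    below illustrates such a cap with a semaphore; it is an
    assumption-based simulation that says nothing about how IBM
    Spectrum Protect Plus actually schedules streams, and the
    function peak_concurrency and its simulated work are
    hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_concurrency(num_streams: int, concurrent_limit: int,
                     work_s: float = 0.05) -> int:
    """Simulate num_streams hypothetical replication streams gated by
    concurrent_limit and return the largest number seen running at once."""
    gate = threading.BoundedSemaphore(concurrent_limit)
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def replicate_one(_: int) -> None:
        nonlocal in_flight, peak
        with gate:  # waits while concurrent_limit streams are already active
            with lock:
                in_flight += 1
                peak = max(peak, in_flight)
            time.sleep(work_s)  # placeholder for transferring one volume
            with lock:
                in_flight -= 1

    # One worker thread per stream, so only the semaphore limits concurrency.
    with ThreadPoolExecutor(max_workers=num_streams) as pool:
        list(pool.map(replicate_one, range(num_streams)))
    return peak


for limit in (5, 3):
    print(f"limit={limit}: peak concurrent streams={peak_concurrency(12, limit)}")
```

    With a limit of 3 instead of 5, fewer streams compete for pool
    I/O at any moment, at the cost of a longer total run time.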
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * IBM Spectrum Protect Plus levels 10.1.6, 10.1.7, 10.1.8.     *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error Description.                                       *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply the fixing level when available. This problem is       *
    * projected to be fixed in IBM Spectrum Protect Plus level     *
    * 10.1.8.ifix2 and 10.1.9. Note that this is subject to change *
    * at the discretion of IBM.                                    *
    ****************************************************************
    

Problem conclusion

  • An improved caching mechanism has been introduced on vSnap
    servers, intended to minimize the amount of metadata that must
    be read from the storage pool. When IBM Spectrum Protect Plus
    makes a large number of repeated attempts to query snapshot
    information during replication jobs, the responses are returned
    from the cache, so the vSnap server can respond quickly without
    repeatedly reading the same information from the storage pool.
    Depending on the release, the caching mechanism might be
    disabled by default. It can be enabled manually on each vSnap
    server with the following command:

      vsnap system pref set --name resourceListCacheAutoInit --value true

    The vSnap server must be restarted after enabling this setting.
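
    The general idea of answering repeated queries from memory
    instead of rereading the pool can be sketched as a simple
    per-volume cache. This is a hypothetical Python illustration,
    not the vSnap implementation: the class SnapshotListCache, its
    loader callback, and the explicit invalidate step are
    assumptions, and the real cache controlled by
    resourceListCacheAutoInit may populate and refresh entries
    differently.

```python
import threading
from typing import Callable, Dict, List


class SnapshotListCache:
    """Hypothetical cache of snapshot listings keyed by volume ID.

    The loader (a stand-in for an expensive pool query such as
    'zfs list') runs only on a cache miss; repeated queries are
    answered from memory until invalidate() is called, for example
    after a snapshot is created or deleted.
    """

    def __init__(self, loader: Callable[[str], List[str]]) -> None:
        self._loader = loader
        self._entries: Dict[str, List[str]] = {}
        self._lock = threading.Lock()

    def get(self, volume_id: str) -> List[str]:
        # Holding the lock during the load stops concurrent requests from
        # triggering duplicate pool reads, at the cost of serializing misses.
        with self._lock:
            if volume_id not in self._entries:
                self._entries[volume_id] = self._loader(volume_id)
            return list(self._entries[volume_id])  # copy protects the cache

    def invalidate(self, volume_id: str) -> None:
        with self._lock:
            self._entries.pop(volume_id, None)
```

    Only the first get() for a volume reaches the loader; later calls
    for the same volume return the cached list until the entry is
    invalidated.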
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT36087

  • Reported component name

    SP PLUS

  • Reported component ID

    5737SPLUS

  • Reported release

    A16

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2021-03-02

  • Closed date

    2021-08-27

  • Last modified date

    2021-08-27

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Modules/Macros

  • vSnap    ZFS
    

Fix information

  • Fixed component name

    SP PLUS

  • Fixed component ID

    5737SPLUS

Applicable component levels

  • Product

    IBM Spectrum Protect Plus (SSNQFQ)

  • Version

    A16

  • Platform

    Platform Independent

  • Line of business

    Storage

  • Business unit

    IBM Infrastructure w/TPS

Document Information

Modified date:
31 January 2024