APAR status
Closed as program error.
Error description
In rare cases, when an IBM Spectrum Protect Plus backup job runs for a long time, a vSnap volume might be deleted unexpectedly if the Maintenance job runs while the backup is still active. One of the actions performed by the Maintenance job is to clean up expired snapshots according to the retention rules defined for the SLA. If the expiration process deletes the last remaining snapshot for a volume, the Maintenance job also deletes that volume. If the deleted volume is a vSnap volume that was selected for the long-running backup job, one of the following two errors can be seen for the guests whose data must be stored on that volume.

1. The backup job successfully completed the data sending phase to the vSnap volume and is waiting to create a snapshot of that volume to commit the data. By design, this zfs volume snapshot is scheduled at the end of the job, once the data sending phase is completed for all the guests selected for the backup. If the volume is deleted by the Maintenance job before the snapshot can be created, the following error is seen in the job log:
CTGGA0076,Unprotected VM: <VMName>. Last error: Unknown

2. The guest is actively sending data, or is still waiting in the queue to send its backup data, when the target vSnap volume is deleted by the Maintenance job. In that case, the following error is seen:
CTGGA0076,Unprotected VM: <VMName>. Last error: [Unable to update access for volume Object not found on Vsnap : 404 NOT FOUND]

In the virgo log found in the Spectrum Protect Plus appliance log bundle covering the observed period, entries similar to the following example are seen.

Job assigning VM to Volume:
[<timestamp>] INFO .. <BackupJobID> volumeInfo : <vSnapHostID>.volume.<VolumeID> group <groupID>
.. Returning target volume for VM: <VMName>
.. destVolume <VolumeName>
.. Vsnap Call https://<vSnap FQDN>:8900/api/volume/<VolumeID>/path?path=<PathName>/<VMName>.vm-<VMMobID> method GET
.. Generating key for vsnap storage folder on server <vSnap FQDN> volume null path null
.. add destinationStorageVolumesInfo xxxxx
...
[<timestamp>] INFO .. <BackupJobID> volumeInfo : <vSnapHostID>.volume.<VolumeID> group <groupID>

Maintenance deleting the volume after the last remaining snapshot was expired:
[<timestamp>] INFO .. <MaintenanceJobID> Vsnap Call https://<vSnap FQDN>:8900/api/snapshot/<SnapshotID> method GET
.. Expiring retention snapshot <SnapshotName> using from protectionInfo
.. Expiring retention snapshot <SnapshotName> using catalog manager
.. Expiring retention snapshot <SnapshotName> from storage controller <vSnap FQDN> for policy vmware_infra-daily
.. Vsnap Call https://<vSnap FQDN>:8900/api/snapshot/<SnapshotID> method DELETE
.. Checking if volume can be deleted for snapshot <SnapshotName>
.. Catalog volume size returned for policy vmware_infra-daily size 1
.. Checking volume <VolumeName>
.. Storage Cache :::: Get Volume <vSnapHostID>:<VolumeID>
.. Vsnap Call https://<vSnap FQDN>:8900/api/volume/<VolumeID>/snapshot method GET
.. Deleting volume <VolumeName> Id(<VolumeID>). No remaining snapshots found on volume for policy vmware_infra-daily
.. Vsnap Call https://<vSnap FQDN>:8900/api/volume/<VolumeID>?force=true method DELETE

IBM Spectrum Protect Plus Versions Affected: IBM Spectrum Protect Plus 10.1.x
Initial Impact: Medium
Additional Keywords: SPP, SPPLUS, TS004357635, maintenance, not found, partial
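The virgo log pattern described in the error description can be checked for with a short script. The following is a minimal sketch, not an IBM-provided tool. It assumes that real virgo log lines contain the literal substrings shown in the example ("volumeInfo : <vSnapHostID>.volume.<VolumeID> group" and "Deleting volume <VolumeName> Id(<VolumeID>). No remaining snapshots found") with the placeholders replaced by values that contain no whitespace. The function name and regular expressions are hypothetical. It reports volumes that were deleted after a backup job had been assigned to them earlier in the same log; it does not verify that the backup job was still active at that time.

```python
import re

# Substrings taken from the example log entries; the rest of each
# line's layout is an assumption, so only these fragments are matched.
ASSIGN_RE = re.compile(r"volumeInfo : (\S+)\.volume\.(\S+) group")
DELETE_RE = re.compile(
    r"Deleting volume (\S+) Id\((\S+?)\)\. No remaining snapshots found"
)


def find_deleted_backup_volumes(lines):
    """Return (volume_name, volume_id) pairs for volumes that Maintenance
    deleted after a backup job was assigned to the same volume ID."""
    assigned_ids = set()
    hits = []
    for line in lines:
        match = ASSIGN_RE.search(line)
        if match:
            # Remember every volume ID that a backup job was assigned to.
            assigned_ids.add(match.group(2))
            continue
        match = DELETE_RE.search(line)
        if match and match.group(2) in assigned_ids:
            hits.append((match.group(1), match.group(2)))
    return hits
```

To use it, pass the lines of a virgo log file, for example `find_deleted_backup_volumes(open(path, encoding="utf-8"))`, where `path` is the location of the extracted log. Any hit is only a candidate and should be confirmed against the job log timestamps.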
Local fix
Ensure that the defined retention period is long enough that the last snapshot is not deleted before a new backup version is created.
OR
Prevent the Maintenance job from starting during a long-running backup job. If necessary, pause the Maintenance job and release it after the backups are completed.
Problem summary
****************************************************************
* USERS AFFECTED:                                              *
* IBM Spectrum Protect Plus level 10.1.6 and 10.1.7            *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See ERROR DESCRIPTION                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Apply fixing level when available. This problem is currently *
* projected to be fixed in IBM Spectrum Protect Plus level     *
* 10.1.7 ifix2 and 10.1.8. Note that this is subject to change *
* at the discretion of IBM                                     *
****************************************************************
Problem conclusion
The issue is fixed by ensuring that a volume that is in use by a backup job is not removed by the Maintenance job, even if there are no snapshots for that volume.
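The rule stated in the problem conclusion can be illustrated as a simple guard. This is a minimal sketch of the described behavior, not the product's actual implementation. The function name, its parameters, and the idea of tracking in-use volumes as a set of IDs are all assumptions made for illustration.

```python
def may_delete_volume(volume_id, remaining_snapshots, volumes_in_use):
    """Decide whether Maintenance may delete a volume (hypothetical helper).

    volume_id           -- identifier of the candidate volume
    remaining_snapshots -- number of snapshots left on the volume
    volumes_in_use      -- IDs of volumes targeted by active backup jobs
    """
    # Described fix: a volume in use by a backup is never removed,
    # even if no snapshots remain on it.
    if volume_id in volumes_in_use:
        return False
    # Behavior before and after the fix: only a volume with no
    # remaining snapshots is a deletion candidate.
    return remaining_snapshots == 0
```

With this guard, a volume with zero snapshots that is the target of an active backup is kept, which corresponds to the case that previously led to the CTGGA0076 errors.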
Temporary fix
Comments
APAR Information
APAR number
IT34674
Reported component name
SP PLUS
Reported component ID
5737SPLUS
Reported release
A16
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2020-10-26
Closed date
2021-02-11
Last modified date
2021-02-11
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SP PLUS
Fixed component ID
5737SPLUS
Applicable component levels
Document Information
Modified date:
31 January 2024