APAR status
Closed as program error.
Error description
When IBM Spectrum Protect Plus is configured to copy data to IBM Cloud Object Storage (ICOS) vaults that have retention enabled (also referred to as immutable object storage or WORM), vSnap servers can make a large number of batch DELETE API requests to the ICOS endpoint in an attempt to delete objects that are locked by retention.

The following symptoms can be seen when this problem is present.

The vSnap server can show heavy CPU usage even when no jobs are active in IBM Spectrum Protect Plus. Examining the process activity on the vSnap server using "top" shows a large number of python3 processes. Running "ps -ef" shows that the python3 processes are associated with the vsnap-maint service, for example:

root 8366 30201 14 Apr07 ? 02:30:19 /opt/vsnap/venv/bin/python3 /opt/vsnap/lib/vsnap/service/maintenance/maint
root 8377 30201 13 Apr07 ? 02:24:40 /opt/vsnap/venv/bin/python3 /opt/vsnap/lib/vsnap/service/maintenance/maint
root 8388 30201 14 Apr07 ? 02:26:12 /opt/vsnap/venv/bin/python3 /opt/vsnap/lib/vsnap/service/maintenance/maint

On the ICOS side, examination of the access logs shows a large number of incoming DELETE requests. Most of them fail with status code 451, which is reported when objects are locked by retention and cannot be deleted.

A secondary symptom is that, in some cases, the ICOS system can be overwhelmed by the large number of delete requests, which can cause other PUT or POST requests to fail or time out. If this occurs, copy jobs targeting that ICOS endpoint can fail with the following error in the job log:

ERROR,CTGGA0309,Copy failed for snapshot <details> Error: TransferError: Transfer failed: Failed to upload object to <endpoint>. Reason: InternalError: We encountered an internal error. Please try again. status code: 500.

The problems occur because certain metadata objects maintained by IBM Spectrum Protect Plus are updated frequently.
Because objects cannot be updated directly while they are locked, a new updated copy of the metadata is uploaded and the previous copy becomes a candidate for deletion. For non-metadata objects, IBM Spectrum Protect Plus keeps track of the retention settings on the vault and expires data only after the retention period has passed. For metadata objects that are frequently updated as part of routine copy operations, however, the vSnap server attempts to delete the older metadata as soon as a newer copy is uploaded. If the older copy is still locked by retention, ICOS rejects the deletion request, but the vSnap server keeps retrying it frequently.

As the number of pending objects grows, especially on vaults that have a long retention value such as several months, the vSnap server accumulates a large backlog of metadata objects that are pending deletion. Even though ICOS keeps rejecting the deletion requests, the large number of calls results in heavy resource usage on both the vSnap server and the ICOS system.

IBM Spectrum Protect Versions Affected: IBM Spectrum Protect Plus 10.1.3 and later.
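The backlog growth described in the error description can be illustrated with a small model. This is only a sketch under stated assumptions, not the vsnap-maint implementation: the function name and all parameters (updates per day, retries per day) are hypothetical, each superseded metadata copy is assumed to be uploaded and superseded on the same day, and the first delete attempt is assumed to happen the following day.

```python
def rejected_delete_requests(days, updates_per_day, retention_days, retries_per_day):
    """Count DELETE attempts rejected by retention over a simulated period.

    All parameters are illustrative assumptions, not vSnap settings.
    Returns (total rejected requests, objects still pending deletion).
    """
    pending = []   # upload day of each superseded, still-locked metadata copy
    rejected = 0
    for day in range(days):
        # Copies whose retention period has passed are deleted successfully
        # and leave the backlog.
        pending = [uploaded for uploaded in pending if day - uploaded < retention_days]
        # Every copy that is still locked is retried and rejected each time.
        rejected += len(pending) * retries_per_day
        # Routine copy operations supersede more metadata copies today.
        pending.extend([day] * updates_per_day)
    return rejected, len(pending)


if __name__ == "__main__":
    for retention in (7, 30, 180):
        total, backlog = rejected_delete_requests(
            days=180, updates_per_day=20, retention_days=retention, retries_per_day=24
        )
        print(f"retention={retention:3d} days: rejected={total}, backlog={backlog}")
```

In this model the number of rejected requests per day is proportional to the backlog, and the backlog keeps growing until the retention period has elapsed, which is consistent with the problem being most visible on vaults with long retention values.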
Local fix
Problem summary
****************************************************************
* USERS AFFECTED:                                              *
* IBM Spectrum Protect Plus level 10.1.3, 10.1.4, 10.1.5,      *
* 10.1.6, 10.1.7 and 10.1.8                                    *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Apply fixing level when available. This problem is currently *
* projected to be fixed in IBM Spectrum Protect Plus level     *
* 10.1.9. Note that this is subject to change at the           *
* discretion of IBM.                                           *
****************************************************************
Problem conclusion
The vsnap-maint service has been enhanced to have better awareness of retention-enabled vaults in IBM Cloud Object Storage. When vsnap-maint detects that the vault has retention enabled, it will no longer make frequent attempts to delete metadata objects that are locked by retention. Instead, the service detects the retention value of the vault and then schedules the pending session to be retried only after the appropriate number of days has passed.
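The retention-aware scheduling described above can be sketched as follows. This is a minimal illustration under assumptions, not the actual vsnap-maint code: the function name `next_delete_attempt` and its parameters are hypothetical, and the sketch assumes that the retention period is counted in whole days from the time the object was written.

```python
from datetime import datetime, timedelta, timezone


def next_delete_attempt(object_created, retention_days, now):
    """Return the earliest time at which a delete is worth attempting.

    Hypothetical helper: if the object is still locked, defer the attempt
    until the retention period has passed instead of retrying frequently.
    """
    retention_expiry = object_created + timedelta(days=retention_days)
    # If retention has already passed, the delete can be attempted now.
    return max(now, retention_expiry)


if __name__ == "__main__":
    created = datetime(2021, 9, 1, tzinfo=timezone.utc)
    now = datetime(2021, 9, 9, tzinfo=timezone.utc)
    print(next_delete_attempt(created, retention_days=30, now=now))
    print(next_delete_attempt(created, retention_days=5, now=now))
```

Deferring each attempt to the retention expiry means a locked object generates at most one delete request per retention period in this sketch, rather than one per retry interval.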
Temporary fix
Comments
APAR Information
APAR number
IT38311
Reported component name
SP PLUS
Reported component ID
5737SPLUS
Reported release
A18
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2021-09-09
Closed date
2021-09-29
Last modified date
2021-09-29
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Modules/Macros
vSnap Offload
Fix information
Fixed component name
SP PLUS
Fixed component ID
5737SPLUS
Applicable component levels
[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SSNQFQ","label":"IBM Spectrum Protect Plus"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"A18","Line of Business":{"code":"LOB26","label":"Storage"}}]
Document Information
Modified date:
31 January 2024