APAR status
Closed as program error.
Error description
The IBM Spectrum Protect Plus vADP host can become unresponsive during a VMware guest backup job. Guest backups can be stopped with the following message in the job log:

ERROR,..,CTGGA3134,Failed to receive status updates from the backup process on VADP proxy (<ProxyAddress>) stopped the backup of VM (<VMName>).

Investigation on the vADP host reveals that the affected guest backup sessions are killed by the Linux out-of-memory (OOM) killer, as seen in the /var/log/messages file.

Even when there is no active backup, the 'ps -eF|grep vmdkbackup' command displays many leftover guest sessions that should have been cleaned up at backup completion, for example:

UID PID PPID C SZ RSS PSR STIME TTY TIME CMD
root 12345 1234 1 670845 76940 7 00:06 ? 00:06:09 /opt/IBM/SPP/bin/vmdkbackup -a /tmp/vmdkbackup-<JobId>-vm-<mob>-<xxx>.json

The accumulation of these sessions, each holding some memory, leads to the memory shortage on the vADP host.

The 'mount' and 'ls -l /sppvadp' commands also display many leftover mounted directories such as:

/sppvadp/sppvadp__vsnap_vpool<x>_fs<yy>_<IPAddress>_<JobID>

Further investigation of the VMware VDDK log for guests with such a leftover session shows a failure, at the end of processing, to unmount the vSnap directory used to store the backup data on the vSnap:

...
2023-01-02T20:07:42.746Z [I] The backup completed successfully. Total bytes transferred <xxx> MB in <y> second(s). Throughput: <zz> MB/s.
...
2023-01-02T20:07:42.767Z [I] Unmounting /sppvadp/sppvadp__vsnap_vpool<x>_fs<yy>_<IP>_<JobId>...
2023-01-02T20:07:42.769Z [I] Command finished with status: (exit status 1), desc (/bin/umount: invalid option -- ' '
...
2023-01-02T20:07:42.769Z [I] Removing directory /sppvadp/sppvadp__vsnap_vpool<x>_fs<yy>_<IP>_<JobId>...
2023-01-02T20:07:42.769Z [E] Command to remove directory (/sppvadp/sppvadp__vsnap_vpool<x>_fs<yy>_<IP>_<JobId>) failed: (remove /sppvadp/sppvadp__vsnap_vpool<x>_fs<yy>_<IP>_<JobId>: device or resource busy)

This only affects vADP hosts running RHEL 7.x/CentOS 7 and RHEL 8.5 or earlier/CentOS 8. vADP services running on supported RHEL 8.6 and later versions are not affected.

IBM Spectrum Protect Plus Versions Affected: IBM Spectrum Protect Plus 10.1.12 and later

Additional Keywords: SPP, SPPLUS, TS011633854, hang, memory, full
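The symptoms described above can be checked with a small helper. The following is a minimal sketch only, not an IBM-provided tool: the function names list_leftover_mounts and count_vmdkbackup_sessions are hypothetical, and the sketch assumes the usual Linux 'mount' output layout (device on mountpoint type ...) and the 11-column 'ps -eF' layout in which the command is the 11th field.

```shell
# Sketch: report leftover vSnap mounts and vmdkbackup sessions on a vADP host.
# Function names are hypothetical; paths and process names come from this APAR.

# Reads `mount` output on stdin and prints every mount point that matches
# the /sppvadp/sppvadp__vsnap_... pattern.
list_leftover_mounts() {
    awk '$2 == "on" && $3 ~ /^\/sppvadp\/sppvadp__vsnap_/ { print $3 }'
}

# Reads `ps -eF` output on stdin and prints the number of processes whose
# command (11th column) is a vmdkbackup binary. The grep process itself is
# not counted because its command column is "grep".
count_vmdkbackup_sessions() {
    awk '$11 ~ /\/vmdkbackup$/ { n++ } END { print n + 0 }'
}

# Usage on the vADP host, ideally while no backup job is active:
#   mount  | list_leftover_mounts
#   ps -eF | count_vmdkbackup_sessions
```

A non-zero session count or a non-empty mount list while no backup job is running suggests that the host is accumulating the leftover sessions described in this APAR.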
Local fix
Define new vADP hosts based on a supported RHEL 8.6 or later version and remove the ones still running the affected Linux versions.

OR

When there is no active backup session for the vADP:
1. Stop the vADP service with the command:
sudo systemctl stop remote-vadp
2. If leftover mounts in the format /sppvadp/sppvadp__vsnap_vpool<x>_fs<yy>_<IP>_<JobId> are seen, unmount them one by one:
/bin/umount -l -f /sppvadp/sppvadp__vsnap_vpool<x>_fs<yy>_<IP>_<JobId>
3. If leftover directories are seen, delete everything in the directory /sppvadp
4. Start the vADP service with the command:
sudo systemctl start remote-vadp
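The manual cleanup can be prepared as a reviewable command list before anything is executed. This is a minimal sketch under stated assumptions: the function name print_cleanup_plan is hypothetical, the use of sudo for the unmount and the 'find ... -xdev -mindepth 1 -delete' form of "delete everything in /sppvadp" are choices made for this sketch (-xdev keeps find from descending into any vSnap file system that is still mounted), and the final step restarts the service. The function only prints commands; it runs none of them.

```shell
# Sketch: print the cleanup commands for review; nothing is executed.
# print_cleanup_plan is a hypothetical name. Reads `mount` output on stdin.
print_cleanup_plan() {
    # Step 1: stop the vADP service.
    echo "sudo systemctl stop remote-vadp"
    # Step 2: one lazy, forced unmount per leftover vSnap mount point.
    awk '$2 == "on" && $3 ~ /^\/sppvadp\/sppvadp__vsnap_/ {
        print "sudo /bin/umount -l -f " $3
    }'
    # Step 3: remove what is left under /sppvadp without crossing into
    # other file systems (assumption: GNU find is available).
    echo "sudo find /sppvadp -xdev -mindepth 1 -delete"
    # Step 4: start the vADP service again.
    echo "sudo systemctl start remote-vadp"
}

# Usage, only when no backup session is active on this vADP host:
#   mount | print_cleanup_plan
```

Review the printed commands, confirm that no backup job is using the vADP host, and then run them in order.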
Problem summary
****************************************************************
* USERS AFFECTED:                                              *
* IBM Spectrum Protect Plus level 10.1.12 and 10.1.13          *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Apply the fixing level when available. This problem is       *
* currently projected to be fixed in IBM Spectrum Protect Plus *
* levels 10.1.13.1 and 10.1.14. Note that this is subject to   *
* change at the discretion of IBM.                             *
****************************************************************
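A product level can be compared against the levels named in this summary with a version-aware sort. The sketch below is illustrative only: the function names version_ge and apar_it42865_status are hypothetical, GNU 'sort -V' is assumed to be available, and the classification assumes that every level from 10.1.12 up to (but not including) 10.1.13.1 is affected and that the projected fixing levels ship as stated, which the summary notes is subject to change.

```shell
# Sketch: classify an IBM Spectrum Protect Plus level against this APAR.
# Function names are hypothetical; requires GNU sort with -V support.

# Returns success (0) when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Prints "fixed", "affected", or "not listed" for the level given as $1.
# Assumption: levels >= 10.1.12 and < 10.1.13.1 are affected.
apar_it42865_status() {
    if version_ge "$1" "10.1.13.1"; then
        echo "fixed"
    elif version_ge "$1" "10.1.12"; then
        echo "affected"
    else
        echo "not listed"
    fi
}

# Usage:
#   apar_it42865_status 10.1.13
```

The level string itself has to be taken from the product; how to query it is outside the scope of this sketch.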
Problem conclusion
vADP was fixed so that it now works on CentOS, RHEL 7, and RHEL 8.
Temporary fix
Comments
APAR Information
APAR number
IT42865
Reported component name
SP PLUS
Reported component ID
5737SPLUS
Reported release
A1C
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2023-01-09
Closed date
2023-01-30
Last modified date
2023-03-01
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SP PLUS
Fixed component ID
5737SPLUS
Applicable component levels
[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SSNQFQ","label":"IBM Spectrum Protect Plus"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"A1C","Line of Business":{"code":"LOB26","label":"Storage"}}]
Document Information
Modified date:
01 February 2024