APAR status
Closed as program error.
Error description
When an Oracle application backup fails on IBM Spectrum Protect Plus, the job log may show the following errors:

CTGGF0190,..... Database xxxxxxx: No transfer was detected for over 900 seconds. Stopping because the operation appears to have stalled.
CTGGA4092,......A job cancellation was requested while waiting for the command sudo -n -u 'oracle' /u01/app/oracle/product/19.0.0/db_home1/bin/rman @/tmp/626ca602-0460-4ae0-8c5f-33e9ba413317/runner-input-Thread-74.sql to complete.
CTGGF0031,......Database xxxxxxx: Backup failed: Failed to perform data file backup

These errors indicate that the Oracle agent timed out after the default wait time of 15 minutes (900 seconds). If the wait time is increased to 2 hours, the following errors may still appear:

CTGGA4093,.....Timed out (300 seconds). Waiting for sudo -n mount -t nfs -o 'rw bg hard nointr rsize=32768 wsize=32768 tcp noac actimeo=0 vers=3 timeo=600' 100.100.160.15:/vsnap/vpool2/fs5 /mnt/spp/vsnap/vpool2/fs5/100_100_160_15 command to complete.
CTGGA4030,.....Failed to mount share /mnt/spp/vsnap/vpool2/fs5/100_100_160_15.
CTGGA2122,.....Enable log backup failed for database(s) on instance xxxxxxxxxxxxx :Failed to mount backup destination
CTGGA4093,....Timed out (300 seconds). Waiting for sudo -n fuser -k -c /mnt/spp/vsnap/vpool2/fs4/100_100_160_15 command to complete.
CTGGA4093,....Timed out (300 seconds). Waiting for sudo -n -u 'oracle' /u01/app/oracle/product/19/bin/rman @/tmp/90a80a07-f63b-4519-a5ef-0ec6553d5959/runner-input-Thread-12.sql command to complete.
CTGGF0065,.... Database xxxxxxx: Log purge failed: Failed to delete archived logs

In some cases, the Oracle agent timeout and "operation appears to have stalled" errors do not appear, and the second group of errors is logged directly.

Users Affected: IBM Spectrum Protect Plus level 10.1.x.
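To check whether a job log matches this APAR, it can help to count how often the message IDs quoted above occur. The following is a minimal Python sketch, not an IBM tool. The function name `count_apar_messages`, the ID pattern, and the assumption that the log is available as plain-text lines are illustrative only, and the short descriptions are paraphrased from the log text above.

```python
import re
from collections import Counter

# Message IDs quoted in this APAR. The descriptions are paraphrased from
# the log excerpts above, not taken from an official message catalog.
APAR_MESSAGE_IDS = {
    "CTGGF0190": "no transfer detected, operation appears stalled",
    "CTGGA4092": "job cancellation requested while waiting for a command",
    "CTGGF0031": "data file backup failed",
    "CTGGA4093": "timed out waiting for a command",
    "CTGGA4030": "failed to mount share",
    "CTGGA2122": "enable log backup failed",
    "CTGGF0065": "log purge failed",
}

# Assumed ID shape: "CTGG", one uppercase letter, four digits.
MESSAGE_ID = re.compile(r"\bCTGG[A-Z]\d{4}\b")


def count_apar_messages(log_lines):
    """Count occurrences of the APAR's message IDs in an iterable of log lines."""
    counts = Counter()
    for line in log_lines:
        for msg_id in MESSAGE_ID.findall(line):
            if msg_id in APAR_MESSAGE_IDS:
                counts[msg_id] += 1
    return counts
```

Repeated CTGGA4093 timeouts on mount, fuser, or rman commands together with CTGGA4030 would be consistent with the symptoms described here, although the counts alone do not confirm the root cause.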
Local fix
Problem summary
****************************************************************
* USERS AFFECTED:                                              *
* IBM Spectrum Protect Plus levels 10.1.x.                     *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description.                                       *
****************************************************************
* RECOMMENDATION:                                              *
* Apply the fixing level when available. This problem is       *
* currently projected to be fixed in IBM Spectrum Protect Plus *
* level 10.1.7 ifix2 and 10.1.8. Note that this is subject to  *
* change at the discretion of IBM.                             *
****************************************************************
Problem conclusion
The root cause of this issue is a Red Hat kernel bug (https://access.redhat.com/solutions/3339001). IBM Spectrum Protect Plus was changed to recreate the NFS shares that are mounted on the Oracle servers immediately after a vSnap restart. This shortens the period during which a share is unavailable on the Oracle server and reduces the occurrence of the NFS hang.
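Because the symptom is an NFS share that stops responding on the Oracle server, a quick way to tell a hung mount from a missing one is to probe the mount point with a time limit. The sketch below is an illustration only and is not part of the IBM fix. The function name `probe_mount` and the 10-second default are assumptions, and it relies on the standard `stat` command being available on the host.

```python
import subprocess


def probe_mount(path, timeout_seconds=10.0):
    """Return "ok", "error", or "timeout" for a stat of the given path.

    The stat runs in a child process so that a hung NFS mount blocks the
    child rather than this script.
    """
    proc = subprocess.Popen(
        ["stat", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return "ok" if proc.wait(timeout=timeout_seconds) == 0 else "error"
    except subprocess.TimeoutExpired:
        # Best effort only: a process blocked on a hard NFS mount may not
        # exit until the share responds again, so do not wait for it here.
        proc.kill()
        return "timeout"
```

For example, `probe_mount("/mnt/spp/vsnap/vpool2/fs5/100_100_160_15")` returning "timeout" would suggest the share is hung rather than simply unmounted.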
Temporary fix
Comments
APAR Information
APAR number
IT35480
Reported component name
SP PLUS
Reported component ID
5737SPLUS
Reported release
A16
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2021-01-11
Closed date
2021-04-19
Last modified date
2021-04-19
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SP PLUS
Fixed component ID
5737SPLUS
Applicable component levels
Document Information
Modified date:
31 January 2024