APAR status
Closed as program error.
Error description
IBM Spectrum Protect Plus agents for Db2 and MongoDB do not verify that snapshots created during a backup are still active and mounted before starting to copy data from them. The backup is reported as successful when it is actually incomplete. As a result, the database cannot be restored.

Created snapshots for a large or high-load database can become 'INACTIVE' for Linux LVM or 'INVALID' for AIX JFS2 and are silently unmounted from their mount points during backup because the snapshots are not large enough to hold all of the changes on the source logical volumes or JFS2 file systems.

How to check for messages related to the failed backup in the IBM Spectrum Protect Plus logs that can confirm this issue:

1. Generate a job log for the backup that is being checked.

2. Extract the command.log file from this location:
   <Job_Name>/application/<Job_Number>/<Job_GUID>/<Agent_IP_address>/

3. Search the log for all occurrences of 'not mounted', without quotes, and make note of each of the Snapshot Volume Names. Example:
   umount: /tmp/<Snapshot_Volume_Name>: not mounted

4. If the 'not mounted' message is NOT seen, the issue was not encountered and there is no need to proceed further.

5. Otherwise, choose one Snapshot Volume Name and search for 'MainThread _backup_data: src path: //tmp/<Snapshot_Volume_Name>', without quotes. Example:
   [YYYY-MM-DD HH:MM:SS] DEBUG pid:NNNN MainThread _backup_data: src path: //tmp/<Snapshot_Volume_Name>/NODE0000/sqldbdir

6. Make note of the PID. If you follow the PID down the logs, you may see messages like this:
   DEBUG pid:NNNN MainThread _backup_data: src path: //tmp/<Snapshot_Volume_Name>/NODE0000/sqldbdir
   DEBUG pid:NNNN MainThread _backup_data: dest path: /mnt/spp/vsnap/vpool1/fsX/AA_BB_CC_DD/<DB>/db2/<DB>/<DB_Name>/NODE0000/sqldbdir
   DEBUG pid:NNNN MainThread _backup_data: sign path: /mnt/spp/vsnap/vpool1/fsX/AA_BB_CC_DD/<DB>/signature/db2/<DB>/<DB_Name>/NODE0000/sqldbdir
   DEBUG pid:NNNN MainThread incremental_copy: Number of worker processes: 4
   DEBUG pid:NNNN MainThread incremental_copy: Read existing file catalog with 0 records
   DEBUG pid:NNNN MainThread incremental_copy: Obsolete files to wipe: 0
   DEBUG pid:NNNN MainThread incremental_copy: Writing new catalog with 0 records
   DEBUG pid:NNNN MainThread incremental_copy: Total size processed: 0B
   DEBUG pid:NNNN MainThread incremental_copy: Effectively copied data amount: 0B
   JOBLOG pid:NNNN MainThread joblog: <CTGGH0006> Time elapsed: 0.15 seconds
   JOBLOG pid:NNNN MainThread joblog: <CTGGH0002> Data transferred in the backup operation: 0.0 MB
   JOBLOG pid:NNNN MainThread joblog: <CTGGH0003> Copied 0 files successfully for partition 0

7. If all three of these messages ('not mounted', 'Total size processed: 0B', and '<CTGGH0003> Copied 0 files successfully') are seen for the same Snapshot Volume Name, the issue was encountered and the backup should be considered to have failed.

8. Repeat steps 5 - 7 for each unique Snapshot Volume Name found in step 3.

Versions affected: 10.1.x
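The log check in steps 3 to 7 can be sketched as a small Python script. This is only an illustration, not an IBM-supplied tool: the function name find_failed_snapshots and the regular expressions are assumptions modelled on the log excerpts shown in this APAR, and real command.log lines may differ in layout.

```python
import re

# Patterns modelled on the excerpts in this APAR (assumed line layout).
NOT_MOUNTED = re.compile(r"umount: /tmp/(?P<vol>[^:\s]+): not mounted")
SRC_PATH = re.compile(
    r"pid:(?P<pid>\d+) MainThread _backup_data: src path: //tmp/(?P<vol>[^/\s]+)"
)
PID = re.compile(r"pid:(?P<pid>\d+) ")
ZERO_SIZE = "Total size processed: 0B"
ZERO_FILES = "<CTGGH0003> Copied 0 files successfully"


def find_failed_snapshots(log_text):
    """Return sorted snapshot volume names that show all three messages."""
    lines = log_text.splitlines()

    # Step 3: collect every snapshot volume reported as 'not mounted'.
    unmounted = set()
    for line in lines:
        match = NOT_MOUNTED.search(line)
        if match:
            unmounted.add(match.group("vol"))

    # Steps 5-6: follow each PID from its 'src path' line onward.
    current_volume = {}  # pid -> snapshot volume that PID is copying from
    symptoms = {}        # volume -> set of zero-copy messages seen
    for line in lines:
        src = SRC_PATH.search(line)
        if src:
            current_volume[src.group("pid")] = src.group("vol")
            continue
        pid = PID.search(line)
        if pid is None or pid.group("pid") not in current_volume:
            continue
        seen = symptoms.setdefault(current_volume[pid.group("pid")], set())
        if ZERO_SIZE in line:
            seen.add(ZERO_SIZE)
        if ZERO_FILES in line:
            seen.add(ZERO_FILES)

    # Step 7: all three messages must be present for the same volume.
    required = {ZERO_SIZE, ZERO_FILES}
    return sorted(v for v in unmounted if required <= symptoms.get(v, set()))
```

To use it, read the extracted command.log into a string and pass it to find_failed_snapshots; any volume name it returns corresponds to a backup that should be treated as failed under the criteria in step 7.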
Local fix
To fix the issue, configure parameters in the /etc/guestapps.conf file and, if necessary, add storage space to the volume groups that contain database data.

These parameters in the guestapps.conf file allow customization of the size of the snapshots created by IBM Spectrum Protect Plus agents:

   Db2MinimumFreeSpaceInPercent - default value is 10%
   Db2MaximumAllocationInPercent - default value is 25%
   Db2MinimumSnapshotVolumeSize - default value is 50 MB

guestapps.conf example:

   [DEFAULT]
   Db2MinimumSnapshotVolumeSize = 250
   Db2MinimumFreespaceInPercent = 50
   Db2MaximumAllocationInPercent = 100

The following steps can be taken to manually test the amount of change for each of the Snapshot Volume Names that were previously identified:

1. Create a snapshot. The size of the snapshot should be bigger than 25% of the source logical volume size:
   For Linux LVM: lvcreate -s -n <Snapshot_Name> -L <Snapshot_Size> <Source_LV>
   For AIX JFS2:  snapshot -o snapfrom=<SRC_fs_path> -o size=<snapshot_size>M

2. Monitor its status for a period of time equal to the duration of the database backup by periodically executing the following commands:
   For Linux LVM: lvs <Snapshot_Name> or lvdisplay <Snapshot_Name>
   For AIX JFS2:  snapshot -q <SRC_fs_path>

   'Good' case example:

   Linux
   lvdisplay /dev/SPPlog0vg/snap-test0
     --- Logical volume ---
     LV Path                /dev/SPPlog0vg/snap-test0
     LV Name                snap-test0
     VG Name                SPPlog0vg
     LV UUID                pkKCh1-C5mm-oCs8-JMFc-kTiY-FOtE-r3isw8
     LV Write Access        read/write
     LV Creation host, time floridaprod1, 2021-10-12 16:06:15 +0200
     LV snapshot status     active destination for lvSPPlog0
     LV Status              available
     # open                 0
     LV Size                252.00 MiB
     Current LE             63
     COW-table size         52.00 MiB
     COW-table LE           13
     Allocated to snapshot  96.25%
     Snapshot chunk size    4.00 KiB
     Segments               1
     Allocation             inherit
     Read ahead sectors     auto
     - currently set to     8192
     Block device           253:25

   AIX
   snapshot -q /db2/SPN/log_dir/NODE0000
   Snapshots for /db2/SPN/log_dir/NODE0000
   Current  Location      512-blocks  Free   Time
   *        /dev/fslv00   65536       64768  Wed Oct 13 19:41:19 CEST 2021

   'Bad' case example:

   Linux
   lvdisplay /dev/SPPlog0vg/snap-test0
     --- Logical volume ---
     LV Path                /dev/SPPlog0vg/snap-test0
     LV Name                snap-test0
     VG Name                SPPlog0vg
     LV UUID                pkKCh1-C5mm-oCs8-JMFc-kTiY-FOtE-r3isw8
     LV Write Access        read/write
     LV Creation host, time floridaprod1, 2021-10-12 16:06:15 +0200
     LV snapshot status     INACTIVE destination for lvSPPlog0
     LV Status              available
     # open                 0
     LV Size                252.00 MiB
     Current LE             63
     COW-table size         52.00 MiB
     COW-table LE           13
     Snapshot chunk size    4.00 KiB
     Segments               1
     Allocation             inherit
     Read ahead sectors     auto
     - currently set to     8192
     Block device           253:25

   AIX
   snapshot -q /db2/SPN/log_dir/NODE0000
   Snapshots for /db2/SPN/log_dir/NODE0000
   Current  Location      512-blocks  Free   Time
   INVALID  /dev/fslv00   65536              Wed Oct 13 19:41:19 CEST 2021

3. If the snapshot becomes 100% full and 'INACTIVE' (for LVM) or 'INVALID' (for JFS2) at any time during the testing period, increase its size (<Snapshot_Size>) and repeat the test after manually deleting any snapshots that were previously created. To remove a snapshot use:
   For Linux LVM: lvremove -f <Snapshot_Name>
   For AIX JFS2:  snapshot -d <snapshot_LV>

4. Once a size that does not result in a failure is found, update the Db2MaximumAllocationInPercent setting in the SPP Db2 agent's /etc/guestapps.conf file with the new value. The new value for Db2MaximumAllocationInPercent can be calculated as:
   snapshot_size / sourceLV_size * 100
   This calculation applies if the volume group holding the source logical volume (or JFS2 file system) has a 'free' to 'used' space ratio bigger than this value. Otherwise, the IBM Spectrum Protect Plus agent for Db2 (or MongoDB) selects the minimum of the calculated snapshot size and the available free space.

5. If there are several identified snapshot volumes, use the maximum of all the values calculated for Db2MaximumAllocationInPercent.

The most reliable snapshot size, which guarantees that the snapshot will not become full and 'invalid', is equal to the size of the source logical volume (file system). To use it, set Db2MaximumAllocationInPercent to 100 and, if needed, add free space to the volume group containing the source logical volume.

It is sometimes useful to set a lower limit for the snapshot size with the Db2MinimumSnapshotVolumeSize parameter (default value is 50 MB). In this case, the IBM Spectrum Protect Plus agent for Db2 (or MongoDB) creates snapshots with a size not less than this value. For example, if the calculated size is 52 MB and Db2MinimumSnapshotVolumeSize = 1024, then the resulting size is 1024 MB.

Another useful parameter is Db2MinimumFreeSpaceInPercent (default value is 10%), which prevents a backup from starting when the free space on at least one volume group containing logical volumes to be backed up is less than Db2MinimumFreeSpaceInPercent of the used space on that volume group. For example, the agent will not start a backup if Db2MinimumFreeSpaceInPercent = 10 and the volume group contains less than 10 GB of free space when the used space on the volume group is 100 GB.
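The sizing arithmetic described above can be worked through in a few lines of Python. This is a sketch for checking values by hand, not the agent's actual implementation: the function names are hypothetical, and rounding the percentage up to a whole number and capping it at 100 are assumptions made here so that the result is never smaller than the tested snapshot size.

```python
def allocation_percent(snapshot_size_mb, source_lv_size_mb):
    """snapshot_size / sourceLV_size * 100 for one tested snapshot.

    Assumption: round up to a whole percent and cap at 100, the value
    the text names as the size of the source logical volume itself.
    """
    percent = (snapshot_size_mb * 100 + source_lv_size_mb - 1) // source_lv_size_mb
    return min(percent, 100)


def recommended_max_allocation(tested_sizes):
    """Step 5: take the maximum over all identified snapshot volumes.

    tested_sizes is an iterable of (snapshot_size_mb, source_lv_size_mb).
    """
    return max(allocation_percent(snap, lv) for snap, lv in tested_sizes)


def apply_minimum_size(calculated_mb, minimum_mb=50):
    """Db2MinimumSnapshotVolumeSize acts as a lower limit on snapshot size."""
    return max(calculated_mb, minimum_mb)


def backup_can_start(vg_free, vg_used, min_free_percent=10):
    """Db2MinimumFreeSpaceInPercent check: free space must be at least the
    given percentage of used space (both in the same unit)."""
    return vg_free * 100 >= vg_used * min_free_percent
```

For example, a 126 MB snapshot that survived the test on a 252 MB source volume gives allocation_percent(126, 252) = 50, apply_minimum_size(52, 1024) reproduces the 1024 MB result from the text, and backup_can_start(9, 100, 10) is False because 9 GB is less than 10% of 100 GB.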
Problem summary
****************************************************************
* USERS AFFECTED:                                              *
* IBM Spectrum Protect Plus levels 10.1.2, 10.1.3, 10.1.4,     *
* 10.1.5, 10.1.6, 10.1.7, 10.1.8 protecting Db2 or MongoDB     *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See ERROR description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Apply fixing level when available. This problem is currently *
* projected to be fixed in the 10.1.9. Note that this          *
* information is subject to change at the discretion of IBM.   *
****************************************************************
Problem conclusion
The IBM Spectrum Protect Plus Db2 and MongoDB agents have been fixed to correctly detect when snapshots are unavailable for backup operations and to issue appropriate error messages that report the backup failure. When the backup operation for a Db2 or MongoDB database fails with an error, the user can find instructions to resolve the issue in the IBM Spectrum Protect Plus 10.1.9 documentation for the relevant operating system (Linux or AIX). These procedures have been added to the IBM documentation to ensure successful backup operations for Db2 and MongoDB databases.

For information, refer to "Troubleshooting failed backup operations for large Db2 and MongoDB databases".
IBM Docs URL: https://www.ibm.com/docs/en/spp/10.1.9?topic=troubleshooting-failed-backup-operations-large-db2-mongodb-databases
Temporary fix
Comments
APAR Information
APAR number
IT38758
Reported component name
SP PLUS
Reported component ID
5737SPLUS
Reported release
A16
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2021-10-20
Closed date
2021-12-09
Last modified date
2021-12-13
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Modules/Macros
Apps Db2 MongoDB
Fix information
Fixed component name
SP PLUS
Fixed component ID
5737SPLUS
Applicable component levels
Document Information
Modified date:
31 January 2024