APAR status
Closed as program error.
Error description
A copy to object storage (offload) job in IBM Spectrum Protect Plus can fail with a timeout, after which all subsequent offload jobs also fail. The following errors precede the timeout that leads to the hang.

joblog:
WARN,...,CTGGA1915,Error occurred on Storage Vsnap replication Please check Vsnap Storage logs.
ERROR,...,CTGGA0309,Copy failed for snapshot (ID: 10141) from source [server: <FQN> volume: <VOLName> snapshot: <SNAPShot>] to target [server: <FQN> volume: <VOLName>]. Error: Exception: Failed to create gateway device: Command timed out: vsnap_targetcli saveconfig
ERROR,...,,Skipping remaining snapshots for volume <FQN>:<VOLName> due to unrecoverable error for Vsnap session <SessionID>

repl.log (vSnap Replication Log):
ERROR pid-11349 vsnap.repld Traceback (most recent call last):
...
vsnap.common.errors.CommandTimeoutError: Command timed out: vsnap_targetcli /loopback/naa.50014050f9c3e206/luns delete lun3
ERROR pid-11349 vsnap.repld Session <SessionID>: worker failed: Command timed out: vsnap_targetcli /loopback/naa.50014050f9c3e206/luns delete lun3

IBM Spectrum Protect Plus Versions Affected: 10.1.5 P1
Initial Impact: Medium
Additional Keywords: TS003432830 REPLICATION OFFLOAD TIMEOUT |MDVREGR 10.1.5-2181|
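The timeout signature in the log excerpts can be searched for mechanically. The following is a minimal sketch, not an IBM-provided tool: the function name find_targetcli_timeouts and the regular expression are illustrative assumptions, and the caller supplies the log lines because this record does not state where the job log or repl.log is stored.

```python
import re
from typing import Iterable, List

# Assumed signature, taken from the log excerpts in this APAR:
# "Command timed out: vsnap_targetcli <arguments>"
TIMEOUT_RE = re.compile(r"Command timed out: (vsnap_targetcli\b.*)")


def find_targetcli_timeouts(lines: Iterable[str]) -> List[str]:
    """Return the vsnap_targetcli commands reported as timed out."""
    hits = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            # Keep only the command text, without trailing whitespace.
            hits.append(match.group(1).strip())
    return hits
```

Passing an open log file object as the lines argument returns one entry per matching line. A non-empty result only shows that the same message text is present; it does not by itself confirm this APAR.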
Local fix
Reboot the vSnap server.
Problem summary
****************************************************************
* USERS AFFECTED:                                              *
* IBM Spectrum Protect Plus level 10.1.5 patch1                *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Apply fixing level when available. This problem is currently *
* projected to be fixed in IBM Spectrum Protect Plus level     *
* 10.1.5.2199 and 10.1.6. Note that this is subject to change  *
* at the discretion of IBM.                                    *
****************************************************************
Problem conclusion
As part of the 10.1.5.2181 release, the Linux kernel bundled with vSnap was updated to version 4.19.101. The upstream kernel community had introduced a bug that could cause the Linux IO (LIO) subsystem to hang when deleting loopback devices. vSnap uses loopback devices for copy operations to object storage. When the loopback device was removed during cleanup of a copy operation, the LIO hang could be triggered, which then caused all subsequent copy operations to fail. The upstream kernel community fixed the bug by reverting the patches that introduced it. The reverts are included in kernel version 4.19.110, which has now been incorporated into vSnap, thus resolving the problem.
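The two kernel versions named above can be turned into a simple check. This is a minimal sketch under stated assumptions: the function names are hypothetical, only the 4.19 series is classified, and only 4.19.101 is treated as known affected because this record does not say which upstream release first introduced the bug.

```python
import re
from typing import Optional, Tuple

# Versions stated in this APAR.
KNOWN_AFFECTED = (4, 19, 101)  # kernel shipped with vSnap 10.1.5.2181
FIXED = (4, 19, 110)           # kernel containing the upstream reverts


def parse_release(release: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from a kernel release string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def classify_kernel(release: str) -> str:
    """Return 'fixed', 'affected', or 'unknown' for a kernel release."""
    version = parse_release(release)
    # Assumption: nothing outside the 4.19 series is classified, because
    # the APAR only describes that series.
    if version is None or version[:2] != (4, 19):
        return "unknown"
    if version >= FIXED:
        return "fixed"
    if version == KNOWN_AFFECTED:
        return "affected"
    # Other 4.19.x releases: the APAR does not say when the bug appeared.
    return "unknown"
```

On a vSnap server, the release string could be obtained with Python's platform.release() and passed to classify_kernel. A result of "unknown" means only that this record gives no information for that kernel.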
Temporary fix
Comments
APAR Information
APAR number
IT32252
Reported component name
SP PLUS
Reported component ID
5737SPLUS
Reported release
A15
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2020-03-20
Closed date
2020-03-25
Last modified date
2020-03-25
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SP PLUS
Fixed component ID
5737SPLUS
Applicable component levels
Document Information
Modified date:
31 January 2024