APAR status
Closed as program error.
Error description
The IBM Spectrum Protect Plus copy to the IBM Spectrum Protect repository server can stop with the following messages seen in the job log:

SUMMARY,<timestamp>,CTGGA2398,Starting job for policy <SLAName> id -> <JobID>. IBM Spectrum Protect Plus version 10.1.7-3043.
...
ERROR,<timestamp>,CTGGA0309,Copy failed for snapshot (ID: <SnapshotID>) from source [server: <vSnapAddress> volume: <SourcevSnapVolume> snapshot: <SnapshotName>] to target [server: <vSnapAddress> volume: <TargetVolumeName>]. Error: Exception: Failed to create gateway device: Could not find device path for serial <CloudDeviceSerial>
ERROR,<timestamp>,CTGGA0310,Skipping remaining snapshots for volume <vSnapAddress>: <SourcevSnapVolume> due to unrecoverable error for vSnap session <OffloadSessionID>

and with these messages in the vSnap replication log:

[<timestamp>] INFO pid-<xxxx> vsnap.common.model Session <OffloadSessionID>: message = Preparing cloud gateway device
...
[<timestamp>] INFO pid-<xxxx> vsnap.target Creating bdg backing store named offload_<OffloadPoolName> with cfgstring poc@<xxx>@<yyy>@16, max_data_area_mb=128,hw_block_size=4096,hw_max_sectors=2048
...
[<timestamp>] INFO pid-<xxxx> vsnap.linux.system Executing command: vsnap_targetcli /loopback/naa.<zzzz>/luns create /backstores/user:bdg/offload_<OffloadPoolName>
...
[<timestamp>] INFO pid-<xxxx> vsnap.cloud.driver Getting device path by serial, attempt 5
[<timestamp>] ERROR pid-<xxxx> vsnap.linux.system Timed out (10 seconds) waiting for command to complete: /lib/udev/scsi_id --page 0x80 --whitelisted --device /dev/sd<x>
...
[<timestamp>] WARNING pid-<xxxx> vsnap.cloud.driver Could not determine serial for sd<x>, skipping it
[<timestamp>] WARNING pid-<xxxx> vsnap.cloud.driver Could not get device path by serial: Failed to find device with serial <CloudDeviceSerial>

This occurs during an incremental copy operation when the vSnap server tries to mount the virtual cloud device and then imports the vSnap cloud pool from it.
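To check whether a failed copy job matches this APAR, the job log can be searched for the CTGGA0309 message that carries the "Could not find device path for serial" error. The following is a minimal Python sketch, not IBM tooling. It assumes the job log has been exported as plain text with one comma-separated message per line and a timestamp field that contains no commas. The function name find_gateway_failures is hypothetical.

```python
import re

# Signature of the CTGGA0309 job log message quoted in this APAR.
# Assumptions: one message per line, comma-separated fields, and a
# timestamp field that does not itself contain a comma.
GATEWAY_FAILURE = re.compile(
    r"^ERROR,(?P<timestamp>[^,]*),CTGGA0309,.*"
    r"Could not find device path for serial\s+(?P<serial>\S+)"
)


def find_gateway_failures(lines):
    """Return (timestamp, serial) pairs for job log lines that match."""
    hits = []
    for line in lines:
        match = GATEWAY_FAILURE.search(line.strip())
        if match:
            hits.append((match.group("timestamp"), match.group("serial")))
    return hits
```

The function accepts any iterable of strings, so an open file handle for the exported job log can be passed directly. A match only indicates that the job log symptom is present; the vSnap replication log messages still need to be checked to confirm the cause.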
As soon as the vSnap server attaches the cloud device at the start of the copy operation, the Linux operating system detects that a new disk has been attached and tries to read the partition table. At the same time, the vSnap offload process sends SCSI inquiries to the device to detect its serial number. If the IBM Spectrum Protect object agent is slow to respond to read requests during this time, these SCSI inquiries can time out and cause the copy to fail.

In most cases (but not necessarily all), the slow read responses from the IBM Spectrum Protect object agent can be confirmed in the gwdriver<ID>.log file associated with that offload operation, located in the vSnap log directory /opt/vsnap/log. Messages of the following type indicate that read responses are timing out:

WARN: ReadPart(<xxxxxx>/<yyyyyyy>/<zzzzzz>:<aaa>:<bbb>) failed, reason (RequestCanceled: request context canceled)

The IBM Spectrum Protect server APAR IT35592 addresses the slow read responses from the object agent. This APAR improves the behaviour of the vSnap server when the IBM Spectrum Protect object agent is slow to respond to requests.

IBM Spectrum Protect Plus Versions Affected: IBM Spectrum Protect Plus 10.1.3 and higher
Initial Impact: High
Additional Keywords: SPP, SPPLUS, TS003888479, SP, offload
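The gwdriver log check described in the error description can be sketched in Python as follows. This is an illustrative helper rather than an IBM-provided tool: the function names count_read_timeouts and scan_log_dir are hypothetical, and the sketch assumes that the gwdriver<ID>.log files are plain text and readable by the user running it.

```python
import glob
import re

# Signature of the timed-out read message quoted in this APAR.
READ_TIMEOUT = re.compile(
    r"WARN: ReadPart\(.*\) failed, "
    r"reason \(RequestCanceled: request context canceled\)"
)


def count_read_timeouts(lines):
    """Count lines that contain the ReadPart timeout warning."""
    return sum(1 for line in lines if READ_TIMEOUT.search(line))


def scan_log_dir(log_dir="/opt/vsnap/log"):
    """Map each gwdriver*.log file in log_dir to its timeout count."""
    counts = {}
    for path in sorted(glob.glob(f"{log_dir}/gwdriver*.log")):
        # errors="replace" keeps the scan going past undecodable bytes.
        with open(path, errors="replace") as handle:
            counts[path] = count_read_timeouts(handle)
    return counts
```

A nonzero count for the log belonging to the failed offload operation is consistent with slow read responses from the object agent. As noted above, a zero count does not rule the problem out.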
Local fix
The problem can be mitigated by increasing the vSnap server read timeout for cloud objects from 1 minute to 4 minutes (240 seconds) as follows:

vsnap system pref set --name cloudIOReadTimeout --value 240
Problem summary
****************************************************************
* USERS AFFECTED:                                              *
* IBM Spectrum Protect Plus levels 10.1.3, 10.1.4, 10.1.5,     *
* 10.1.6 and 10.1.7.                                           *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description.                                       *
****************************************************************
* RECOMMENDATION:                                              *
* Apply the fixing level when available. This problem is       *
* projected to be fixed in IBM Spectrum Protect Plus level     *
* 10.1.8. Note that this is subject to change at the           *
* discretion of IBM.                                           *
****************************************************************
Problem conclusion
A code fix was implemented on vSnap to improve handling of timeouts when the cloud endpoint or repository server is slow to respond to read requests during the initial stage of the copy job. In most cases, this results in copy jobs succeeding instead of failing. Note that in extreme cases, copy jobs can still fail if the cloud endpoint or repository server continues to be very slow to respond.
Temporary fix
Comments
APAR Information
APAR number
IT35884
Reported component name
SP PLUS
Reported component ID
5737SPLUS
Reported release
A16
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2021-02-12
Closed date
2021-03-25
Last modified date
2021-03-25
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SP PLUS
Fixed component ID
5737SPLUS
Applicable component levels
[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SSNQFQ","label":"IBM Spectrum Protect Plus"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"A16","Line of Business":{"code":"LOB26","label":"Storage"}}]
Document Information
Modified date:
31 January 2024