APAR status
Closed as program error.
Error description
VM backup jobs might hang if a storage problem occurs in the Hyper-V environment. In the job log, the job can be seen starting, completing the in-guest file inventory, and then taking the snapshot, but nothing is logged after that:

...
INFO,<timestamp>,2,Discovery on Host <vmname> completed with status success
INFO,<timestamp>,2,VM: <vmname> has transferred 0.00 B ( 0%). Throughput since last update - 0.00 B/s
INFO,<timestamp>,2,Taking snapshot for vm (<vmname>)
INFO,<timestamp>,2,Provision size of vm (<vmname>) is xxxxxxx bytes

In the virgo log, nothing more is seen:

...
[<timestamp>] INFO pool-67-thread-1 c.c.e.s.protection.hypervisor.cbt.CbtBackupUpdateHandler <JobId> Transfer update VM: <vmname> transferred 0 size 0 in 339
[<timestamp>] INFO pool-67-thread-1 c.c.e.s.protection.hypervisor.cbt.CbtBackupUpdateHandler <JobId> VM: <vmname> has transferred 0.00 B ( 0%). Throughput since last update - 0.00 B/s
[<timestamp>] INFO pool-67-thread-1 c.c.e.s.common.hypervisor.hyperv.HypervVirtualMachine <JobId> Taking snapshot for vm (<vmname>)
[<timestamp>] INFO pool-67-thread-1 c.c.e.s.common.hypervisor.hyperv.HypervVirtualMachine <JobId> Created snapshot (SPPBackup-...) of VM (<vmname>)
[<timestamp>] INFO pool-67-thread-1 com.syncsort.dp.xsb.sessionmanager.impl.SessionManagerImpl <JobId> SessionManager: Creating new session with ID: 5dd4aca0f131483c83b6f61fe1b0ad66
[<timestamp>] INFO pool-67-thread-1 c.c.e.s.common.hypervisor.hyperv.HypervVirtualMachine <JobId> snapKey...Microsoft:39D6BE89-5733-451F-A980-40DD3C1F5174
[<timestamp>] INFO pool-67-thread-1 c.c.e.s.common.hypervisor.hyperv.HypervVirtualMachine <JobId> Provision size of vm (<vmname>) is xxxxxxx bytes

In the vSnap log, no errors are seen either.

In the Hyper-V guest, the following events can be seen:

Warning,<timestamp>,Microsoft-Windows-FailoverClustering,5133,Cluster Shared Volume,"Cluster Disk '<vSnapVolName>' has been removed and placed back in the 'Available Storage' cluster group. During this process an attempt to restore the original drive letter(s) has taken longer than expected, possibly due to those drive letters being already in use."
Information,<timestamp>,Microsoft-Windows-FailoverClustering,1635,Resource Control Manager,Cluster resource '<vSnapVolName>' of type 'Physical Disk' in clustered role 'Available Storage' failed.
Error,<timestamp>,Microsoft-Windows-FailoverClustering,1795,Physical Disk Resource,"Cluster physical disk resource terminate encountered an error. Physical Disk resource name: <vSnapVolName> Device Number: 4294967295 Device Guid: {b526d22d-48a5-4694-f530-1e7fddd19bd0} Error Code: 1168 Additional reason: ReleaseDiskPRFailure"
Error,<timestamp>,Microsoft-Windows-FailoverClustering,1069,Resource Control Manager,"Cluster resource '<vSnapVolName>' of type 'Physical Disk' in clustered role 'a37eb0d6-e430-42a7-8dd7-9f4ab1cfdda3' failed. The error code was '0x2' ('The system cannot find the file specified.') Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet."
Error,<timestamp>,Microsoft-Windows-FailoverClustering,1794,Physical Disk Resource,"Cluster physical disk resource offline failed. Physical Disk resource name: <vSnapVolName> Device Number: 9 Device Guid: {b526d22d-48a5-4694-f530-1e7fddd19bd0} Error Code: 2 Additional reason: OpenDevicePathFailure"
Warning,<timestamp>,Microsoft-Windows-Ntfs,140,None,"The system failed to flush data to the transaction log. Corruption may occur in VolumeId: <vSnapVolName>, DeviceName: \Device\HarddiskVolumexxx. (A device which does not exist was specified.)"
Warning,<timestamp>,Disk,157,None,Disk <x> has been surprise removed.
Error,<timestamp>,Disk,15,None,"The device, \Device\Harddisk9\yyy, is not ready for access yet."
...
Information,<timestamp>,iScsiPrt,34,None,"A connection to the target was lost, but Initiator successfully reconnected to the target. Dump data contains the target name."
Error,<timestamp>,iScsiPrt,20,None,Connection to the target was lost. The initiator will attempt to retry the connection.

IBM Spectrum Protect Versions Affected: IBM Spectrum Protect Plus 10.1.x
Initial Impact: Medium
Additional Keywords: SPP, SPPLUS, TS002171844
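The event source and event ID pairs quoted in the error description can be used to check an exported Windows event log for the same pattern. The following Python sketch is illustrative only and is not part of the product. It assumes that each event is exported as one comma-separated line in the order level, timestamp, source, event ID, category, message (as in the excerpts above) and that the timestamp itself contains no comma. The names KNOWN_EVENTS and find_storage_events are hypothetical.

```python
# Source/event-ID pairs taken from the excerpts in this APAR.
KNOWN_EVENTS = {
    ("Microsoft-Windows-FailoverClustering", 5133),
    ("Microsoft-Windows-FailoverClustering", 1635),
    ("Microsoft-Windows-FailoverClustering", 1795),
    ("Microsoft-Windows-FailoverClustering", 1069),
    ("Microsoft-Windows-FailoverClustering", 1794),
    ("Microsoft-Windows-Ntfs", 140),
    ("Disk", 157),
    ("Disk", 15),
    ("iScsiPrt", 34),
    ("iScsiPrt", 20),
}


def find_storage_events(lines):
    """Return (level, source, event_id) for each line that matches a known event.

    Assumption: one event per line, formatted as
    level,timestamp,source,event_id,category,message
    """
    matches = []
    for line in lines:
        # Split on the first four commas only; the message may contain commas.
        parts = line.strip().split(",", 4)
        if len(parts) < 4:
            continue
        level, _timestamp, source, event_id = parts[:4]
        if not event_id.strip().isdigit():
            continue
        key = (source.strip(), int(event_id))
        if key in KNOWN_EVENTS:
            matches.append((level.strip(), key[0], key[1]))
    return matches
```

A match on several of these events around the time a backup job stopped logging suggests, but does not prove, that the job is affected by this APAR.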
Local fix
Restart the IBM Spectrum Protect Plus appliance.
Problem summary
****************************************************************
* USERS AFFECTED:                                              *
* IBM Spectrum Protect Plus levels 10.1.3 and 10.1.4           *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See ERROR DESCRIPTION                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Apply fixing level when available. This problem is currently *
* projected to be fixed in IBM Spectrum Protect Plus levels    *
* 10.1.4.179 and 10.1.5. Note that this is subject to change   *
* at the discretion of IBM.                                    *
****************************************************************
Problem conclusion
The Hyper-V backup hang was caused by an attempt to remove a disk from a cluster shared volume before the disk was taken offline. A code change was made to ensure that the disk is offline before it is removed.
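The ordering described in the conclusion can be sketched as follows. This Python example is a minimal illustration of the "offline first, then remove" sequence and is not the product's code. The ClusterDisk class and the release_disk function are hypothetical stand-ins that only model disk state in memory.

```python
class ClusterDisk:
    """Hypothetical in-memory stand-in for a disk in a cluster shared volume."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.in_cluster = True

    def take_offline(self):
        self.online = False

    def remove_from_cluster(self):
        # Removing an online disk is the sequence that this APAR corrects.
        if self.online:
            raise RuntimeError(f"{self.name} is still online; take it offline first")
        self.in_cluster = False


def release_disk(disk):
    """Take the disk offline, confirm that it is offline, and only then remove it."""
    if disk.online:
        disk.take_offline()
    if disk.online:
        raise RuntimeError(f"{disk.name} did not go offline; not removing it")
    disk.remove_from_cluster()
```

In this sketch, calling remove_from_cluster() on a disk that is still online raises an error, whereas release_disk() always performs the two steps in the corrected order.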
Temporary fix
Comments
APAR Information
APAR number
IT29003
Reported component name
SP PLUS
Reported component ID
5737SPLUS
Reported release
A13
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2019-05-02
Closed date
2019-08-19
Last modified date
2019-08-19
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SP PLUS
Fixed component ID
5737SPLUS
Applicable component levels
Document Information
Modified date:
30 January 2024