IT29003: BACKUP OF HYPER-V GUESTS JOB HANG

APAR status

  • Closed as program error.

Error description

  • Hyper-V VM backup jobs might hang if they encounter a storage
    problem in the Hyper-V environment.
    
    
    In the job log, the job starts, completes the in-guest file
    inventory, and takes the snapshot, but then makes no further
    progress:
    
    ...
    INFO,<timestamp>,2,Discovery on Host <vmname> completed with status success
    INFO,<timestamp>,2,VM: <vmname> has transferred 0.00 B ( 0%). Throughput since last update - 0.00 B/s
    INFO,<timestamp>,2,Taking snapshot for vm (<vmname>)
    INFO,<timestamp>,2,Provision size of vm (<vmname>) is xxxxxxx bytes
    
    In the virgo log, likewise, nothing is logged after the
    provision size message:
    
    ...
    [<timestamp>] INFO pool-67-thread-1 c.c.e.s.protection.hypervisor.cbt.CbtBackupUpdateHandler <JobId> Transfer update VM: <vmname> transferred 0 size 0 in 339
    [<timestamp>] INFO pool-67-thread-1 c.c.e.s.protection.hypervisor.cbt.CbtBackupUpdateHandler <JobId> VM: <vmname> has transferred 0.00 B ( 0%). Throughput since last update - 0.00 B/s
    [<timestamp>] INFO pool-67-thread-1 c.c.e.s.common.hypervisor.hyperv.HypervVirtualMachine <JobId> Taking snapshot for vm (<vmname>)
    [<timestamp>] INFO pool-67-thread-1 c.c.e.s.common.hypervisor.hyperv.HypervVirtualMachine <JobId> Created snapshot (SPPBackup-...) of VM (<vmname>)
    [<timestamp>] INFO pool-67-thread-1 com.syncsort.dp.xsb.sessionmanager.impl.SessionManagerImpl <JobId> SessionManager: Creating new session with ID: 5dd4aca0f131483c83b6f61fe1b0ad66
    [<timestamp>] INFO pool-67-thread-1 c.c.e.s.common.hypervisor.hyperv.HypervVirtualMachine <JobId> snapKey...Microsoft:39D6BE89-5733-451F-A980-40DD3C1F5174
    [<timestamp>] INFO pool-67-thread-1 c.c.e.s.common.hypervisor.hyperv.HypervVirtualMachine <JobId> Provision size of vm (<vmname>) is xxxxxxx bytes
    
    The vSnap log shows no errors either.
    
    On the Hyper-V host, the Windows event logs show the following:
    
    Warning,<timestamp>,Microsoft-Windows-FailoverClustering,5133,Cluster Shared Volume,
       "Cluster Disk '<vSnapVolName>' has been removed and placed back in the 'Available Storage' cluster group.
       During this process an attempt to restore the original drive letter(s) has taken longer than expected, possibly due to those drive letters being already in use."
    Information,<timestamp>,Microsoft-Windows-FailoverClustering,1635,Resource Control Manager,
       Cluster resource '<vSnapVolName>' of type 'Physical Disk' in clustered role 'Available Storage' failed.
    Error,<timestamp>,Microsoft-Windows-FailoverClustering,1795,Physical Disk Resource,
       "Cluster physical disk resource terminate encountered an error.
       Physical Disk resource name: <vSnapVolName>
                     Device Number: 4294967295
                       Device Guid: {b526d22d-48a5-4694-f530-1e7fddd19bd0}
                        Error Code: 1168
                 Additional reason: ReleaseDiskPRFailure"
    Error,<timestamp>,Microsoft-Windows-FailoverClustering,1069,Resource Control Manager,
       "Cluster resource '<vSnapVolName>' of type 'Physical Disk' in clustered role 'a37eb0d6-e430-42a7-8dd7-9f4ab1cfdda3' failed.
       The error code was '0x2' ('The system cannot find the file specified.')
       Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.
       Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet."
    Error,<timestamp>,Microsoft-Windows-FailoverClustering,1794,Physical Disk Resource,
       "Cluster physical disk resource offline failed.
       Physical Disk resource name: <vSnapVolName>
                     Device Number: 9
                       Device Guid: {b526d22d-48a5-4694-f530-1e7fddd19bd0}
                        Error Code: 2
                 Additional reason: OpenDevicePathFailure"
    Warning,<timestamp>,Microsoft-Windows-Ntfs,140,None,
       "The system failed to flush data to the transaction log.
       Corruption may occur in VolumeId: <vSnapVolName>,
       DeviceName: \Device\HarddiskVolumexxx.
       (A device which does not exist was specified.)"
    Warning,<timestamp>,Disk,157,None,Disk <x> has been surprise removed.
    Error,<timestamp>,Disk,15,None,"The device, \Device\Harddisk9\yyy, is not ready for access yet."
    ...
    Information,<timestamp>,iScsiPrt,34,None,"A connection to the target was lost, but Initiator successfully reconnected to the target. Dump data contains the target name."
    Error,<timestamp>,iScsiPrt,20,None,Connection to the target was lost. The initiator will attempt to retry the connection.
    
    
    IBM Spectrum Protect Versions Affected:
    IBM Spectrum Protect Plus 10.1.x
    
    
    Initial Impact: Medium
    
    Additional Keywords: SPP, SPPLUS, TS002171844
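
The hang leaves two signatures in the logs above: a job log whose last entry is the provision size message that follows the snapshot, and host events showing the vSnap disk being lost. As a minimal triage sketch, the Python below checks for both. It assumes the job log uses the comma-separated layout shown above (severity, timestamp, a numeric field, message) and that the host events were exported as CSV with the source in the third column and the event ID in the fourth. The function names are hypothetical and are not part of IBM Spectrum Protect Plus or Windows tooling. A job that is still running can momentarily end on the same line, so a stalled log only points to this hang if no new entries arrive over time.

```python
import csv
import io

# (source, event ID) pairs taken from the host event excerpt in this APAR.
# Assumption: together they indicate the vSnap disk was lost during the backup.
STORAGE_LOSS_EVENTS = {
    ("Microsoft-Windows-FailoverClustering", "5133"),
    ("Microsoft-Windows-FailoverClustering", "1635"),
    ("Microsoft-Windows-FailoverClustering", "1795"),
    ("Microsoft-Windows-FailoverClustering", "1069"),
    ("Microsoft-Windows-FailoverClustering", "1794"),
    ("Microsoft-Windows-Ntfs", "140"),
    ("Disk", "157"),
    ("Disk", "15"),
    ("iScsiPrt", "34"),
    ("iScsiPrt", "20"),
}


def job_stalled_after_snapshot(job_log: str) -> bool:
    """Hypothetical check: True if a snapshot was taken and the last job-log
    message is the provision size line, with no transfer progress after it."""
    messages = []
    for line in job_log.splitlines():
        # Assumed layout: SEVERITY,<timestamp>,<n>,message (message may hold commas).
        parts = line.split(",", 3)
        if len(parts) == 4:
            messages.append(parts[3])
    if not messages:
        return False
    took_snapshot = any(m.startswith("Taking snapshot for vm") for m in messages)
    return took_snapshot and messages[-1].startswith("Provision size of vm")


def storage_loss_events(event_csv: str) -> list[tuple[str, str]]:
    """Hypothetical check: return (source, event ID) pairs from a CSV event
    export that match the storage-loss events listed in this APAR."""
    hits = []
    # csv.reader keeps quoted multi-line messages together as one row.
    for row in csv.reader(io.StringIO(event_csv)):
        # Assumed columns: level, timestamp, source, event ID, category, message.
        if len(row) >= 4 and (row[2], row[3]) in STORAGE_LOSS_EVENTS:
            hits.append((row[2], row[3]))
    return hits


if __name__ == "__main__":
    job_log = (
        "INFO,<timestamp>,2,Taking snapshot for vm (vm1)\n"
        "INFO,<timestamp>,2,Provision size of vm (vm1) is 1024 bytes\n"
    )
    events = 'Warning,<timestamp>,Disk,157,None,"Disk 4 has been surprise removed."\n'
    print(job_stalled_after_snapshot(job_log))  # True
    print(storage_loss_events(events))  # [('Disk', '157')]
```

Matching on (source, event ID) pairs rather than on message text keeps the check independent of the localized wording of the event messages.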
    

Local fix

  • Restart the IBM Spectrum Protect Plus appliance.
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * IBM Spectrum Protect Plus level 10.1.3 and 10.1.4            *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See ERROR DESCRIPTION                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply the fixing level when available. This problem is       *
    * currently projected to be fixed in IBM Spectrum Protect Plus *
    * levels 10.1.4.179 and 10.1.5. Note that this is subject to   *
    * change at the discretion of IBM.                             *
    ****************************************************************
    

Problem conclusion

  • Hyper-V backups hung because the product attempted to remove a
    disk from a cluster shared volume before bringing the disk
    offline. A code change ensures that the disk is offline before
    the product attempts to remove it.
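
The conclusion above amounts to an ordering rule: remove the disk only after it is confirmed offline. The Python sketch below illustrates that rule under stated assumptions. `ClusterDisk`, its `take_offline`, `state`, and `remove` methods, the `'Offline'` state string, and the timeout values are hypothetical stand-ins, not the product's code or a Windows Failover Clustering API. `RecordingDisk` is an in-memory fake used only to show the call order.

```python
import time
from typing import List, Protocol


class ClusterDisk(Protocol):
    """Hypothetical handle to a cluster disk resource (illustration only)."""

    def take_offline(self) -> None: ...

    def state(self) -> str: ...

    def remove(self) -> None: ...


def offline_then_remove(disk: ClusterDisk, timeout_s: float = 60.0, poll_s: float = 1.0) -> None:
    """Take the disk offline, wait until it reports 'Offline', and only then remove it."""
    disk.take_offline()
    deadline = time.monotonic() + timeout_s
    while disk.state() != "Offline":
        if time.monotonic() >= deadline:
            # Give up rather than remove a disk that is still online; removing
            # before offline is the ordering this APAR identifies as the cause.
            raise TimeoutError("disk did not go offline; not removing it")
        time.sleep(poll_s)
    disk.remove()


class RecordingDisk:
    """In-memory stand-in: reports 'Online' for a few polls after the offline request."""

    def __init__(self, polls_until_offline: int) -> None:
        self.calls: List[str] = []
        self._remaining = polls_until_offline
        self._offline_requested = False

    def take_offline(self) -> None:
        self.calls.append("take_offline")
        self._offline_requested = True

    def state(self) -> str:
        if self._offline_requested and self._remaining <= 0:
            return "Offline"
        self._remaining -= 1
        return "Online"

    def remove(self) -> None:
        self.calls.append("remove")


if __name__ == "__main__":
    disk = RecordingDisk(polls_until_offline=2)
    offline_then_remove(disk, poll_s=0.0)
    print(disk.calls)  # ['take_offline', 'remove']
```

Failing with a timeout, instead of removing the disk anyway, keeps the sketch from falling back to the unsafe ordering when the disk never reaches the offline state.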
    

APAR Information

  • APAR number

    IT29003

  • Reported component name

    SP PLUS

  • Reported component ID

    5737SPLUS

  • Reported release

    A13

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2019-05-02

  • Closed date

    2019-08-19

  • Last modified date

    2019-08-19

Fix information

  • Fixed component name

    SP PLUS

  • Fixed component ID

    5737SPLUS

Applicable component levels

  • Business unit

    IBM Infrastructure w/TPS (BU058)

  • Product

    IBM Spectrum Protect Plus (SSNQFQ)

  • Platform

    Platform Independent (PF025)

  • Version

    A13

  • Line of business

    Storage (LOB26)

Document Information

Modified date:
30 January 2024