IBM Support

FlashCopy backup may fail with FMV1327W due to VSS timeout limitation

Troubleshooting


Problem

A FlashCopy backup performed with IBM Tivoli Storage FlashCopy Manager can fail, if the communication with the storage subsystem takes more than 10 seconds.

Symptom

A FlashCopy backup from either Microsoft Management Console (MMC) GUI or fcmcli fails with the following messages in fcm.log:

FMV1327W The snapshot operation for 'volname' failed with error code: -1
FMV1235E (RC-1) An unknown system error has occurred from which TSM cannot recover.

Cause

Microsoft VSS has a non configurable 10 second timeout to flush and hold VSS writers.

https://msdn.microsoft.com/en-us/library/windows/desktop/aa384615%28v=vs.85%29.aspx

"VSS kernel support will cancel the Flush and Hold that is holding the I/O if the subsequent release is not received within 10 seconds, and VSS will fail the shadow copy creation process."

Environment

IBM Tivoli Storage FlashCopy Manager on Windows

IBM SAN Volume Controller (SVC)

IBM Storwize family

Diagnosing The Problem

The application event log will show the timeout is reached during VSS processing:
Error,<Date and timestamp>,VSS,12298,None,"Volume Shadow Copy Service error: The I/O writes cannot be held during the shadow copy creation period on volume <MyVolume>.

The IBMVSS.log shows for example:
Thu Jun 02 18:34:54.625 - IBMVSS: CommitSnapshots
Thu Jun 02 18:35:05.108 - IBMVSS: Exiting CommitSnapshots
Thu Jun 02 18:35:05.170 - IBMVSS: AbortSnapshots

Resolving The Problem

Given the above errors, the CIMOM logs can be reviewed to check where most of the time is spent during snapshot operation.

In this case the problem is the underlying network between Windows Server and IBM storage subsytem:

CIMOM log showed it took 7 seconds until the CommitSnapshot command arrived on CIMOM:


2016-06-02T18:34:51:164837 0xf749eb70  IBMTSSVC_StorageHardwareIDEnumInstances_0xf749eb70_301443228 CIM | glue/ibmtssvc/StorageHardwareIDGlue.cpp:IBMTSSVC_StorageHardwareIDEnumInstances:817 ENTER
2016-06-02T18:35:01:852079 0xf6dffb70  IBMTSSVC_StorageConfigurationServiceInvokeMethod_ModifySynchronization_0xf6dffb70_301453916 CIM | glue/ibmtssvc/StorageConfigurationServiceGlue.cpp:IBMTSSVC_StorageConfigurationServiceInvokeMethod:4595 ENTER
2016-06-02T18:35:01:852114 0xf6dffb70  doModifySynchronization_0xf6dffb70_301453916 CIM | glue/ibmtssvc/StorageConfigurationServiceGlue.cpp:doModifySynchronization:2429 ENTER
2016-06-02T18:35:04:809683 0xf6dffb70  doModifySynchronization_0xf6dffb70_301453916 CIM | glue/cpa/cpalog.cpp:~audit_enter_exit:822 Return with code:0
2016-06-02T18:35:04:809748 0xf6dffb70  IBMTSSVC_StorageConfigurationServiceInvokeMethod_ModifySynchronization_0xf6dffb70_301453916 CIM | glue/cpa/cpalog.cpp:~audit_enter_exit:825 Return with: CMPI_RC_OK

Based on this analysis the network needs to be debugged to reduce the communication roundtrip between Windows Server and IBM storage subsystem.

[{"Product":{"code":"SS36V9","label":"Tivoli Storage FlashCopy Manager"},"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Component":"FlashCopy Manager VSS Requestor","Platform":[{"code":"PF033","label":"Windows"}],"Version":"All Versions","Edition":"All Editions","Line of Business":{"code":"LOB26","label":"Storage"}}]

Document Information

Modified date:
17 June 2018

UID

swg21985096