Troubleshooting
Problem
A FlashCopy backup performed with IBM Tivoli Storage FlashCopy Manager can fail, if the communication with the storage subsystem takes more than 10 seconds.
Symptom
A FlashCopy backup from either Microsoft Management Console (MMC) GUI or fcmcli fails with the following messages in fcm.log:
FMV1327W The snapshot operation for 'volname' failed with error code: -1
FMV1235E (RC-1) An unknown system error has occurred from which TSM cannot recover.
Cause
Microsoft VSS has a non configurable 10 second timeout to flush and hold VSS writers.
https://msdn.microsoft.com/en-us/library/windows/desktop/aa384615%28v=vs.85%29.aspx
"VSS kernel support will cancel the Flush and Hold that is holding the I/O if the subsequent release is not received within 10 seconds, and VSS will fail the shadow copy creation process."
Environment
IBM Tivoli Storage FlashCopy Manager on Windows
IBM SAN Volume Controller (SVC)
IBM Storwize family
Diagnosing The Problem
The application event log will show the timeout is reached during VSS processing:
Error,<Date and timestamp>,VSS,12298,None,"Volume Shadow Copy Service error: The I/O writes cannot be held during the shadow copy creation period on volume <MyVolume>.
The IBMVSS.log shows for example:
Thu Jun 02 18:34:54.625 - IBMVSS: CommitSnapshots
Thu Jun 02 18:35:05.108 - IBMVSS: Exiting CommitSnapshots
Thu Jun 02 18:35:05.170 - IBMVSS: AbortSnapshots
Resolving The Problem
Given the above errors, the CIMOM logs can be reviewed to check where most of the time is spent during snapshot operation.
In this case the problem is the underlying network between Windows Server and IBM storage subsytem:
CIMOM log showed it took 7 seconds until the CommitSnapshot command arrived on CIMOM:
2016-06-02T18:34:51:164837 0xf749eb70 IBMTSSVC_StorageHardwareIDEnumInstances_0xf749eb70_301443228 CIM | glue/ibmtssvc/StorageHardwareIDGlue.cpp:IBMTSSVC_StorageHardwareIDEnumInstances:817 ENTER
2016-06-02T18:35:01:852079 0xf6dffb70 IBMTSSVC_StorageConfigurationServiceInvokeMethod_ModifySynchronization_0xf6dffb70_301453916 CIM | glue/ibmtssvc/StorageConfigurationServiceGlue.cpp:IBMTSSVC_StorageConfigurationServiceInvokeMethod:4595 ENTER
2016-06-02T18:35:01:852114 0xf6dffb70 doModifySynchronization_0xf6dffb70_301453916 CIM | glue/ibmtssvc/StorageConfigurationServiceGlue.cpp:doModifySynchronization:2429 ENTER
2016-06-02T18:35:04:809683 0xf6dffb70 doModifySynchronization_0xf6dffb70_301453916 CIM | glue/cpa/cpalog.cpp:~audit_enter_exit:822 Return with code:0
2016-06-02T18:35:04:809748 0xf6dffb70 IBMTSSVC_StorageConfigurationServiceInvokeMethod_ModifySynchronization_0xf6dffb70_301453916 CIM | glue/cpa/cpalog.cpp:~audit_enter_exit:825 Return with: CMPI_RC_OK
Based on this analysis the network needs to be debugged to reduce the communication roundtrip between Windows Server and IBM storage subsystem.
Was this topic helpful?
Document Information
Modified date:
17 June 2018
UID
swg21985096