IBM Support

IC75419: FLASHCOPY MANAGER BACKINT CORES AFTER REPEATED 'LSLV' FAILURES LEADING TO TIMEOUT_PARTITION

APAR status

  • Closed as fixed if next.

Error description

  • When FlashCopy Manager finds a volume group locked (for
    reasons outside the product) during the PARTITION phase of a
    FlashCopy backup, a sequence like the following appears in the
    detailed backup log, and the FlashCopy Manager backint
    executable may dump core:
    
    04:25:46 (89a) FMM1582I The target set 1 will be used for the
                   current backup.
    04:25:46 (89a) FMM6901I Response to Init request.
    04:25:46 (89a) FMM6902I Response to Partition request.
    04:26:11 (798) 0516-1201 lslv: Warning: Volume group vg_name
                   is locked. This command
    04:26:11 (798)  will continue retries until lock is free.
                    If lock is inadvertent
    04:26:11 (798)  and needs to be removed, execute
                    'chvg -u vg_name'.
    04:37:23 (89a) FMM0570I Finding the serial numbers ...
    05:25:46 (798) FMM9011E There was no response received
                   within 3600 seconds; timeout is expired.
                   You can increase the timeout by specifying the
                   profile parameter TIMEOUT_<PHASE>
                   for the current phase
    05:25:46 (798) FMM8300I tsmACSPartition() returned with
                   code 18.
    05:25:46 (798) FMM9203E Additional support information:
                   An exception was thrown at position: /home/b
    pointer.hpp(118).
      esd-snapservicecontroler.cpp(1828):
      esd-snapservicecontroler.cpp(548):
      esd-snapprocessserviceassociation.cpp(300):
      esd-fcclosemessagehandler.cpp(61):
      esd-fcclosemessagehandler.cpp(60):
      esd-serializableproxyhandler.cpp(97):
      esd-serializableproxyhandler.cpp(81):
      esd-portalcommunicator.cpp(697):
      esd-portalcommunicator.cpp(338):
      esd-portalcommunicator.cpp(276):
      esd-thread.cpp(67):
    05:25:46 (798) FMM1015E Operation backup completed with error.
    05:25:46 (798) FMM0020I End of program at:
                   Mon Mar  7 05:25:46 GMT 2011 .
    05:25:46 (798) FMM0021I Elapsed time: 01 h 00 min 01 sec .
    
    Sample callstack
     - - - - - - - -
    IOT/Abort trap in pthread_kill at 0x9000000004b671c ($t1)
    0x9000000004b671c (pthread_kill+0x88) e8410028
        ld   r2,0x28(r1)
    (dbx) where
    pthread_kill(??, ??) at 0x9000000004b671c
    _p_raise(??) at 0x9000000004b6130
    raise.raise(??) at 0x90000000005d898
    abort() at 0x9000000000893f4
    std::myabort()() at 0x90000000045e20c
    std::terminate()() at 0x90000000045d76c
    invokedtr.__Invoke__Destructor()
        at 0x90000000045d50c
    __DoThrowV6() at 0x9000000004608cc
    ESD_NonRecursiveMutex::acquire()(0x1108922e0) at 0x1009de228
    esd-asynctask.ESD_LockGuard<ESD_NonRecursiveMutex>:
        :ESD_LockGuard(ESD_NonRecursiveMutex&)
        (0xfffffffffff96c0, 0x1108922e0) at 0x10021458c
    esd-fakecompressiondatablockallocator.ESD_CopyOnWriteMemory:
        :ReferenceManagement::decrementRefCount()(0x1108922d0)
        at 0x10029dd90
    esd-cstring.ESD_CopyOnWriteMemory::clone(void*,unsigned long).
        ESD_CopyOnWriteMemory::release(void*)(0x110892338)
        at 0x100203f78
    esd-cstring.ESD_CopyOnWriteMemory::~ESD_CopyOnWriteMemory()
        (0xfffffffffff9d88, 0x0) at 0x100205b68
    ESD_WString::~ESD_WString()(0xfffffffffff9d88, 0x200000002)
        at 0x10028ac90
    ESD_MessageBlock::~ESD_MessageBlock()
        (0xfffffffffff9d60, 0x0, 0x0) at 0x100008fec
    ESD_MessageVersion::~ESD_MessageVersion()
        (0xfffffffffff9d60, 0x200000002, 0x0) at 0x100a3bbec
    ESD_MessageHeader::~ESD_MessageHeader()
        (0xfffffffffff9d00, 0x200000002, 0x0) at 0x100a3d9b4
    ESD_Message::~ESD_Message()
        (0xfffffffffff9c38, 0x0, 0x0) at 0x100a3aa54
    (dbx)
    
    TSM Versions Affected: All AIX versions of FlashCopy Manager
                           for SAP on Oracle
           Initial Impact: Medium
      Additional Keywords: FCM TFCM Tivoli Storage Manager TSM
                           Advanced Copy Services ACS
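To check whether a failed backup matches this APAR, the detailed backup log can be searched for the AIX lslv lock warning 0516-1201 followed later by the timeout message FMM9011E. The following is a minimal Python sketch, assuming the log is a plain text file in the format shown above. The function name `matches_ic75419` is hypothetical and this is not an IBM-provided tool.

```python
import re
from typing import Iterable

# Message IDs taken from the log excerpt in this APAR.
LOCK_WARNING = re.compile(r"\b0516-1201\b")   # AIX lslv: volume group is locked
TIMEOUT_ERROR = re.compile(r"\bFMM9011E\b")   # FlashCopy Manager: phase timeout


def matches_ic75419(lines: Iterable[str]) -> bool:
    """Return True if a lock warning is followed later by a phase timeout.

    Hypothetical helper: it only checks the order of the two messages and
    does not prove that the backint executable dumped core.
    """
    saw_lock_warning = False
    for line in lines:
        if LOCK_WARNING.search(line):
            saw_lock_warning = True
        elif saw_lock_warning and TIMEOUT_ERROR.search(line):
            return True
    return False


if __name__ == "__main__":
    excerpt = [
        "04:25:46 (89a) FMM6902I Response to Partition request.",
        "04:26:11 (798) 0516-1201 lslv: Warning: Volume group vg_name",
        "05:25:46 (798) FMM9011E There was no response received",
    ]
    print(matches_ic75419(excerpt))  # True
```

To scan a real log, pass an open file object (for example inside `with open(path) as f:`). The location of the detailed backup log depends on the installation and is not given in this APAR.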
    

Local fix

  • The fix for this defect cannot prevent the backup failure
    itself, because the timeout was caused by reasons outside
    the product.
    Only the failure handling will be corrected: instead of
    dumping core, backint will log an error.
    
    Therefore, focus on preventing the timeout:
    o Make sure the volume group in question is not locked.
      If necessary, run 'chvg -u vg_name'.
    o Follow the suggestion in the error message and increase
      the timeout parameter in the FlashCopy Manager profile.
      Because the error occurred in the PARTITION phase, the
      parameter to change is TIMEOUT_PARTITION.
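In the log excerpt above, the partition request is logged at 04:25:46 and FMM9011E is issued at 05:25:46, so the phase waited exactly the 3600 seconds quoted in the message. A larger TIMEOUT_PARTITION value only helps if it covers the time the volume group actually stays locked. The Python sketch below computes the interval between two HH:MM:SS log timestamps; it assumes both stamps come from the same run and that the interval is shorter than 24 hours. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def seconds_between(start: str, end: str) -> int:
    """Seconds from start to end, both 'HH:MM:SS' stamps from one backup log.

    Assumes the interval is shorter than 24 hours; if end is earlier than
    start, the phase is assumed to have crossed midnight.
    """
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    if delta < timedelta(0):
        delta += timedelta(days=1)  # wrapped past midnight
    return int(delta.total_seconds())


if __name__ == "__main__":
    # FMM6902I (partition request) to FMM9011E (timeout) in the log above.
    print(seconds_between("04:25:46", "05:25:46"))  # 3600
```

If the computed interval equals the configured timeout, the phase ended because the timeout expired rather than because the lock was released.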
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED: All users of IBM Tivoli FlashCopy Manager    *
    *                 versions 2.1 and 2.2 who encounter a         *
    *                 TIMEOUT_PARTITION exception.                 *
    ****************************************************************
    * PROBLEM DESCRIPTION: See ERROR DESCRIPTION                   *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    The backint module of IBM Tivoli FlashCopy Manager needs to be
    corrected so that when a TIMEOUT_PARTITION exception occurs, an
    error is logged and the program does not dump core.
    
    If there is a next release of Tivoli FlashCopy Manager, this
    problem will be fixed in that next release.
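The call stack in the error description shows an exception thrown from ESD_NonRecursiveMutex::acquire() while a chain of destructors was running, followed by std::terminate() and abort(); that pattern is consistent with an exception that could not be handled during cleanup. The Python sketch below is only a generic illustration of the behaviour the fix aims for, not FlashCopy Manager's implementation: a phase waits for its worker for a bounded time and, on expiry or error, logs and returns a failure status instead of terminating the process. The name `run_phase` and the use of a thread pool are assumptions made for illustration.

```python
import concurrent.futures
import logging
import time
from typing import Callable

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("phase-sketch")


def run_phase(phase: str, work: Callable[[], None], timeout_s: float) -> bool:
    """Run one backup phase in a worker thread with a bounded wait.

    Hypothetical helper. Returns True on success. On timeout or error it
    logs a message naming the TIMEOUT_<PHASE> parameter and returns False
    instead of aborting the process.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(work)
    try:
        future.result(timeout=timeout_s)
        return True
    except concurrent.futures.TimeoutError:
        log.error("No response received within %s seconds in phase %s; "
                  "consider increasing TIMEOUT_%s", timeout_s, phase, phase)
        return False
    except Exception:
        log.exception("Phase %s failed", phase)
        return False
    finally:
        # Do not block on a worker that is still waiting (e.g. on a lock).
        pool.shutdown(wait=False)


if __name__ == "__main__":
    # Simulated lock wait that outlasts a short timeout.
    print(run_phase("PARTITION", lambda: time.sleep(0.5), timeout_s=0.1))
    print(run_phase("PARTITION", lambda: None, timeout_s=1.0))
```

`shutdown(wait=False)` keeps the caller from blocking on a stuck worker, but the worker thread still runs until its blocking call returns. This mirrors the APAR's point that the fix changes only the failure handling and cannot release the volume group lock itself.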
    

Problem conclusion

Temporary fix

  • The fix for this defect cannot prevent the backup failure
    because the timeout was caused by reasons external to the
    product.
    
    Only the failure handling will be corrected so that instead of
    core dumping, an error will be logged.
    
    Use the following suggestions to prevent the timeout:
    
     o Ensure the volume group in question is not locked.
       If necessary run 'chvg -u vg_name'.
    
     o Follow the suggestions of the error message and increase
       the timeout parameter in the FlashCopy Manager profile.
    
       Because the error occurred in the PARTITION phase, the
       parameter to change is TIMEOUT_PARTITION.
    

Comments

APAR Information

  • APAR number

    IC75419

  • Reported component name

    FLSHCPY ORACLE

  • Reported component ID

    5608AC6OS

  • Reported release

    21A

  • Status

    CLOSED FIN

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2011-03-30

  • Closed date

    2011-05-27

  • Last modified date

    2011-05-27

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

Applicable component levels

  • R21A PSN

       UP

  • R22A PSN

       UP

  • R22L PSN

       UP

  • R22S PSN

       UP

Product information

  • Product: Tivoli Storage FlashCopy Manager (SS36V9)
  • Version: 21A
  • Platform: Platform Independent
  • Business unit: IBM Infrastructure w/TPS
  • Line of business: Storage

Document Information

Modified date:
27 May 2011