IBM Support

IT23270: TAPE OPTIMIZED RECALL (TOR) CAN HANG WITH DEADLOCK AT END OF RECALL PROCESSING

APAR status

  • Closed as program error.

Error description

  • The Space Management client Tape Optimized Recall (TOR) can hang
    in a deadlock at the end of recall processing.
    
    For example:
    
    dsmrecall -d -filelist=/tmp/flist /fs_name
    
    Start Tape Optimized Recall!
    ----------------------------
    Starting IBM Spectrum Protect Server query to get ordering
    information for file list: '/tmp/flist' ...
    
    Session established with server ABC: AIX
    
    
    ANS1898I ***** Processed    0 / 1000 files *****
    ANS1898I ***** Processed   87 / 1000 files *****
    ANS1898I ***** Processed  199 / 1000 files *****
    ANS1898I ***** Processed  307 / 1000 files *****
    ANS1898I ***** Processed  418 / 1000 files *****
    ANS1898I ***** Processed  523 / 1000 files *****
    ANS1898I ***** Processed  632 / 1000 files *****
    ANS1898I ***** Processed  741 / 1000 files *****
    ANS1898I ***** Processed  837 / 1000 files *****
    ANS1898I ***** Processed  864 / 1000 files *****
    ANS1898I ***** Processed  896 / 1000 files *****
    ANS1898I ***** Processed  968 / 1000 files *****
    ANS1898I ***** Processed 1000 / 1000 files *****
    
    IBM Spectrum Protect Server query successfully finished.
    
    Starting file list ordering ...
    Number of processed file lists: 1 / 1000 ( 3 / 20 )
    Number of processed file lists: 2 / 1000 ( 3 / 20 )
    Number of processed file lists: 3 / 1000 ( 3 / 20 )
    Number of processed file lists: 4 / 1000 ( 4 / 20 )
    
    etc
    
    Number of processed file lists: 1994 / 2000 ( 7 / 20 )
    Number of processed file lists: 1995 / 2000 ( 5 / 20 )
    Number of processed file lists: 1996 / 2000 ( 4 / 20 )
    Number of processed file lists: 1997 / 2000 ( 3 / 20 )
    Number of processed file lists: 1998 / 2000 ( 2 / 20 )
    Number of processed file lists: 1999 / 2000 ( 1 / 20 )
    
    The recall hangs on the last file, and that file remains in the
    'migrated' state.
    
    
    Procstack on AIX shows threads in a deadlock:
    
    
    10158556: dsmrecall -d -filelist=/tmp/flist /fs_name
    ---------- tid# 76677513 (pthread ID:      1) ----------
    0x09000000005b3290  _p_nsleep(??, ??) + 0x10
    0x090000000003f444  nsleep(??, ??) + 0xe4
    0x090000000003f24c  usleep@AF4_1(??) + 0x4c
    0x00000001000541c4  TapeOptimizedRecall::startFileListProcessing(TapeOptimizedRecall::collectionEntryOrder_t)() + 0x25c4
    0x0000000100001488  main() + 0xc88
    0x00000001000002b0  __start() + 0x70
    ---------- tid# 90898765 (pthread ID:   1068) ----------
    0x09000000005ad874  _event_sleep(??, ??, ??, ??, ??, ??) + 0x514
    0x09000000005ae510  _event_wait(??, ??) + 0x350
    0x09000000005bd87c  _cond_wait_local(??, ??, ??) + 0x4fc
    0x09000000005bde14  _cond_wait(??, ??, ??) + 0x34
    0x09000000005be848  pthread_cond_wait(??, ??) + 0x1a8
    0x0000000100027b10  cSyncObjectCondition::Wait(int)() + 0x1f0
    0x00000001000254e8  cQueue::ReadElement(cQueueBaseObject*&,int)() + 0x8e8
    0x00000001002ccf18  MediaOrderingThread::getNextFileListObject() const() + 0xb8
    0x00000001002cc5e4  MediaOrderingThread::ThreadFunc()() + 0xa4
    0x00000001000281ec  cThreadBase::StaticThreadFunc(void*)() + 0x18c
    0x0900000000598fe8  _pthread_body(??) + 0xe8
    ---------- tid# 92668313 (pthread ID:    258) ----------
    0x09000000005ad874  _event_sleep(??, ??, ??, ??, ??, ??) + 0x514
    0x09000000005b2ae8  _p_sigtimedwait(??, ??, ??) + 0x468
    0x09000000001251f0  sigtimedwait(??, ??, ??) + 0x30
    0x09000000001256bc  sigwait(??, ??) + 0x1c
    0x0000000100003ab8  torRecallWaitSignal(void*)() + 0xb8
    0x0900000000598fe8  _pthread_body(??) + 0xe8
    
    In this case, the main thread (1) waits indefinitely for the
    other thread (1068) to finish. Thread 1068 cannot finish because
    it is waiting for an event from another thread that has already
    completed, so a deadlock occurs.
    
    Initial Impact: Medium
    
    
    Tivoli Storage Manager and IBM Spectrum Protect version
    affected:
    Space Management client for Unix 7.1.x and 8.1.x on supported
    platforms
    
    
    
    
    Additional Keywords:  TSM  Spectrum Protect hang threads TOR
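
The thread interaction described above can be illustrated with a small, generic sketch. This is not the product's code, and the APAR does not describe how the defect was actually fixed. It only assumes the pattern visible in the stack trace: a worker blocked reading from a queue (compare cQueue::ReadElement and pthread_cond_wait) while the thread that would have fed or woken it has already finished. The names SENTINEL, consumer and run are hypothetical, and the end-of-work marker is one common remedy for this pattern, not necessarily the one IBM applied.

```python
import queue
import threading

# Hypothetical end-of-work marker; not taken from the APAR.
SENTINEL = object()


def consumer(work, results):
    """Read items until the end-of-work marker arrives."""
    while True:
        item = work.get()  # blocks indefinitely if nothing more ever arrives
        if item is SENTINEL:
            return
        results.append(item)


def run(send_sentinel):
    """Feed three items to a worker, then wait for it with a bounded join.

    Returns (results, worker_still_blocked).
    """
    work = queue.Queue()
    results = []
    # daemon=True so a blocked worker does not keep the demo process alive
    worker = threading.Thread(target=consumer, args=(work, results), daemon=True)
    worker.start()
    for i in range(3):
        work.put(i)
    if send_sentinel:
        work.put(SENTINEL)
    # The producer side is finished here. An unbounded join() would wait
    # forever when no marker was sent; the timeout lets the demo return.
    worker.join(timeout=1.0)
    return results, worker.is_alive()


if __name__ == "__main__":
    print("without marker, worker still blocked:", run(False)[1])
    print("with marker, worker still blocked:", run(True)[1])
```

Without the marker the worker stays blocked in get() after the producer is done, which mirrors thread 1068 in the trace. With the marker it exits and the waiting thread can continue.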
    

Local fix

  • Stop the hanging dsmrecall process with the 'kill -15' command.
    There is no need to restart the dsmrecalld daemon.
    
    To prevent the hang, set the following option in dsm.sys:
    HSMMAXREcalltapedrives 1
    Note that this limits recall to one tape drive.
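
For unattended runs, the manual 'kill -15' step could be scripted. The following is a minimal sketch, assuming a POSIX system and a caller-chosen time limit. The function name run_with_sigterm and the timeout value are hypothetical and are not part of the product. The sketch cannot tell a hung recall from a slow one, so the limit has to be chosen generously.

```python
import signal
import subprocess


def run_with_sigterm(cmd, timeout_s):
    """Run cmd; if it is still running after timeout_s seconds, send SIGTERM.

    Returns the exit status. On POSIX, a process ended by a signal is
    reported as the negative signal number.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.send_signal(signal.SIGTERM)  # same signal as 'kill -15'
        return proc.wait()


# Example call using the command line from the error description; the
# 6-hour limit is an arbitrary assumption:
# run_with_sigterm(
#     ["dsmrecall", "-d", "-filelist=/tmp/flist", "/fs_name"], 6 * 3600)
```

As noted in the error description, a file that was not recalled before the process was stopped remains in the 'migrated' state and has to be recalled again.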
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * IBM Spectrum Protect for Space Management (HSM) client       *
    * versions 6.4, 7.1 and 8.1 on AIX and Linux platforms.        *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * see ERROR DESCRIPTION                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply fixing level when available. This problem is projected *
    * to be fixed in level 8.1.6. Note that this is subject to     *
    * change at the discretion of IBM.                             *
    ****************************************************************
    

Problem conclusion

  • IBM Spectrum Protect for Space Management Tape Optimized Recall
    no longer hangs at the end of the operation.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT23270

  • Reported component name

    TSM SPACE MGMT

  • Reported component ID

    5698HSMCL

  • Reported release

    81A

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2017-12-04

  • Closed date

    2017-12-19

  • Last modified date

    2017-12-19

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Modules/Macros

  • dsmrecal
    

Fix information

  • Fixed component name

    TSM SPACE MGMT

  • Fixed component ID

    5698HSMCL

Applicable component levels

  • R81L PSY

       UP

  • R81A PSY

       UP

Document Information

Modified date:
28 September 2021