Fixes are available
APAR status
Closed as program error.
Error description
The Space Management client Tape Optimized Recall (TOR) can hang at the end of processing because of a thread deadlock. For example:

    dsmrecall -d -filelist=/tmp/flist /fs_name
    Start Tape Optimized Recall!
    ----------------------------
    Starting IBM Spectrum Protect Server query to get ordering information for file list: '/tmp/flist' ...
    Session established with server ABC: AIX
    ANS1898I ***** Processed 0 / 1000 files *****
    ANS1898I ***** Processed 87 / 1000 files *****
    ANS1898I ***** Processed 199 / 1000 files *****
    ANS1898I ***** Processed 307 / 1000 files *****
    ANS1898I ***** Processed 418 / 1000 files *****
    ANS1898I ***** Processed 523 / 1000 files *****
    ANS1898I ***** Processed 632 / 1000 files *****
    ANS1898I ***** Processed 741 / 1000 files *****
    ANS1898I ***** Processed 837 / 1000 files *****
    ANS1898I ***** Processed 864 / 1000 files *****
    ANS1898I ***** Processed 896 / 1000 files *****
    ANS1898I ***** Processed 968 / 1000 files *****
    ANS1898I ***** Processed 1000 / 1000 files *****
    IBM Spectrum Protect Server query successfully finished.
    Starting file list ordering ...
    Number of processed file lists: 1 / 1000 ( 3 / 20 )
    Number of processed file lists: 2 / 1000 ( 3 / 20 )
    Number of processed file lists: 3 / 1000 ( 3 / 20 )
    Number of processed file lists: 4 / 1000 ( 4 / 20 )
    etc
    Number of processed file lists: 1994 / 2000 ( 7 / 20 )
    Number of processed file lists: 1995 / 2000 ( 5 / 20 )
    Number of processed file lists: 1996 / 2000 ( 4 / 20 )
    Number of processed file lists: 1997 / 2000 ( 3 / 20 )
    Number of processed file lists: 1998 / 2000 ( 2 / 20 )
    Number of processed file lists: 1999 / 2000 ( 1 / 20 )

The recall hangs on the last file, and that file remains in 'migrated' state.

The procstack output on AIX shows the threads in a deadlock:

    10158556: dsmrecall -d -filelist=/tmp/flist /fs_name
    ---------- tid# 76677513 (pthread ID: 1) ----------
    0x09000000005b3290 _p_nsleep(??, ??) + 0x10
    0x090000000003f444 nsleep(??, ??) + 0xe4
    0x090000000003f24c usleep@AF4_1(??) + 0x4c
    0x00000001000541c4 TapeOptimizedRecall::startFileListProcessing(TapeOptimizedRecall::collectionEntryOrder_t)() + 0x25c4
    0x0000000100001488 main() + 0xc88
    0x00000001000002b0 __start() + 0x70
    ---------- tid# 90898765 (pthread ID: 1068) ----------
    0x09000000005ad874 _event_sleep(??, ??, ??, ??, ??, ??) + 0x514
    0x09000000005ae510 _event_wait(??, ??) + 0x350
    0x09000000005bd87c _cond_wait_local(??, ??, ??) + 0x4fc
    0x09000000005bde14 _cond_wait(??, ??, ??) + 0x34
    0x09000000005be848 pthread_cond_wait(??, ??) + 0x1a8
    0x0000000100027b10 cSyncObjectCondition::Wait(int)() + 0x1f0
    0x00000001000254e8 cQueue::ReadElement(cQueueBaseObject*&,int)() + 0x8e8
    0x00000001002ccf18 MediaOrderingThread::getNextFileListObject() const() + 0xb8
    0x00000001002cc5e4 MediaOrderingThread::ThreadFunc()() + 0xa4
    0x00000001000281ec cThreadBase::StaticThreadFunc(void*)() + 0x18c
    0x0900000000598fe8 _pthread_body(??) + 0xe8
    ---------- tid# 92668313 (pthread ID: 258) ----------
    0x09000000005ad874 _event_sleep(??, ??, ??, ??, ??, ??) + 0x514
    0x09000000005b2ae8 _p_sigtimedwait(??, ??, ??) + 0x468
    0x09000000001251f0 sigtimedwait(??, ??, ??) + 0x30
    0x09000000001256bc sigwait(??, ??) + 0x1c
    0x0000000100003ab8 torRecallWaitSignal(void*)() + 0xb8
    0x0900000000598fe8 _pthread_body(??) + 0xe8

In this case, the main thread (pthread ID 1) waits indefinitely for the other thread (pthread ID 1068) to finish. That thread cannot finish because it is waiting for an event from another thread that has already completed, so a deadlock occurs.

Initial Impact: Medium

Tivoli Storage Manager and IBM Spectrum Protect version affected:
Space Management client for Unix 7.1.x and 8.1.x on supported platforms

Additional Keywords: TSM Spectrum Protect hang threads TOR
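The hang pattern described above, where a worker blocks on a queue read after its producer has already finished while the main thread keeps waiting for the worker, can be reproduced in a few lines. The following is a minimal Python sketch of that general pattern only. It is not the dsmrecall source, and every name in it (`ordering_worker`, `run`, `_DONE`) is hypothetical. The sentinel shown is one common way to avoid such a hang; the APAR does not say how the product fix is implemented.

```python
import queue
import threading

# Hypothetical marker meaning "the producer will send nothing more".
_DONE = object()


def ordering_worker(work, seen):
    """Consume items until the sentinel arrives.

    work.get() blocks without a timeout, much like a condition wait:
    if nobody ever puts another item, this thread never returns.
    """
    while True:
        item = work.get()
        if item is _DONE:
            seen.append("finished")
            return
        seen.append(item)


def run(send_sentinel, join_timeout=1.0):
    """Start one worker, feed it three items, then wait for it.

    Returns (worker_still_blocked, items_seen_by_worker).
    """
    work = queue.Queue()
    seen = []
    # daemon=True so a stuck worker cannot keep the interpreter alive.
    worker = threading.Thread(
        target=ordering_worker, args=(work, seen), daemon=True
    )
    worker.start()

    for n in range(3):
        work.put(n)
    if send_sentinel:
        work.put(_DONE)
    # The producer side is finished here. Without the sentinel the worker
    # is now waiting for an event that will never come.

    # A bounded join stands in for the main thread's wait. An unbounded
    # join would hang in exactly the way the stack trace shows.
    worker.join(timeout=join_timeout)
    return worker.is_alive(), seen
```

Calling `run(send_sentinel=False)` leaves the worker blocked after it has handled all three items, while `run(send_sentinel=True)` lets it exit cleanly.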
Local fix
Stop the hanging dsmrecall process with the 'kill -15' command. There is no need to restart the dsmrecalld daemon.

To prevent the hang, set the following option in dsm.sys:

    HSMMAXREcalltapedrives 1

Note that this setting limits recall to one tape drive.
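Where recalls are started from a script, the manual 'kill -15' step can be automated with a deadline. The sketch below is an illustration under assumptions rather than an IBM-provided tool: the function name `run_with_deadline` and the deadline value are hypothetical, and a legitimate tape recall can run for a long time, so any deadline must be chosen generously for the environment.

```python
import subprocess


def run_with_deadline(cmd, deadline_s):
    """Run cmd and return its exit status.

    If the process is still running after deadline_s seconds, send it
    SIGTERM (the same signal as 'kill -15') and wait for it to exit.
    On POSIX a process ended by SIGTERM reports a return code of -15.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=deadline_s)
    except subprocess.TimeoutExpired:
        proc.terminate()  # sends SIGTERM on POSIX systems
        return proc.wait()


# Example call, using the command line from the error description.
# The 6-hour deadline is an arbitrary assumption:
#   run_with_deadline(
#       ["dsmrecall", "-d", "-filelist=/tmp/flist", "/fs_name"],
#       deadline_s=6 * 3600,
#   )
```

A return code of -15 tells the calling script that the recall was terminated rather than completed, so the affected file list can be resubmitted.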
Problem summary
****************************************************************
* USERS AFFECTED:                                              *
* IBM Spectrum Protect for Space Management (HSM) client       *
* versions 6.4, 7.1 and 8.1 on AIX and Linux platforms.        *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* see ERROR DESCRIPTION                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Apply fixing level when available. This problem is projected *
* to be fixed in level 8.1.6. Note that this is subject to     *
* change at the discretion of IBM.                             *
****************************************************************
Problem conclusion
IBM Spectrum Protect for Space Management Tape Optimized Recall should no longer hang at the end of the operation.
Temporary fix
Comments
APAR Information
APAR number
IT23270
Reported component name
TSM SPACE MGMT
Reported component ID
5698HSMCL
Reported release
81A
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2017-12-04
Closed date
2017-12-19
Last modified date
2017-12-19
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Modules/Macros
dsmrecal
Fix information
Fixed component name
TSM SPACE MGMT
Fixed component ID
5698HSMCL
Applicable component levels
R81L PSY
UP
R81A PSY
UP
Document Information
Modified date:
28 September 2021