Direct links to fixes
APAR status
Closed as program error.
Error description
The MonReplCliThread holds an excessive number of node locks, which can lead to lock contention. One possible symptom is that many client sessions stagnate in "start" state for a long time and scheduled backups are missed.

In the example below, a servermon log collection shows that all of the "start" sessions are waiting for an xLock on a resource that is already held by thread 322.

- In the SHOW LOCKS output, session thread 89845 is waiting on a lock held by thread 322:

    LockDesc: Type=17001(admin node name), NameSpace=0, SummMode=sLock, Key='NODEXX'
      Holder: (admutil.c:12676 Thread 322) Tsn=0:319230600, Mode=sLock
      Waiter: (admutil.c:12676 Thread 89845) Tsn=0:319231969, Mode=xLock

- Thread 322 is the MonReplCliThread. The SHOW THREAD output shows:

    Thread 322, Parent 47: MonReplCliThread, Storage 4599184, AllocCnt 315037390 HighWaterAmt 7565984
     tid=f642, ptid=a2f, det=1, zomb=0, join=0, result=0, sess=0, procToken=0, sessToken=0
     Stack trace:
       0x09000000005d6940 _cond_wait_global
       0x09000000005d763c _cond_wait
       0x09000000005d7fac pthread_cond_wait
       0x000000010000b2b4 pkWaitConditionTracked
       0x000000010012a410 IPRA.$WaitForLock
       0x0000000100128644 tmLockTracked
       0x00000001003e8228 ImLockFileSpaceTracked
       0x00000001003b7314 ImGetAllFsAttrsEx
       0x00000001003b2098 imIsFSVMEx
       0x00000001004b2ec8 IPRA.$LogReplCliInfoForStgRule
       0x00000001004b12e0 scMonitorReplClients
       0x000000010016c104 StatusMonitorGridsThread
       0x0000000100011470 StartThread
     Awaiting cond waitP->waiting (0x1bab9c680), using mutex TMV->mutex (0x1116f1708), at tmlock.c(2539)

- If the MonReplCliThread becomes stuck waiting on another lock that is held by a slow or long-running transaction, all client sessions stagnate in start state.
- The problem affects only client nodes that have replication enabled on the source replication server.

IBM Spectrum Protect Versions Affected:
Version 8.1.13.0 on all supported platforms

| MDVREGR 8.1.13.0-TIV_5698MSV |

Additional Keywords: TS008549468 hung freeze deadlock
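The holder/waiter relationship in the SHOW LOCKS excerpt above can be extracted with a short script when many sessions are stuck. The following is a minimal sketch, not an IBM tool: the function name `blocked_by` and the regular expressions are assumptions based only on the three lines quoted in this APAR, and the line layout of real SHOW LOCKS output may differ.

```python
import re
from collections import defaultdict

# Excerpt copied from this APAR; the line breaks are an assumption.
SHOW_LOCKS_EXCERPT = """\
LockDesc: Type=17001(admin node name), NameSpace=0, SummMode=sLock, Key='NODEXX'
  Holder: (admutil.c:12676 Thread 322) Tsn=0:319230600, Mode=sLock
  Waiter: (admutil.c:12676 Thread 89845) Tsn=0:319231969, Mode=xLock
"""

# Hypothetical patterns derived from the excerpt above.
ENTRY_RE = re.compile(r"(Holder|Waiter): \(\S+ Thread (\d+)\) Tsn=\S+ Mode=(\w+)")
KEY_RE = re.compile(r"Key='([^']*)'")


def blocked_by(show_locks_text):
    """Map each holder thread to the (waiter thread, key, mode) tuples it blocks."""
    result = defaultdict(list)
    key, holders = None, []
    for line in show_locks_text.splitlines():
        line = line.strip()
        if line.startswith("LockDesc:"):
            # A new lock descriptor starts: remember its key, reset holders.
            match = KEY_RE.search(line)
            key = match.group(1) if match else None
            holders = []
            continue
        match = ENTRY_RE.search(line)
        if not match:
            continue
        role, thread, mode = match.group(1), int(match.group(2)), match.group(3)
        if role == "Holder":
            holders.append(thread)
        else:
            # Assumes waiters are listed after the holders of the same lock.
            for holder in holders:
                result[holder].append((thread, key, mode))
    return dict(result)


if __name__ == "__main__":
    for holder, waiters in blocked_by(SHOW_LOCKS_EXCERPT).items():
        print(f"Thread {holder} blocks {len(waiters)} waiter(s): {waiters}")
```

A holder thread that blocks many waiters can then be looked up in the SHOW THREAD output to check whether it is the MonReplCliThread.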
Local fix
Restart the server to resolve the lock conflict.
Problem summary
****************************************************************
* USERS AFFECTED:                                              *
* All 8.1.13 users with big number of VM backups using OC.     *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* The function that builds Client Replication grid uses single *
* transaction to do all work. In big environments with lots of *
* VM backups this causes lock conflicts with file spaces       *
* table.                                                       *
****************************************************************
* RECOMMENDATION:                                              *
* Apply fixing level when available. This problem is currently *
* projected to be fixed in level 8.1.13.005, 8.1.13.100,       *
* 8.1.14.200, and 8.1.15.000.                                  *
* Note that this is subject to change at the discretion of     *
* IBM.                                                         *
****************************************************************
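To illustrate why doing all of the work in a single transaction matters, the following toy model counts how many locks are held at once when locks are released only at commit. It is a sketch under stated assumptions, not the server's code: this APAR does not describe how the fix is implemented, and `max_locks_held` and `batch_size` are hypothetical names used only for this illustration.

```python
def max_locks_held(file_spaces, batch_size):
    """Return the peak number of locks held when committing every batch_size items.

    Toy model: each processed file space takes one lock, and every lock is
    released only when the enclosing transaction commits.
    """
    held = set()
    peak = 0
    for count, file_space in enumerate(file_spaces, start=1):
        held.add(file_space)          # lock acquired inside the transaction
        peak = max(peak, len(held))
        if count % batch_size == 0:
            held.clear()              # commit: all locks of this batch released
    return peak


if __name__ == "__main__":
    spaces = [f"fs{i}" for i in range(1000)]
    print("single transaction:", max_locks_held(spaces, batch_size=len(spaces)))
    print("batches of 50:     ", max_locks_held(spaces, batch_size=50))
```

In this model the single transaction holds a lock on every file space until the very end, whereas smaller transactions bound the number of locks held at any moment, which leaves less room for conflicts with other sessions.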
Problem conclusion
Problem was fixed. Affected platforms: AIX, Linux, and Windows.
Temporary fix
Comments
APAR Information
APAR number
IT40338
Reported component name
TSM SERVER
Reported component ID
5698ISMSV
Reported release
81A
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2022-03-21
Closed date
2022-04-20
Last modified date
2022-05-03
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
TSM SERVER
Fixed component ID
5698ISMSV
Applicable component levels
Document Information
Modified date:
06 September 2022