Direct links to fixes
APAR status
Closed as program error.
Error description
The storage rule replication processing reports as almost complete but does not finish and appears to be hung. QUERY PROCCESS on the SOURCE server shows the figures for work completed by the replication STGRULE process don't change over time. Evidence this issue is encountered: SHOW THREADS output on the SOURCE server shows NrReplicationThread thread hang at waiting for ctlP->phaseCompleteCond: Thread 3333, Parent 2222: NrReplicationThread, Storage 27763056, AllocCnt 13857 HighWaterAmt 27791488 tid=139721881839360, ptid=139721920440064, det=1, zomb=0, join=0, result=0, sess=4444, procToken=50, sessToken=5555 lwp=3864663 Stack trace: 0x7f174051c45c /lib64/libpthread.so.0(pthread_cond_wait+0x1fc) 0x152d201 /opt/tivoli/tsm/server/bin/dsmserv(pkWaitConditionTracked+0x91) 0xb67b1e /opt/tivoli/tsm/server/bin/dsmserv(NrReplicationThread+0x4cfe) 0x152a813 /opt/tivoli/tsm/server/bin/dsmserv(StartThread+0x93) 0x7f17405161ca /lib64/libpthread.so.0(UNKNOWN) 0x7f173c867e73 /lib64/libc.so.6(clone+0x43) Awaiting cond ctlP->phaseCompleteCond (0x0x4544b20), using mutex ctlP->phaseCompleteMutex (0x0x7f14643da230), at nrmain.c(14075) A trace of SOURCE server (with trace classes REPL SDREPL SESSION) will have communication errors, and generates the ANR0483W message for current session terminated, and will make target session pair dead: 03:06:41.100 [133164][smrepl.c][7871][smReplRtrv]:Object 9890010 processed with rc 3014 (bfSize 288453632) 03:06:41.100 [133164][smrepl.c][7885][smReplRtrv]:Exit: rc 3014, objId 9890010 03:06:41.100 [133164][nrfs.c][3216][NrReplicateBatch]:sourceRc=3014, bytesTransferred=0 03:06:41.100 [133164][nrfs.c][7651][CheckRcForBatch]:Object 9890010 had confirmation denied or sink error 03:06:41.100 [133164][nrfs.c][2555][NrReplicateBatch]:Ending current txn 03:06:41.100 [133164][smrepl.c][7477][smReplEndBatchTxn]:Enter: handle 0x7f98083acc40, inRc 1031 03:06:41.100 [133164][smrepl.c][7504][smReplEndBatchTxn]:Sending ABORT for txn 03:06:41.100 [133164][smrepl.c][7512][smReplEndBatchTxn]:Sending EndTxnEnhanced verb 03:06:41.100 [133164][smrepl.c][7517][smReplEndBatchTxn]:Error -1 sending EndTxnEnhancedVerb 03:06:41.100 [133164][smrepl.c][7620][smReplEndBatchTxn]:Exit: rc 2339, batchCount 0 03:06:41.100 [133164][nrfs.c][2559][NrReplicateBatch]:Error 2339 from smReplEndTxn ... 03:06:41.129 [133164][smrepl.c][4163][smReplFreeSession]:Marking dead target session 758 to server LEVTSM03 source sess 11517. 03:06:41.129 [133164][smrepl.c][4194][smReplFreeSession]:Closing out session 11517 to server LEVTSM03 03:06:41.129 [133164][smexec.c][6026][smCloseSessionFasp]:Enter close FASP type FASP BIDI Session smSessP 0x7f98083aa3e0 for END state . 03:06:41.131 [133164][smserv.c][3508][TraceMsg]:ANR0483W Session 11517 for node LEVTSM03 (AIX) terminated - forced by administrator.~ NrReplicationThread is stopped at waiting for uncompleted phases: 03:18:49.535 [133117][nrmain.c][14055][WaitForPhaseCompletion]:Phase 1740037082017 is not completed 03:18:49.535 [133117][nrmain.c][14055][WaitForPhaseCompletion]:Phase 1740037082054 is not completed 03:18:49.535 [133117][nrmain.c][14055][WaitForPhaseCompletion]:Phase 1740037082071 is not completed 03:18:49.536 [133117][nrmain.c][14055][WaitForPhaseCompletion]:Phase 1740037082311 is not completed 03:18:49.536 [133117][nrmain.c][14068][WaitForPhaseCompletion]:Waiting for 4 incomplete phases to finish On the TARGET server side, in the trace (with trace classes REPL SDREPL SESSION) the ANR0479W is generated for the session terminated by server: 02:10:48.473 [21019][smrepl.c][2439][SmReplServerSession]:Received verb BackInsNormEnhanced (C5) on session 758 02:10:48.473 [21019][smrepl.c][19985][TestArchRetention]:Exit, rc 0 02:10:48.473 [21019][imrepl.c][7843][imProcReplBkObjInfo]:Entry 02:10:48.473 [21019][imrepl.c][7997][imProcReplBkObjInfo]:Exit, rc=0, *objIdP=0, groupId=0 02:10:48.521 [21019][smtrans.c][5701][SmRecvNextData]:Sending confirmation resp=YES 02:10:48.522 [21019][smrepl.c][20974][TranslateToGRC]:Exit: rc 0, *objRcP -1, confirmState NO 02:10:48.522 [21019][smrepl.c][3144][SmReplServerSession]:Error -1 from SM Insert Verb 02:10:48.523 [21019][smrepl.c][3154][SmReplServerSession]:Absorbing object for failed transaction 02:10:50.398 [21019][sdutil.c][3575][TraceMsg]:ANR0951I Session 758 for node NODENAME processed 452 files by using inline data deduplication or compression, or both. The number of original bytes was 5,474,068,432. Inline data deduplication reduced the data by 0 bytes and inline compression reduced the data by 0 bytes. ~ 02:10:50.399 [21019][nrmain.c][7463][NrIsReplRecoveryProcessCancelled]:Exit: rc 0, procId 7 02:10:50.399 [21019][smrepl.c][3352][SmReplServerSession]:Replication session 758 for SPSERVER on Linux/x86_64 failed, rc 9997~ 02:10:50.399 [21031][smrepl.c][2439][SmReplServerSession]:Received verb BackInsNormEnhanced (C5) on session 770 02:10:50.417 [21019][smexec.c][4920][TraceMsg]:ANR0479W Session 758 for server SPSERVER (Linux/x86_64) terminated - connection with server severed.~ 02:10:50.418 [21019][imrepl.c][8146][imFreeReplBatchInfo]:freeing igRow=12d11f5f0 From the server trace GetUnsatisfiedDeps has phases waiting for an uncompleted phase as the dependency: 02:22:18.151 [20982][sdrepl.c][4400][GetUnsatisfiedDeps]:PhaseI d=1740037082071has unsatisfied dependency 1740037082017 02:22:18.151 [20982][sdrepl.c][4400][GetUnsatisfiedDeps]:PhaseI d=1740037082054has unsatisfied dependency 1740037082017 02:22:18.151 [20982][sdrepl.c][4400][GetUnsatisfiedDeps]:PhaseI d=1740037082311has unsatisfied dependency 1740037082017 02:22:22.453 [20982][sdrepl.c][4400][GetUnsatisfiedDeps]:PhaseI d=1740037082071has unsatisfied dependency 1740037082017 02:22:22.453 [20982][sdrepl.c][4400][GetUnsatisfiedDeps]:PhaseI d=1740037082054has unsatisfied dependency 1740037082017 02:22:22.453 [20982][sdrepl.c][4400][GetUnsatisfiedDeps]:PhaseI d=1740037082311has unsatisfied dependency 1740037082017 In the SHOW THREADS output of TARGET server, the SdReplTcrPhaseCheckThread is hung waiting on condition ctlP->phaseCheckCond: Thread 20982, Parent 20959: SdReplTcrPhaseCheckThread, Storage 18640, AllocCnt 17272 HighWaterAmt 125744 tid=d4f6, ptid=2fdf, det=0, zomb=0, join=0, result=0, sess=0, procToken=7, sessToken=729 Stack trace: 0x090000000052d9a0 _cond_wait_global 0x090000000052e69c _cond_wait 0x090000000052f00c pthread_cond_wait 0x000000010000b794 pkWaitConditionTracked 0x0000000100cf1cb4 SdReplTcrPhaseCheckThread 0x0000000100011a70 StartThread Awaiting cond ctlP->phaseCheckCond (0x116fa9cd0), using mutex ctlP->mutex (0x113b5b290), at sdrepl.c(2114) Additional Keywords: TS014556040 Spectrum TSM ANR0483W ANR0479W
Local fix
Problem summary
**************************************************************** * USERS AFFECTED: * * All IBM Storage Protect server users. * **************************************************************** * PROBLEM DESCRIPTION: * * See error description. * **************************************************************** * RECOMMENDATION: * * Apply fixing level when available. * * This problem is currently projected to be fixed in level * * 8.1.22. * * Note that this is subject to change at the discretion of * * IBM. * ****************************************************************
Problem conclusion
This problem was fixed. Affected platforms for reported release: AIX, Linux, and Windows. Platforms fixed: AIX, Linux, and Windows.
Temporary fix
Comments
APAR Information
APAR number
IT45126
Reported component name
TSM SERVER
Reported component ID
5698ISMSV
Reported release
81A
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2023-12-12
Closed date
2024-03-05
Last modified date
2024-03-05
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
TSM SERVER
Fixed component ID
5698ISMSV
Applicable component levels
[{"Business Unit":{"code":"BU029","label":"Software"},"Product":{"code":"SSGSG7","label":"Tivoli Storage Manager"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"81A"}]
Document Information
Modified date:
02 April 2024