APAR status
Closed as program error.
Error description
This problem is a timing problem that can happen when multiple node replication threads are processing objects belonging to the same peer group on the source server. The hang happens on the target server. This is more likely to happen when performing node replication for Windows 2008 clients with backups of systemstate objects. There are other possibilities for peer groups but the Windows 2008 clients with systemstate backup objects are the largest of these groups at this time.

Customer/L2 Diagnostics (if applicable)

The thread that is hanging in DB2 will have the ImReplFindOrAddGroup function:

Thread 84693, Parent 54: psSessionThread, Storage 11416396, AllocCnt 601411 HighWaterAmt 11617472
tid=102d5, ptid=2136, det=1, zomb=0, join=0, result=0, sess=27441
Holding mutex txnP->mutex (0x117794118), acquired at tbcli.c(1248)
Stack trace:
0x0900000000262510 semop
0x0900000000e92640 sqloSSemP
0x0900000000e92084 .sqlccipcrecv.fdpr.clone.756__FP17SQLCC_COMHANDLE_TP12SQLCC_COND_T
0x0900000000e92fe4 .sqlccrecv.fdpr.clone.125
0x0900000000e92ce8 sqljcReceive__FP10sqljCmnMgr
0x0900000000e9d8e0 sqljrDrdaArExecute__FP14db2UCinterfaceP9UCstpInfo
0x090000000127c008 CLI_sqlExecute__FP17CLI_STATEMENTINFOP19CLI_ERRORHEADERINFO
0x0900000001285450 SQLExecute2__FP17CLI_STATEMENTINFOP19CLI_ERRORHEADERINFO
0x09000000010be1b0 SQLExecute
0x0000000100133564 RdbPrepareAndExecuteStmt
0x000000010012f250 RdbCliUpdate
0x000000010012ee64 tbCliSRUpd
0x00000001005d65c8 ImReplFindOrAddGroup <--- look for this
0x00000001005d5a50 imProcReplBkObjInfo
0x000000010059a95c SmDoBackInsNormEnhanced
0x000000010060b40c SmReplServerSession
0x0000000100192f74 DoReplServer
0x000000010018a8b8 smExecuteSession
0x0000000100057008 psSessionThread
0x0000000100020760 StartThread

The SHOW TXNT outputs for this thread:

Tsn=0:31570620, Resurrected=False, InFlight=True, Distributed=False, Persistent=True, Addr 11dbebdf8
Start ThreadId=84693, Timestamp=02/18/12 09:35:32, Creator=smrepl.c(7248)
Last known in use by ThreadId=84693
Participants=3, summaryVote=ReadOnly
EndInFlight False, endThreadId 0, tmidx 0 0, processBatchCount 0, mustAbort False.
Participant DB: voteReceived=False, ackReceived=False
DB: in-flight Txn 117797af8, skipped its detail.
Participant IM: voteReceived=False, ackReceived=False
Participant BF: voteReceived=False, ackReceived=False

This transaction for the thread is holding the lock 19006.

Tsn=0:31585228, Resurrected=False, InFlight=True, Distributed=False, Persistent=True, Addr 116718118
Start ThreadId=84693, Timestamp=02/18/12 09:35:51, Creator=imrepl.c(6028)
Last known in use by ThreadId=84693
Participants=1, summaryVote=ReadOnly
EndInFlight False, endThreadId 0, tmidx 0 0, processBatchCount 0, mustAbort False.
Participant DB: voteReceived=False, ackReceived=False
DB: Txn 117792758, ReadOnly(YES), connP=1177a1458, applHandle=7537, openTbls=5:
DB: --> OpenP=1196d54b8 for table=InFlight.ReplGroups.
DB: --> OpenP=129aa5818 for table=Extended.Attributes.
DB: --> OpenP=1221bbd98 for table=Nodes.
DB: --> OpenP=11ce99658 for table=Policy.Domain.Members.
DB: --> OpenP=116d04858 for table=Server.Connect.
Locks held by Tsn=0:31585228 :
Type=19006(im repl group), NameSpace=0, SummMode=xLock, Mode=xLock, Key='162862599.2'

The application info on the DB2 side where this is hanging. This is waiting on an uncommitted read which means this row was updated in another transaction.:

Application :
Address : 0x0780000005A40080
AppHandl [nod-index] : 7544 [000-07544]
TranHdl : 216
Application PID : 4784304
Application Node Name : tsm02fm
IP Address: n/a
Connection Start Time : (1329571538)Sat Feb 18 08:25:38 2012
Client User ID : tsminst1
System Auth ID : TSMINST1
Coordinator EDU ID : 68210
Coordinator Partition : 0
Number of Agents : 1
Locks timeout value : NotSet
Locks Escalation : No
Workload ID : 1
Workload Occurrence ID : 126868
Trusted Context : n/a
Connection Trust Type : non trusted
Role Inherited : n/a
Application Status : Lock-wait
Application Name : dsmserv
Application ID : *LOCAL.tsminst1.120218133227
ClientUserID : n/a
ClientWrkstnName : n/a
ClientApplName : n/a
ClientAccntng : n/a
CollectActData: N
CollectActPartition: C
SectionActuals: N

List of active statements :
*UOW-ID : 21
Activity ID : 17
Package Schema : NULLID
Package Name : SYSSN100
Package Version :
Section Number : 17
SQL Type : Dynamic
Isolation : UR
Statement Type : DML, Insert/Update/Delete
Statement : UPDATE "TSMDB1"."INFLIGHT_REPLGROUPS" SET FLAGS=? WHERE (SRC_GROUPID=? AND GROUPTYPE=?)
--84693

The DB2 lock information for this lock:

LRB                State Status Mode Dur CMode CDur Flags  TranHandle HoldCount lsoFeedback  CursorBitmap AppHandle rrIID
------------------ ----- ------ ---- --- ----- ---- ------ ---------- --------- ------------ ------------ --------- -----
a000602617ed300    L     G      ..X  1   NON   0    0000   81         0         784489098843 40000000     0-7524    0000
a0006004b2e7100    L     W      ..X  1   NON   0    0000   216        0         0            40000000     0-7544    0000

The thread in the SHOW THREADS output for the application handle that holds this lock:

Tsn=0:31579200, Resurrected=False, InFlight=True, Distributed=False, Persistent=True, Addr 11c8cff78
Start ThreadId=84681, Timestamp=02/18/12 09:35:44, Creator=smrepl.c(7248)
Last known in use by ThreadId=84681
Participants=4, summaryVote=ReadOnly
EndInFlight False, endThreadId 0, tmidx 0 0, processBatchCount 0, mustAbort False.
Participant DB: voteReceived=False, ackReceived=False
DB: Txn 116860fb8, ReadOnly(NO), connP=117758c98, applHandle=7524, openTbls=13:
DB: --> OpenP=11b099298 for table=Restore.Sessions.
DB: --> OpenP=11f85c698 for table=Group.Leaders.
DB: --> OpenP=11c621ed8 for table=InFlight.ReplGroups.
DB: --> OpenP=116d30878 for table=Replicated.Objects.Reverse.
DB: --> OpenP=1205fccb8 for table=Filespaces.
DB: --> OpenP=1205fbb18 for table=Backup.Objects.
DB: --> OpenP=123768df8 for table=Archive.Objects.
DB: --> OpenP=123768a58 for table=Replicated.Objects.
DB: --> OpenP=12a16ee58 for table=Extended.Attributes.
DB: --> OpenP=11ffbbaf8 for table=Nodes.
DB: --> OpenP=117d5a498 for table=Replicating.Servers.
DB: --> OpenP=117c59938 for table=Server.Connect.
DB: --> OpenP=116deef18 for table=Server.Connect.Info.
Participant IM: voteReceived=False, ackReceived=False
Participant BF: voteReceived=False, ackReceived=False
Participant SS: voteReceived=False, ackReceived=False
Locks held by Tsn=0:31579200 :
Type=46001(bf aggregate (superbitfile) id), NameSpace=0, SummMode=xLock, Mode=xLock, Key='209578738'

This is the call stack for thread 84681:

Thread 84681, Parent 54: psSessionThread, Storage 7774573, AllocCnt 772586 HighWaterAmt 11703935
tid=f7c9, ptid=2136, det=1, zomb=0, join=0, result=0, sess=27429
Awaiting cond waitP->waiting (0x110bca820), using mutex TMV->mutex (0x110c6cd38), at tmlock.c(749)
Stack trace:
0x09000000004bba60 _cond_wait_global
0x09000000004bc5f8 _cond_wait
0x09000000004bd2e0 pthread_cond_wait
0x0000000100007644 pkWaitConditionTracked
0x00000001000bca9c tmLockTracked
0x00000001005d5f1c ImReplFindOrAddGroup
0x00000001005d5a50 imProcReplBkObjInfo
0x000000010059a95c SmDoBackInsNormEnhanced
0x000000010060b40c SmReplServerSession
0x0000000100192f74 DoReplServer
0x000000010018a8b8 smExecuteSession
0x0000000100057008 psSessionThread
0x0000000100020760 StartThread

Platforms affected: TSM 6.3 Unix Linux Windows
Initial Impact: Medium
Additional Keywords: hung ZZ63 replicate nodegroup
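
The diagnostics in the error description point to a circular wait between two replication session threads. As a rough illustration only (this is not TSM or DB2 code), the situation can be modeled as a small wait-for graph and checked for a cycle. The function name find_cycle and the resource labels are hypothetical. One edge is an assumption: the output shows thread 84681 blocked in tmLockTracked under ImReplFindOrAddGroup, and the sketch assumes the lock it wants is the 19006 (im repl group) lock held by thread 84693, which the output does not state explicitly.

```python
# Illustrative model only; names and labels are hypothetical, not TSM/DB2 APIs.

def find_cycle(waits_for):
    """Return the list of waiters forming a cycle, or None if there is none.

    waits_for maps each blocked party to the single party it is blocked on.
    """
    for start in waits_for:
        path = [start]
        seen = {start}
        node = waits_for.get(start)
        while node is not None:
            if node == start:
                return path          # walked back to the start: circular wait
            if node in seen:
                break                # a cycle that does not include start
            seen.add(node)
            path.append(node)
            node = waits_for.get(node)
    return None

# Who holds which resource, as read from the diagnostic output.
holders = {
    "im repl group lock 19006": "thread 84693",
    "DB2 row lock on INFLIGHT_REPLGROUPS": "thread 84681",  # app handle 7524, status G
}

# Which resource each thread is blocked on.
waiting = {
    "thread 84693": "DB2 row lock on INFLIGHT_REPLGROUPS",  # app handle 7544, status W
    "thread 84681": "im repl group lock 19006",             # ASSUMPTION, see lead-in
}

waits_for = {thread: holders[resource] for thread, resource in waiting.items()}
cycle = find_cycle(waits_for)
print("circular wait:", cycle)
```

Because one lock is managed by the server's own lock manager and the other by DB2, neither side can see the full cycle on its own, which would be consistent with the hang persisting instead of being resolved by a deadlock detector.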
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: All Tivoli Storage Manager server users.    *
****************************************************************
* PROBLEM DESCRIPTION: See error description.                  *
****************************************************************
* RECOMMENDATION: Apply fixing level when available. This      *
*                 problem is currently projected to be fixed   *
*                 in level 6.3.2. Note that this is            *
*                 subject to change at the discretion of IBM.  *
****************************************************************
Problem conclusion
This problem was fixed. Affected platforms: AIX, HP-UX, Solaris, Linux, and Windows.
Temporary fix
Comments
APAR Information
APAR number
IC81596
Reported component name
TSM SERVER
Reported component ID
5698ISMSV
Reported release
63A
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2012-02-22
Closed date
2012-04-11
Last modified date
2012-04-11
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
TSM SERVER
Fixed component ID
5698ISMSV
Applicable component levels
R63A PSY
UP
R63H PSY
UP
R63L PSY
UP
R63S PSY
UP
R63W PSY
UP
Document Information
Modified date:
11 April 2012