IBM Support

IT25075: MOVE CONTAINER USING DEFRAG=YES AT SERVER LEVEL 8.1.5 HANGS PROCESSING FOR NON-DEDUP CONTAINERS

APAR status

  • Closed as program error.

Error description

  • Under certain internal conditions involving extent reference
    counts, a MOVE CONTAINER command that specifies DEFRAG=YES for a
    non-deduplicated container on an IBM Spectrum Protect 8.1.5
    server may hang.
    Servermon script output collected while the MOVE CONTAINER
    process is hung shows that the DB2 application handles for the
    server parent and child threads appear to be deadlocked.
    QUERY PROCESS output shows process 2 making no progress for a
    non-dedup (.ncf) container:
    Process  Process Description   Process Status
     Number
    -------  --------------------  ----------------------------
          2  Move Container        Cancelling Move Container of
                                   tsmstg75/43/0000000000004346.ncf.
                                   Elapsed time: 2 Days, 0 Hours, 0
                                   Minutes
    The lowest-level child thread of the hung MOVE CONTAINER process
    (2) is waiting in DB2 processing:
            Thread 863, Parent 862: SdCntrStreamThread, Storage 1109749, AllocCnt 13614 HighWaterAmt 1713087
             tid=645f, ptid=635e, det=1, zomb=0, join=0, result=0, sess=0, procToken=2, sessToken=248
              Stack trace:
                0x09000000001f6690 semop
                0x090000000f5edee8 sqloSSemP
                0x090000000f79c9a8 sqlccipcrecv__FP17SQLCC_COMHANDLE_TP12SQLCC_COND_T
                0x090000000f5e7918 sqlccrecv
                0x090000000f65f8bc sqljcReceive__FP10sqljCmnMgr
                0x090000000fc159bc sqljrDrdaArExecute__FP14db2UCinterfaceP9UCstpInfo
                0x090000000f8fa7cc CLI_sqlExecute__FP17CLI_STATEMENTINFOP19CLI_ERRORHEADERINFO
                0x090000000f97e2f0 SQLExecute2__FP17CLI_STATEMENTINFOP19CLI_ERRORHEADERINFO
                0x090000000f978bc8 SQLExecute
                0x00000001001a4d14 IPRA.$RdbPrepareAndExecuteStmt
                0x00000001001a0a44 IPRA.$RdbCliUpdate
                0x00000001001a03a8 tbCliSRUpd
                0x00000001008fe0ec SdUpdateChunkLocationForDefrag
                0x00000001008ed194 IPRA.$ProcessDefragData
                0x00000001008ef100 IPRA.$ProcessStreamBuffer
                0x00000001008ea1ec SdCntrStreamThread
                0x000000010000ea80 StartThread
              Holding mutex txnP->mutex (0x111dde578), acquired at tbcli.c(1330)
             Thread context:
               COMMAND: MOVE CONTAINER
               COMMMETHOD: SSL
               PROCESS_NUMBER: 2
               PROCESS_DESC: Move Container
               THREAD_TYPE: PROCESS
               SESSION_TYPE: ADMIN
               ADMIN_NAME: TSMUTIL
    Transactions for above thread (note applHandle 2143)
    Tsn=0:54512, Resurrected=False, InFlight=True, Distributed=False, Persistent=True, Addr 120929e58
      Start ThreadId=863, Timestamp=04/17/18 14:24:11, Creator=sdstream.c(2336)
      Last known in use by ThreadId=863
      Participants=2, summaryVote=ReadOnly
      EndInFlight False, endThreadId 0, tmidx 0, processBatchCount 0, mustAbort False.
        Participant DB: voteReceived=False, ackReceived=False
          DB: Txn 114889938, ReadOnly(YES), connP=1148896d8, applHandle=2143, openTbls=3:
          DB: --> OpenP=12679af58 for table=SD.Containers.
          DB: --> OpenP=11297dc98 for table=SS.Classes.
          DB: --> OpenP=11297fa78 for table=SS.Pools.
        Participant SD: voteReceived=False, ackReceived=False
    Tsn=0:54513, Resurrected=False, InFlight=True, Distributed=False, Persistent=True, Addr 11ac3e7f8
      Start ThreadId=863, Timestamp=04/17/18 14:24:11, Creator=sdstream.c(1784)
      Last known in use by ThreadId=863
      Participants=1, summaryVote=ReadOnly
      EndInFlight False, endThreadId 0, tmidx 0, processBatchCount 0, mustAbort False.
        Participant DB: voteReceived=False, ackReceived=False
          DB: in-flight Txn 111dde3d8, skipped its detail.
    Transaction for thread 862, the parent of thread 863 (note
    applHandle 2145):
    Tsn=0:54300, Resurrected=False, InFlight=True, Distributed=False, Persistent=True, Addr 11ad305b8
      Start ThreadId=862, Timestamp=04/17/18 14:24:07, Creator=sddefrag.c(5458)
      Last known in use by ThreadId=862
      Participants=2, summaryVote=ReadOnly
      EndInFlight False, endThreadId 0, tmidx 0, processBatchCount 0, mustAbort False.
        Participant DB: voteReceived=False, ackReceived=False
          DB: Txn 1159e0818, ReadOnly(YES), connP=1159e05b8, applHandle=2145, openTbls=3:
          DB: --> OpenP=12195d838 for table=SD.Dedup.Audit.
          DB: --> OpenP=1159e5258 for table=SD.Containers.
          DB: --> OpenP=1159e09b8 for table=SD.Non.Dedup.Locations.
          DB: --> RegSqlId=0x0F0000E7 Unknown for table=Unknown, executed(Yes).
        Participant SD: voteReceived=False, ackReceived=False
           Thread 862, Parent 861: MoveContainerThread, Storage 43788966, AllocCnt 13396 HighWaterAmt 43788966
           tid=635e, ptid=625d, det=0, zomb=0, join=1, result=0, sess=0, procToken=2, sessToken=248
            Stack trace:
              0x0900000000588260 _cond_wait_global
              0x0900000000588df8 _cond_wait
              0x0900000000589ae0 pthread_cond_wait
              0x0000000100009cb4 pkWaitConditionTracked
              0x00000001002d1c54 WaitEmptyVarQueue
              0x00000001008ee17c SdSignalStreamBufferAndWait
              0x00000001008f794c IPRA.$SdMoveContainer
              0x00000001008f5b88 MoveContainerThread
              0x000000010000ea80 StartThread
            Awaiting cond newQueue->isEmpty (0x11c1be600), using mutex newQueue->mutex (0x114b4d0d8), at queue.c(1743)
           Thread context:
             COMMMETHOD: SSL
             COMMAND: MOVE CONTAINER
             THREAD_TYPE: PROCESS
             PROCESS_DESC: Move Container
             PROCESS_NUMBER: 2
             SESSION_TYPE: ADMIN
             ADMIN_NAME: TSMUTIL
    The last servermon db2.txt file shows the following DB2 UPDATE
    statement waiting on behalf of the lowest-level child thread
    (863), under application handle 2146:
    173426 EXEC  UPDATE "TSMDB1"."SD_NON_DEDUP_LOCATIONS" SET CNTRID=?, OFFSET=? WHERE (CHUNKID=?) --863       2146
    The last db2pd.txt file shows the following lock waits for the
    application handles in question:
    Locks being waited on :
     AppHandl [nod-index] TranHdl    Lockname                   Type       Mode Conv Sts CoorEDU    AppName  AuthID   AppID
     2145     [000-02145] 184        001A000400000001D656008652 RowLock    ..X       G   79331      dsmserv  DR164    *LOCAL.dr164.180417192418
     2146     [000-02146] 182        001A000400000001D656008652 RowLock    ..X       W   78817      dsmserv  DR164    *LOCAL.dr164.180417192419
    Tivoli Storage Manager Versions Affected: 8.1.5 on all platforms
    Customer/L2 Diagnostics (If Applicable)
    Initial Impact: Medium
    Additional Keywords: TSM defragment deduplication
    | MDVREGR 8.1.5.0 |
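
    The db2pd lock rows quoted in this error description show two
    application handles contending for the same row lock: one row has
    Sts G (granted) and the other has Sts W (waiting). The sketch
    below pairs each waiter with the holder of the same lock name. It
    is only an illustration: the function name find_blocked_handles
    and the SAMPLE constant are hypothetical, and the field positions
    assume the row layout quoted in this APAR, with an empty Conv
    column. It is not an IBM-supplied tool.

```python
from collections import defaultdict

# Sample rows copied from the db2pd.txt excerpt in this APAR, one row per line.
SAMPLE = """\
2145 [000-02145] 184 001A000400000001D656008652 RowLock ..X G 79331 dsmserv DR164 *LOCAL.dr164.180417192418
2146 [000-02146] 182 001A000400000001D656008652 RowLock ..X W 78817 dsmserv DR164 *LOCAL.dr164.180417192419
"""


def find_blocked_handles(text):
    """Return (waiter, holder, lockname) tuples for each contended lock."""
    by_lock = defaultdict(lambda: {"G": [], "W": []})
    for line in text.splitlines():
        fields = line.split()
        # Assumed layout: AppHandl, [nod-index], TranHdl, Lockname, Type,
        # Mode, Sts, ... (the Conv column is empty in the quoted rows).
        if len(fields) < 7 or not fields[0].isdigit():
            continue  # skip headers and anything that is not a lock row
        app_handle, lockname, status = int(fields[0]), fields[3], fields[6]
        if status in ("G", "W"):
            by_lock[lockname][status].append(app_handle)

    pairs = []
    for lockname, owners in by_lock.items():
        for waiter in owners["W"]:
            for holder in owners["G"]:
                pairs.append((waiter, holder, lockname))
    return pairs


if __name__ == "__main__":
    for waiter, holder, lockname in find_blocked_handles(SAMPLE):
        print(f"applHandle {waiter} waits on {lockname} held by applHandle {holder}")
```

    For the rows in this APAR, the sketch reports that applHandle
    2146 (the UPDATE issued for thread 863) waits on a row lock held
    by applHandle 2145 (the transaction of parent thread 862).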
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * All IBM Spectrum Protect server users.                       *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See error description.                                       *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply fixing level when available. This problem is currently *
    * projected to be fixed in level 8.1.6. Note that this is      *
    * subject to change at the discretion of IBM.                  *
    ****************************************************************
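
    As a rough illustration of the recommendation above, the sketch
    below checks whether a server level string falls in the affected
    8.1.5 range. The helper names parse_level and is_affected are
    hypothetical, and the rule encodes only what this APAR states
    (8.1.5 affected, fix projected for 8.1.6, subject to change);
    confirm the actual fixing level with IBM before relying on it.

```python
def parse_level(level):
    """Split a level string such as '8.1.5.0' into a tuple of integers."""
    return tuple(int(part) for part in level.split("."))


def is_affected(level):
    """Return True when the level is in the 8.1.5 range named by this APAR."""
    # Assumption: only version.release.modification matters; the fix pack
    # digit (fourth field) is ignored.
    return parse_level(level)[:3] == (8, 1, 5)


if __name__ == "__main__":
    for level in ("8.1.4.100", "8.1.5.0", "8.1.6.0"):
        state = "affected" if is_affected(level) else "not affected"
        print(f"{level}: {state}")
```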
    

Problem conclusion

  • This problem was fixed.
    Affected platforms for reported release:  AIX, Linux, and
    Windows.
    Platforms fixed:  AIX, Linux, and Windows.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT25075

  • Reported component name

    TSM SERVER

  • Reported component ID

    5698ISMSV

  • Reported release

    81A

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2018-05-16

  • Closed date

    2018-06-19

  • Last modified date

    2018-06-19

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    TSM SERVER

  • Fixed component ID

    5698ISMSV

Applicable component levels

  • Tivoli Storage Manager (product code SSGSG7), version 81A,
    Platform Independent. Business unit: Software.

Document Information

Modified date:
11 September 2024