IBM Support

IT34552: SERVER 8.1.10.XXX RUNNING NDMP BACKUPS IN PARALLEL ARE CAUSING RESOURCE WAITER ABORTS ANR0538I AND JUST ONE BACKUP IS FINISHING

 

APAR status

  • Closed as program error.

Error description

  • The IBM Spectrum Protect server can experience resource timeouts
    or hangs during NAS session establishment when multiple backups
    are initiated for a single node.
    This affects the 'backup node' command run on the server,
    not 'backup nas' run from the client.
    
    The activity log shows the message
    "ANR0538I A resource waiter has been aborted".
    This is similar to APAR IT33380 but in a different code area.
    
    You are exposed to this APAR if the SHOW RESQ output shows
    many waiters waiting for the same mutex with type 17001
    and the "SHOW LOCKS ONLYW=Y" output shows a Waiter with
    "Mode=xixLock".
    
    For example, "show resq" output:
    
    ==========================================================
    This is a local waiter.
    status=resWaiting waitTime=0 minutes
    waiter Type=unknown (0) timeout value=60 minutes
    txnSeqNo=0:12080242 resourceName=(NASNODE1) lengthLen=8
    type=17001 and nameSpace=0
    waiter thead id is 289290 condition=1009078432
     mutex=34877904 abortFunc=12b9140
    ==========================================================
    This is a local waiter.
    status=resWaiting waitTime=0 minutes
    waiter Type=unknown (0) timeout value=60 minutes
    txnSeqNo=0:12080241 resourceName=(NASNODE1) lengthLen=8
    type=17001 and nameSpace=0
    waiter thead id is 289275 condition=-1674817552
     mutex=34877904 abortFunc=12b9140
    ==========================================================
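The first half of the exposure check, many waiters on one mutex with type 17001, can be sketched as a small parser over captured SHOW RESQ text. This is a minimal sketch that assumes the output layout matches the excerpt above. The function names and the `threshold` value are hypothetical and not part of any IBM tool.

```python
import re
from collections import Counter

# Field patterns taken from the SHOW RESQ excerpt; "waiter Type=" (capital T)
# is deliberately not matched by the lowercase "type=" pattern.
TYPE_RE = re.compile(r"\btype=(\d+)")
MUTEX_RE = re.compile(r"\bmutex=(\w+)")
SEPARATOR_RE = re.compile(r"^[ \t]*=+[ \t]*$", re.MULTILINE)


def count_waiters(resq_text):
    """Count waiters per (type, mutex) pair in SHOW RESQ output."""
    counts = Counter()
    # Waiter blocks are delimited by lines made only of '=' characters.
    for block in SEPARATOR_RE.split(resq_text):
        type_match = TYPE_RE.search(block)
        mutex_match = MUTEX_RE.search(block)
        if type_match and mutex_match:
            counts[(type_match.group(1), mutex_match.group(1))] += 1
    return counts


def suspicious_mutexes(resq_text, lock_type="17001", threshold=2):
    """Return mutexes with at least `threshold` waiters of `lock_type`.

    The threshold is an assumption; the APAR only says "many waiters".
    """
    return [mutex for (wtype, mutex), n in count_waiters(resq_text).items()
            if wtype == lock_type and n >= threshold]


SAMPLE = """\
==========================================================
This is a local waiter.
status=resWaiting waitTime=0 minutes
waiter Type=unknown (0) timeout value=60 minutes
txnSeqNo=0:12080242 resourceName=(NASNODE1) lengthLen=8
type=17001 and nameSpace=0
waiter thead id is 289290 condition=1009078432
 mutex=34877904 abortFunc=12b9140
==========================================================
This is a local waiter.
status=resWaiting waitTime=0 minutes
waiter Type=unknown (0) timeout value=60 minutes
txnSeqNo=0:12080241 resourceName=(NASNODE1) lengthLen=8
type=17001 and nameSpace=0
waiter thead id is 289275 condition=-1674817552
 mutex=34877904 abortFunc=12b9140
==========================================================
"""

if __name__ == "__main__":
    print(suspicious_mutexes(SAMPLE))
```

A hit from this check is only half of the signature; the SHOW LOCKS output still has to show the xixLock waiter.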
    
    The lock on NASNODE1 is held in sLock mode by threads 289263,
    289252, 284870 and 248677, as shown in the "SHOW LOCKS ONLYW=Y"
    output, and one Waiter shows Mode=xixLock.
    
    slot -> 17301:
    LockDesc: Type=17001(admin node name), NameSpace=0,
    SummMode=sLock, Key='NASNODE1'
      Holder: (admutil.c:12060 Thread 289263) Tsn=0:12041708,
    Mode=sLock
      Holder: (admutil.c:12060 Thread 289252) Tsn=0:12041639,
    Mode=sLock
      Holder: (admutil.c:12060 Thread 284870) Tsn=0:11860610,
    Mode=sLock
      Holder: (admutil.c:12060 Thread 248677) Tsn=0:10361695,
    Mode=sLock
      Waiter: (admutil.c:12060 Thread 289281) Tsn=0:12042129,
    Mode=xixLock   <==
      Waiter: (admutil.c:12060 Thread 289275) Tsn=0:12047317,
    Mode=sLock
      Waiter: (admutil.c:12060 Thread 289290) Tsn=0:12047400,
    Mode=sLock
      Waiter: (admutil.c:12060 Thread 289416) Tsn=0:12047700,
    Mode=sLock
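The second half of the signature, an xixLock waiter queued behind sLock holders on a type 17001 lock, can be checked the same way. The sketch below assumes the "SHOW LOCKS ONLYW=Y" layout shown above, including entries wrapped across two lines. The function name and the returned dictionary shape are hypothetical, and the sample is a shortened excerpt.

```python
import re

# One Holder/Waiter entry; \s* lets the "Mode=" part sit on the next line.
ENTRY_RE = re.compile(
    r"(Holder|Waiter): \(\S+ Thread (\d+)\) Tsn=([\d:]+),\s*Mode=(\w+)")
# Lock descriptor: numeric type and the quoted key (the node name).
DESC_RE = re.compile(r"LockDesc: Type=(\d+)\(.*?Key='([^']*)'", re.DOTALL)


def find_blocked_exclusive(locks_text, lock_type="17001"):
    """Report locks where an xixLock waiter is behind sLock holders."""
    findings = []
    for slot in re.split(r"slot -> \d+:", locks_text)[1:]:
        desc = DESC_RE.search(slot)
        if desc is None or desc.group(1) != lock_type:
            continue
        entries = ENTRY_RE.findall(slot)
        holders = [int(thread) for role, thread, _tsn, mode in entries
                   if role == "Holder" and mode == "sLock"]
        blocked = [int(thread) for role, thread, _tsn, mode in entries
                   if role == "Waiter" and mode == "xixLock"]
        if holders and blocked:
            findings.append({"key": desc.group(2),
                             "holders": holders,
                             "xix_waiters": blocked})
    return findings


SAMPLE = """\
slot -> 17301:
LockDesc: Type=17001(admin node name), NameSpace=0,
SummMode=sLock, Key='NASNODE1'
  Holder: (admutil.c:12060 Thread 289263) Tsn=0:12041708,
Mode=sLock
  Holder: (admutil.c:12060 Thread 289252) Tsn=0:12041639,
Mode=sLock
  Waiter: (admutil.c:12060 Thread 289281) Tsn=0:12042129,
Mode=xixLock   <==
  Waiter: (admutil.c:12060 Thread 289275) Tsn=0:12047317,
Mode=sLock
"""

if __name__ == "__main__":
    for finding in find_blocked_exclusive(SAMPLE):
        print(finding)
```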
    
    To confirm that the lock holders belong to a running
    'backup node' process, issue:
    "SHOW TXNT LOCKD=N"
    and look for the transaction (Tsn) of a holder:
    
    slot -> 236:
    Tsn=0:12041708, Resurrected=False, InFlight=True,
    Distributed=False, Persistent=True, Addr 0x7ffd8810e170
      Start ThreadId=289263, Timestamp=10/08/2020 01:03:43 AM,
    Creator=bfremote.c(430)
      Last known in use by ThreadId=289263
      Participants=3, summaryVote=ReadOnly
      EndInFlight False, endThreadId 0, tmidx 0, processBatchCount
    0, mustAbort False.
        Participant DB: voteReceived=False, ackReceived=False
          DB: Txn 0x7ffd882b7360, ReadOnly(YES),
    connP=0x7ffd88007160, applHandle=48751, openTbls=4:
          DB: --> OpenP=0x7ffcf8280470 for
    table=Adm.Security.Settings.
          DB: --> OpenP=0x7ffe3c39dff0 for table=Filespaces.
          DB: --> OpenP=0x7ffcf82b1490 for table=Nodes.
          DB: --> OpenP=0x7ffcb40dffb0 for table=SS.Pools.
        Participant BF: voteReceived=False, ackReceived=False
        Participant SS: voteReceived=False, ackReceived=False
      Locks held by Tsn=0:12041708 :
        Type=17001(admin node name), Number held: 1
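To tie a lock holder's Tsn back to the thread that started the transaction, the "SHOW TXNT LOCKD=N" output can be reduced to a lookup table. This is a minimal sketch assuming the slot layout shown above. `parse_transactions` is a hypothetical helper name, and only the fields used here are extracted.

```python
import re


def parse_transactions(txnt_text):
    """Map each Tsn to its starting thread id and creator source location."""
    txns = {}
    for slot in re.split(r"slot -> \d+:", txnt_text)[1:]:
        # The first Tsn= in a slot is the transaction's own sequence number.
        tsn = re.search(r"Tsn=([\d:]+)", slot)
        thread = re.search(r"Start ThreadId=(\d+)", slot)
        creator = re.search(r"Creator=(\S+)", slot)
        if tsn and thread:
            txns[tsn.group(1)] = {
                "thread": int(thread.group(1)),
                "creator": creator.group(1) if creator else None,
            }
    return txns


SAMPLE = """\
slot -> 236:
Tsn=0:12041708, Resurrected=False, InFlight=True,
Distributed=False, Persistent=True, Addr 0x7ffd8810e170
  Start ThreadId=289263, Timestamp=10/08/2020 01:03:43 AM,
Creator=bfremote.c(430)
  Last known in use by ThreadId=289263
  Participants=3, summaryVote=ReadOnly
  Locks held by Tsn=0:12041708 :
    Type=17001(admin node name), Number held: 1
"""

if __name__ == "__main__":
    print(parse_transactions(SAMPLE))
```

The thread id found this way (289263 in the example) is the one to look up in the "show threads" output.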
    
    "show threads" confirms that parent thread 289263 belongs to
    a "backup node" process:
    
    Thread 289266, Parent 289263: AfStoreNativeThread, Storage
    10384, AllocCnt 5540784 HighWaterAmt 287936
     tid=140733110085376, ptid=140725609031424, det=1, zomb=0,
    join=0, result=0, sess=0, procToken=1273, sessToken=581740
     lwp=31298
     Stack traces are disabled by option.
     Thread context:
       COMMMETHOD: SSL
       COMMAND: BACKUP NODE
       THREAD_TYPE: PROCESS
       PROCESS_DESC: BACKUP NAS (FULL)
       PROCESS_NUMBER: 1273
       SESSION_TYPE: ADMIN
       ADMIN_NAME: NASBACKUP
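The final confirmation step, finding threads whose parent is a lock holder and whose context shows COMMAND: BACKUP NODE, can be sketched as follows. The parser assumes the "show threads" layout in the excerpt above. `parse_threads` and `children_running` are hypothetical names, and the sample is abbreviated.

```python
import re

HEADER_RE = re.compile(r"^[ \t]*Thread (\d+), Parent (\d+): (\w+)",
                       re.MULTILINE)
# Context lines look like "   COMMAND: BACKUP NODE" (upper-case key).
CONTEXT_RE = re.compile(r"^[ \t]*([A-Z_]+): (.+?)[ \t]*$", re.MULTILINE)


def parse_threads(threads_text):
    """Return one dict per thread entry with its parent and context."""
    headers = list(HEADER_RE.finditer(threads_text))
    threads = []
    for i, header in enumerate(headers):
        # An entry's body runs until the next thread header (or end of text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(threads_text)
        body = threads_text[header.end():end]
        threads.append({
            "thread": int(header.group(1)),
            "parent": int(header.group(2)),
            "name": header.group(3),
            "context": dict(CONTEXT_RE.findall(body)),
        })
    return threads


def children_running(threads_text, parent_id, command="BACKUP NODE"):
    """List child threads of `parent_id` whose context names `command`."""
    return [t["thread"] for t in parse_threads(threads_text)
            if t["parent"] == parent_id
            and t["context"].get("COMMAND") == command]


SAMPLE = """\
Thread 289266, Parent 289263: AfStoreNativeThread, Storage
10384, AllocCnt 5540784 HighWaterAmt 287936
 Stack traces are disabled by option.
 Thread context:
   COMMMETHOD: SSL
   COMMAND: BACKUP NODE
   THREAD_TYPE: PROCESS
   PROCESS_DESC: BACKUP NAS (FULL)
   PROCESS_NUMBER: 1273
"""

if __name__ == "__main__":
    print(children_running(SAMPLE, 289263))
```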
    
    
    
    Spectrum Protect Versions Affected:
    
    IBM Spectrum Protect Server 8.1.10.000 and higher on all
    supported platforms
    
    | MDVREGR 8.1.10-TIV_5698MSV |
    
    Additional Keywords: TS004261099  TSM backup node ndmp nas lock
    resourcetimeout ANR0538I IT33380
    

Local fix

  • Run only one "backup node" operation at a time for a given node.
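One way to apply this workaround from a script is to submit the backups strictly one after another. The sketch below is an illustration only. It assumes the administrative client `dsmadmc` is on the PATH, that `WAIT=YES` makes BACKUP NODE return only after the process ends, and that MODE=FULL is the mode wanted. The credentials and file system names are hypothetical placeholders. Verify the command syntax against the command reference for your server level.

```python
import subprocess


def build_command(node, filespace):
    """Build a BACKUP NODE command string (WAIT=YES behaviour is assumed)."""
    return f"backup node {node} {filespace} mode=full wait=yes"


def run_with_dsmadmc(command):
    """Run one administrative command; id and password are placeholders."""
    completed = subprocess.run(
        ["dsmadmc", "-id=admin", "-password=secret", "-dataonly=yes", command],
        check=False,
    )
    return completed.returncode


def run_serialized(jobs, run=None):
    """Run each (node, filespace) job to completion before the next starts.

    This is stricter than the workaround requires: it serializes all jobs,
    not only those that share a node.
    """
    run = run or run_with_dsmadmc
    results = []
    for node, filespace in jobs:
        results.append((node, filespace, run(build_command(node, filespace))))
    return results


if __name__ == "__main__":
    # Hypothetical file systems on the node from the example output.
    jobs = [("NASNODE1", "/vol/vol1"), ("NASNODE1", "/vol/vol2")]
    # Dry run: print the commands instead of calling dsmadmc.
    run_serialized(jobs, run=lambda command: print(command) or 0)
```

Passing the runner as a parameter keeps the sequencing logic testable without a server.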
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * All IBM Spectrum Protect server users.                       *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See error description.                                       *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply fixing level when available. This problem is currently *
    * projected to be fixed in levels 8.1.10.200 and 8.1.11.100.   *
    * Note that this is subject to change at the discretion of     *
    * IBM.                                                         *
    ****************************************************************
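The affected and projected fixing levels above can be turned into a quick check of a server level string. This is a sketch under stated assumptions: the fix levels are only projections in this APAR, and treating releases after 8.1.11 as fixed is an assumption, not something the APAR states. `is_exposed` is a hypothetical helper.

```python
def parse_level(level):
    """Turn a level such as '8.1.10.000' into a tuple of integers."""
    return tuple(int(part) for part in level.split("."))


def is_exposed(level):
    """Return True if `level` falls in the range this APAR describes."""
    v = parse_level(level)
    if v < (8, 1, 10, 0):
        return False  # before the first affected level
    if v[:3] == (8, 1, 10):
        return v < (8, 1, 10, 200)  # projected fix level for 8.1.10
    if v[:3] == (8, 1, 11):
        return v < (8, 1, 11, 100)  # projected fix level for 8.1.11
    return False  # assumption: later releases include the fix


if __name__ == "__main__":
    for level in ("8.1.9.300", "8.1.10.100", "8.1.10.200", "8.1.11.000"):
        print(level, is_exposed(level))
```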
    

Problem conclusion

  • This problem was fixed.
    Affected platforms for reported release: AIX, Linux, and
    Windows.
    Platforms fixed: AIX, Linux, and Windows.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT34552

  • Reported component name

    TSM SERVER

  • Reported component ID

    5698ISMSV

  • Reported release

    81L

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2020-10-15

  • Closed date

    2020-11-16

  • Last modified date

    2020-11-16

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    TSM SERVER

  • Fixed component ID

    5698ISMSV

Applicable component levels

  • R81A PSY

       UP

  • R81L PSY

       UP

  • R81W PSY

       UP


Document Information

Modified date:
18 November 2021