IBM Support

IT37138: POOR PERFORMANCE WRITING TO FILE CLASS POOLS RESIDING ON NFS FILESYSTEMS WITH MANY HUNDREDS OF SESSIONS

Subscribe

You can track all active APARs for this component.

 

APAR status

  • Closed as program error.

Error description

  • IBM Spectrum Protect server can present a performance
    degradation due to a mutex lock contention when more than
    hundreds of sessions write data to file class storage pools
    residing on NFS file systems.
    
    Diagnostics:
    High "Unknown" time in servermon instrumentation
    psSessionThreads similar to:
    
    Thread 12345678 psSessionThread parent=123
    04:37:35.562-->05:01:22.187
    Operation       Count  Tottime  Avgtime  Mintime  Maxtime
    InstTput  Total KB
    ---------------------------------------------------------------
    -------------
    Network Recv   993626  126.268    0.000    0.000    1.108
    41900.7   5290731
    Network Send      125    0.001    0.000    0.000    0.000
    3813.9         5
    SSL Receive       770    0.008    0.000    0.000    0.000
    1893.5        16
    SSL Send          250    0.007    0.000    0.000    0.000
    134.9         1
    DB2 Fetch Prep    252    0.055    0.000    0.000    0.004
    DB2 MFtch Prep     45    0.016    0.000    0.000    0.002
    DB2 Updat Prep    168    0.024    0.000    0.000    0.001
    DB2 Fetch Exec    505    0.093    0.000    0.000    0.009
    DB2 MFtch Exec     53    0.007    0.000    0.000    0.000
    DB2 Inser Exec    267    0.063    0.000    0.000    0.002
    DB2 Updat Exec    336    0.040    0.000    0.000    0.001
    DB2 Fetch         505    0.003    0.000    0.000    0.000
    DB2 MFetch     113414    9.233    0.000    0.000    0.008
    DB2 Commit         88    0.291    0.003    0.000    0.198
    DB2 Reg Prep       45    0.012    0.000    0.000    0.002
    DB2 Reg Exec       60    0.006    0.000    0.000    0.000
    DB2 Reg Fetch      50    0.000    0.000    0.000    0.000
    Tm Lock Wait      122  234.558    1.923    0.000   24.458
    Acquire Latch      68    3.273    0.048    0.000    0.774
    Acquire XLatch     42    0.000    0.000    0.000    0.000
    Thread Wait     20920   52.065    0.002    0.000    4.726
    Unknown               1000.592
    ---------------------------------------------------------------
    -------------
    Total                 1426.625
    3708.6   5290754
    
    If high unknown time is seen above and the stacks from the
    servermon "show" files show many stack entries in
    "pkAcquireMutexTracked" which are called by a function starting
    with "pvr" (the function just below the "pkAcqureMutexTracked")
    then this APAR may apply, examples:
    
    Thread 1291 (Thread 0x7f8d3af3c700 (LWP 6734)):
    #0  0x00007f94e24b254d in __lll_lock_wait () from
    /lib64/libpthread.so.0
    #1  0x00007f94e24ade9b in _L_lock_883 () from
    /lib64/libpthread.so.0
    #2  0x00007f94e24add68 in pthread_mutex_lock () from
    /lib64/libpthread.so.0
    #3  0x0000000001333455 in pkAcquireMutexTracked
    (mutexP=mutexP@entry=0x2fb6cf0, file=file@entry=0x18393f5
    "pvr.c", line=line@entry=10811) at pkmon.c:593
    #4  0x0000000000bff56f in pvrGetFSCap
    (devClass=devClass@entry=1, fsCapP=fsCapP@entry=0x7f8d3af31420)
    at pvr.c:10811
    #5  0x0000000001133a83 in AsGetFilePoolSpace
    (poolId=poolId@entry=-1, amount=amount@entry=134234456,
    capacityP=capacityP@entry=0x7f8d3af32518) at asutil.c:1113
    #6  0x0000000001184c2a in ssSWAlloc (sessHandle=0x7f903d0253c0,
    txnId=0x7f903cc02600, poolId=poolId@entry=-1,
    allocMode=allocMode@entry=ssNewAlloc, amount=134234456,
    amount@entry=134218008, inclVolList=inclVolList@entry=0x0,
    inclVolCount=0) at ssalloc.c:1457
    #7  0x0000000000855c9f in AfAllocSpace
    (txnP=txnP@entry=0x7f903c77da10, destPoolId=destPoolId@entry=-1,
    srvId=srvId@entry=0, ck1=ck1@entry=47, ck2=ck2@entry=4011,
    allocMode=allocMode@entry=ssNewAlloc, amount=134218008,
    multiDest=True) at afcreate.c:2344
    #8  0x0000000000859bc5 in AfSwAlloc
    (txnP=txnP@entry=0x7f903c77da10, poolId=-1, srvId=srvId@entry=0,
    bfId=bfId@entry=1878267322, ck1=ck1@entry=47,
    ck2=ck2@entry=4011, estSize=134218008, afTxnH=0x7f94c0e4d8a8,
    aggregated=False, copyFragInfo=0x0, pendingIdP=0x0) at
    afcreate.c:3133
    
    Thread 1195 (Thread 0x7f8de514f700 (LWP 36039)):
    #0  0x00007f94e24b254d in __lll_lock_wait () from
    /lib64/libpthread.so.0
    #1  0x00007f94e24ade9b in _L_lock_883 () from
    /lib64/libpthread.so.0
    #2  0x00007f94e24add68 in pthread_mutex_lock () from
    /lib64/libpthread.so.0
    #3  0x0000000001333455 in pkAcquireMutexTracked
    (mutexP=mutexP@entry=0x2fb6cf0, file=file@entry=0x16542b0
    "pvrclass.c", line=line@entry=6067) at pkmon.c:593
    #4  0x0000000000c0b470 in pvrIsFileDevClass
    (className=className@entry=0x7f8de5144450 "CBRPFS02") at
    pvrclass.c:6067
    #5  0x00000000011339ba in AsGetFilePoolSpace
    (poolId=poolId@entry=5, amount=amount@entry=134234454,
    capacityP=capacityP@entry=0x7f8de5145518) at asutil.c:1069
    #6  0x000000000118483e in ssSWAlloc (sessHandle=0x7f9278b3cbf0,
    txnId=0x7f92787a4480, poolId=poolId@entry=5,
    allocMode=allocMode@entry=ssNewAlloc, amount=134234454,
    amount@entry=134218006, inclVolList=inclVolList@entry=0x0,
    inclVolCount=0) at ssalloc.c:1393
    #7  0x0000000000855c9f in AfAllocSpace
    (txnP=txnP@entry=0x7f92786aeac0, destPoolId=destPoolId@entry=5,
    srvId=srvId@entry=0, ck1=ck1@entry=46, ck2=ck2@entry=1277,
    allocMode=allocMode@entry=ssNewAlloc, amount=134218006,
    multiDest=True) at afcreate.c:2344
    #8  0x0000000000859bc5 in AfSwAlloc
    (txnP=txnP@entry=0x7f92786aeac0, poolId=poolId@entry=5,
    srvId=srvId@entry=0, bfId=bfId@entry=1878267280,
    ck1=ck1@entry=46, ck2=ck2@entry=1277, estSize=134218006,
    afTxnH=0x7f92786aec00, aggregated=False, copyFragInfo=0x0,
    pendingIdP=0x0) at afcreate.c:3133
    #9  0x0000000000806d1f in StoreBitfile
    (txnP=txnP@entry=0x7f92786aeac0,
    poolName=poolName@entry=0x7f8de5145c00 "P_DATA_D", poolId=5,
    srvId=srvId@entry=0, bfId=bfId@entry=1878267280,
    estSize=134218006, ck1=46, ck2=1277, action=ssStoreEntire,
    mountWaitMode=bfWaitMount, sourceFunc=0x105fd70
    <SmRecvNextData>, contextP=0x7f9278cc7de0,
    storeInfoP=0x7f9278a9ad20, sessP=<optimized out>) at
    bfswcrt.c:2560
    #10 0x0000000000807f0d in BfSwCreate
    (sessHandle=sessHandle@entry=0x7f9278c47c80,
    txnId=0x7f92787a4480, bfId=1878267280,
    estSize=estSize@entry=134218006,
    estBitfileSize=estBitfileSize@entry=134218006, ck1=ck1@entry=46,
    ck2=1277, poolName=0x7f8de5145c00 "P_DATA_D",
    mountWaitMode=bfWaitMount, sourceFunc=0x105fd70
    <SmRecvNextData>, contextP=0x7f9278cc7de0,
    aggregateState=admAggregate, forceAggregation=False) at
    bfswcrt.c:729
    #11 0x0000000000773422 in bfCreate
    (sessHandle=sessHandle@entry=0x7f9278c47c80,
    txnId=txnId@entry=0x7f92787a4480, bfId=<optimized out>,
    bfId@entry=1878267280, estSize=estSize@entry=134218006,
    estBitfileSize=estBitfileSize@entry=134218006, ck1=ck1@entry=46,
    ck2=<optimized out>, poolName=<optimized out>,
    mountWaitMode=<optimized out>, sourceFunc=<optimized out>,
    contextP=<optimized out>, aggregateState=<optimized out>,
    forceAggregation=<optimized out>, isFragmentedP=<optimized out>,
    rptSizeP=<optimized out>) at bfcreate.c:2544
    #12 0x0000000000fb247b in CreateBitfile
    (sessP=sessP@entry=0x7f9278cc7de0,
    bfHandle=bfHandle@entry=0x7f9278c47c80,
    txnId=txnId@entry=0x7f92787a4480, bfId=1878267280, ck1=46,
    ck2=1277, objInfoP=0x7f92784aefa0 "\f\f\026", objInfoLen=22,
    poolName=0x7f8de5145c00 "P_DATA_D", estSize=134218006,
    estBitfileSize=134218006, mountWaitMode=2, objType=1 '\001',
    isFragmentedP=0x7f8de5145bc8, rptSizeP=0x7f8de5145bf8) at
    smnode.c:29696
    #13 0x0000000000fc861d in SmDoBackInsNormEnhanced
    (sessP=sessP@entry=0x7f9278cc7de0, bfHandle=0x7f9278c47c80,
    txnId=0x7f92787a4480, verbP=verbP@entry=0x7f92784aeea0,
    domainId=1, txnDate=..., updateList=0x7f9278b1d490,
    objectOpen=0x7f8de5145fac, setDeltaStoredP=0x7f8de5145fbc,
    objIdP=0x0) at smnode.c:18505
    #14 0x0000000000fda9e3 in SmNodeSession
    (sessP=sessP@entry=0x7f9278cc7de0,
    logSummaryP=logSummaryP@entry=0x7f8de514be98) at smnode.c:6395
    #15 0x0000000000f83055 in HandleNodeSession
    (sessP=sessP@entry=0x7f9278cc7de0,
    nodeInfoP=nodeInfoP@entry=0x7f8de514c450,
    bIsSchedSess=bIsSchedSess@entry=False) at smexec.c:7546
    #16 0x0000000000f8a216 in DoNodeGeneral
    (nodeInfoP=0x7f8de514c450, sessP=0x7f9278cc7de0) at
    smexec.c:7599
    #17 smExecuteSession (infoP=<optimized out>,
    beginFunc=beginFunc@entry=0x1346910 <tcpBegin>,
    sendFunc=sendFunc@entry=0x1346250 <tcpSend>,
    recvFunc=recvFunc@entry=0x13465d0 <tcpRecv>,
    flushFunc=flushFunc@entry=0x1345be0 <tcpFlush>,
    abortFunc=abortFunc@entry=0x1346a00 <tcpAbort>,
    abortForceFunc=<optimized out>, qmethodFunc=<optimized out>,
    qaddressFunc=<optimized out>, authFunc=<optimized out>,
    isNetworkMethod=<optimized out>, isLocalConnection=<optimized
    out>, qIsNodeThreadFunc=<optimized out>, isOobSession=<optimized
    out>) at smexec.c:4196
    #18 0x0000000001347d0c in psSessionThread (argP=<optimized out>)
    at tcpcomm.c:1753
    #19 0x00000000013317b0 in StartThread (startInfoP=<optimized
    out>) at pkthread.c:4039
    #20 0x00007f94e24abea5 in start_thread () from
    /lib64/libpthread.so.0
    #21 0x00007f94dddd096d in clone () from /lib64/libc.so.6
    
    IBM Spectrum Protect versions affected:
    IBM Spectrum Protect Server 7.1 and 8.1 on all supported
    platforms
    
    Initial Impact: Medium
    
    Additional Keywords: TS005421570, performance, NFS
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * All IBM Spectrum Protect server users using file device      *
    * class over network shares (NFS/CIFS).                        *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See error description.                                       *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply fixing level when available. This problem is currently *
    * projected to be fixed in levels 8.1.12.200 and 8.1.13. Note  *
    * that this is subject to change at the discretion of IBM.     *
    ****************************************************************
    

Problem conclusion

  • This problem was fixed.
    Affected platforms for reported release:  AIX, Linux, and
    Windows.
    Platforms fixed:  AIX, Linux, and Windows.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT37138

  • Reported component name

    TSM SERVER

  • Reported component ID

    5698ISMSV

  • Reported release

    81L

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2021-06-07

  • Closed date

    2021-10-01

  • Last modified date

    2021-10-01

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    TSM SERVER

  • Fixed component ID

    5698ISMSV

Applicable component levels

  • R81A PSY

       UP

[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SSGSG7","label":"Tivoli Storage Manager"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"81L","Line of Business":{"code":"LOB26","label":"Storage"}}]

Document Information

Modified date:
27 April 2022