IBM Support

Debugging IFS contention on the IBM i

Troubleshooting


Problem

This document describes a few common sources of integrated file system (IFS) contention on IBM i.

Symptom

When seize conflict contention related to "Adaptable Smart chain" has been identified for a job or group of jobs, the usual cause is IFS path contention.
The following screenshot shows significant seize contention. Seize contention is not unique to smart chains, but smart chain contention shows up as seize contention.
image-20230502094640-2
The cause could be a single job or several jobs that process the same IFS path and its stream file objects. Generally, that path is likely to contain many directories and objects.
This document does not list every cause, but it covers some common areas where contention can occur.
Examples
Java virtual machine (JVM) jobs. The primary environment of these jobs is the IFS. The Java classes, log files, and data most likely reside in the IFS.
IFS tool jobs. These jobs scan the IFS and might even be named "SCANIFS". They generally capture object and directory attributes. They might or might not be JVMs.
Jobs that create reports in the IFS. Perhaps these directories hold .xml reports that accumulate over time, so the job creates more and more directories. The code might also examine existing objects in the IFS before creating each file.
Data Collection
 

When system-wide contention (MTXW, SMARTCHAIN) stems from JVMs, review the following areas.
1) Contention on /tmp/.com_ibm_tools_attach directory
For JVMs, this is one of the most common causes of smart chain contention. This directory is used for the Java Attach API:
https://www.ibm.com/docs/en/sdk-java-technology/8?topic=documentation-java-attach-api
https://www.ibm.com/support/pages/disable-creation-files-within-tmpcomibmtoolsattach-0
A Job Watcher (JW) collection that does not include *JAVASTACK events may show this stack:
 
qp0f_vn_gen_inactive__FP5vnode
qp0l_lookupv__FP13qp0l_pathnameiP14qp0l_nameidata
qp0lis_stat__FPiT1P13qp0l_pathnameP6stat64i
qp0lis_stat__FPiT1P13qp0l_pathnameP6stat64i
cblabranch
aimach_upcall_portal
pxsyscallslic
statx_common__FPCcPvtT3iP16ILEHeapAllocator
user_statx__FUlN31i
do_syscall_pdc_trace__FP12TiaSaveState
tia_schandler
<syscall64>
stat64
omrfile_stat
Java_openj9_internal_tools_attach_target_FileLock_lockFileImpl
If a javacore is taken (similar to DSPJOB, but for JVMs), it shows the Java portion of the above stack, similar to this:
    
   "Attach API wait loop" J9VMThread:0x00000000C0199100, j9thread_t:0x0000000181CAE9C0, java/lang/Thread:0x0000000040D50E68, state:R, prio=10
       (java/lang/Thread getId:0x46, isDaemon:true)
       (native thread ID:0x2820009, native priority:0xA, native policy:UNKNOWN, vmstate:CW, vm thread flags:0x00000001)
       CPU usage total: 13.137225000 secs, user: 0.133572000 secs, system: 13.003653000 secs
       Heap bytes allocated since last GC cycle=472688 (0x73670)
       Java callstack:
           at com/ibm/tools/attach/javaSE/FileLock.lockFileImpl(Native Method)
           at com/ibm/tools/attach/javaSE/FileLock.lockFile(FileLock.java:59)
           at com/ibm/tools/attach/javaSE/AttachHandler$WaitLoop.checkReplyAndCreateAttachment(AttachHandler.java:419)
           at com/ibm/tools/attach/javaSE/AttachHandler$WaitLoop.waitForNotification(AttachHandler.java:406)
           at com/ibm/tools/attach/javaSE/AttachHandler$WaitLoop.run(AttachHandler.java:440)

or

   "Attach API wait loop" J9VMThread:0x00000000C01AFB00, j9thread_t:0x00000001834A1008, java/lang/Thread:0x000000004006AB60, state:R, prio=10
       (java/lang/Thread getId:0x47, isDaemon:true)
       (native thread ID:0x66011D, native priority:0xA, native policy:UNKNOWN, vmstate:CW, vm thread flags:0x00000001)
       CPU usage total: 12.969130000 secs, user: 0.176551000 secs, system: 12.792579000 secs
       Heap bytes allocated since last GC cycle=42864 (0xA770)
       Java callstack:
           at java/io/UnixFileSystem.getBooleanAttributes0(Native Method)
           at java/io/UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:254(Compiled Code))
           at java/io/File.exists(File.java:825(Compiled Code))
           at com/ibm/tools/attach/javaSE/TargetDirectory.ensureMyAdvertisementExists(TargetDirectory.java:149)
           at com/ibm/tools/attach/javaSE/AttachHandler$WaitLoop.checkReplyAndCreateAttachment(AttachHandler.java:412)
           at com/ibm/tools/attach/javaSE/AttachHandler$WaitLoop.waitForNotification(AttachHandler.java:406)
           at com/ibm/tools/attach/javaSE/AttachHandler$WaitLoop.run(AttachHandler.java:440)
If either of the above stacks is seen, check the contents of /tmp/.com_ibm_tools_attach:
        a) Run edtf '/tmp/.com_ibm_tools_attach*' and take option 6 for path information.
image-20230504110317-1
The above screenshot shows very few objects. Typically, if this directory is the problem, it contains anywhere from several thousand to over a million objects.
- If the directory contains several thousand to over a million objects, delete the contents of the /tmp/.com_ibm_tools_attach directory.
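When the directory is very large, an interactive listing can itself be slow. As an illustration only, the following Python sketch counts the entries lazily and stops once a threshold is reached. The function names and the threshold of 5000 are assumptions made for this example (the text above says only "several thousand"); this is not an IBM-provided tool, and it assumes a Python 3 interpreter is available on the system.

```python
import os


def count_entries(path, limit=None):
    """Count directory entries lazily; stop early once `limit` is reached."""
    count = 0
    with os.scandir(path) as entries:
        for _ in entries:
            count += 1
            if limit is not None and count >= limit:
                break
    return count


def needs_cleanup(path, threshold=5000):
    """Return True if `path` holds at least `threshold` entries.

    The default threshold is a hypothetical value chosen for illustration.
    """
    return count_entries(path, limit=threshold) >= threshold


if __name__ == "__main__":
    attach_dir = "/tmp/.com_ibm_tools_attach"
    if os.path.isdir(attach_dir):
        print(attach_dir, "needs cleanup:", needs_cleanup(attach_dir))
```

Stopping at the threshold keeps the check itself from reading the entire directory, which matters when the directory is already a source of contention.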

2) Contention on a different IFS directory
https://www.ibm.com/docs/en/i/7.4?topic=ssw_ibm_i_74/apis/qp0fptos.html
 
    IFS development will require a QSYSPRT spooled file created by the following command:

     CALL QP0FPTOS PARM(*DUMP)
    
   Other options that may be of use:
        CALL PGM(QP0FPTOS) PARM(*DUMPLFS 'JobNumber')
        CALL PGM(QP0FPTOS) PARM(*DUMPALL 'JobNumber')
        
If the job is a JVM, a javacore file will show the JVM stacks. The stacks might not identify the exact IFS path under contention, but the application classes in them can give the application owner an idea of where to look (for example, which IFS path the code works in).
When a JVM is involved, a Job Watcher collection that includes both *CALLSTACK and *JAVASTACK events provides Java stack traces similar to those in a javacore. Narrow the Job Watcher collection to a specific job or set of jobs.

3) Collection Services data
Collection Services data can be quite helpful in determining IFS activity. Querying the QAPMJOBL file is a good starting point; it helps determine which jobs are responsible for most of the IFS cache lookups.

Field             Text                                           
INTNUM            Interval number                                
JBNAME            Job name                                     
JBUSER            Job user                                       
JBNBR             Job number                                     
JBMLCH            File system lookup cache hits                  
JBMLCM            File system lookup cache misses                
JBMSLR            File system symbolic link reads                
JBMDYR            File system directory reads             
       

Example query:
SELECT INTNUM, JBNAME,
       SUM(JBMLCH) AS TOTCACHE
FROM QPFRDATA/QAPMJOBL
WHERE INTNUM = 23
GROUP BY JBNAME, INTNUM
ORDER BY SUM(JBMLCH) DESC
 
 
 Interval  Job
 number    name                   TOTCACHE
      23   ADMIN5                   24,402
      23   Q1ACPDST                 18,975
      23   QTSMTPCLTD                3,183
      23   QTSMTPSRVD                  792
      23   QTMSSMTPD                   652
      23   ADMIN4                      450
      23   QGLDPUBA                     90
      23   QUSRDIR                      78
      23   QNAVMNSRV                    41
      23   QSQSRVR                       4
      23   CRTPFRDTA                     2
      23   DD-FREEAGENT-001              0
      23   QPASUTIL                      0
      23   VIO-FC0005CQ  00              0
      23   SMPO0000                      0
      23   QDBSRV05                      0   
 
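The same query without the interval filter totals the cache hits across all intervals in the collection:
SELECT JBNAME, SUM(JBMLCH) AS "Total Cache Hits"
FROM QPFRDATA/QAPMJOBL
GROUP BY JBNAME
ORDER BY SUM(JBMLCH) DESC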

Job                    Total Cache
name                          Hits
ADMIN5                   2,343,632
Q1ACPDST                 1,818,116
QTSMTPCLTD                 304,511
QTSMTPSRVD                  76,014
QTMSSMTPD                   64,985
ADMIN4                      43,080
QGLDPUBA                     9,357
QCLNSYSLOG                   9,202
QUSRDIR                      7,459
QNAVMNSRV                    4,029
QYMEPFRCVT                     576
QYMEARCPMA                     575
ADMIN                          212
IBMARE                         201
AMHSSL443                      198
CRTPFRDTA                      190
QSQSRVR                         56
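To see how concentrated the lookup activity is, it can help to express each job's total as a share of all lookup cache hits. The sketch below is an illustration only: the function name `lookup_share` is hypothetical, and the totals are copied from the rows of the query output shown above, so the percentages are relative to those rows rather than to every job on the system.

```python
def lookup_share(totals):
    """Return (job, hits, percent_of_total) tuples sorted by hits, descending."""
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [
        (job, hits, 100.0 * hits / grand_total if grand_total else 0.0)
        for job, hits in ranked
    ]


if __name__ == "__main__":
    # SUM(JBMLCH) per job name, taken from the query output in this document.
    totals = {
        "ADMIN5": 2343632, "Q1ACPDST": 1818116, "QTSMTPCLTD": 304511,
        "QTSMTPSRVD": 76014, "QTMSSMTPD": 64985, "ADMIN4": 43080,
        "QGLDPUBA": 9357, "QCLNSYSLOG": 9202, "QUSRDIR": 7459,
        "QNAVMNSRV": 4029, "QYMEPFRCVT": 576, "QYMEARCPMA": 575,
        "ADMIN": 212, "IBMARE": 201, "AMHSSL443": 198,
        "CRTPFRDTA": 190, "QSQSRVR": 56,
    }
    for job, hits, pct in lookup_share(totals):
        print(f"{job:<12}{hits:>12,}{pct:>8.1f}%")
```

Jobs that account for a large share of the hits are reasonable first candidates for a narrowed Job Watcher collection.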
4) WebSphere reloadInterval issues
This issue used to be more common, but it has rarely been seen in the last few years. In a javacore, the following stack is common among the affected JVMs:
  
   at java/io/UnixFileSystem.getLastModifiedTime(Native Method)
   at java/io/File.lastModified(File.java:837(Compiled Code))
   at com/ibm/ws/classloader/ReloadableClassLoader.checkForUpdate(ReloadableClassLoader.java:241(Compiled Code))
   at com/ibm/ws/classloader/ClassLoaderManager.checkAndNotify(ClassLoaderManager.java:543(Compiled Code))
   at com/ibm/ws/classloader/ClassLoaderManager.access$000(ClassLoaderManager.java:82(Compiled Code))
   at com/ibm/ws/classloader/ClassLoaderManager$ReloadTimerTask.alarm(ClassLoaderManager.java:586(Compiled Code))
   at com/ibm/ejs/util/am/_Alarm.run(_Alarm.java:133(Compiled Code))
   at com/ibm/ws/util/ThreadPool$Worker.run(ThreadPool.java:1604(Compiled Code))
 

Document Location

Worldwide

Document Information

Modified date:
17 January 2024

UID

ibm16985987