Optimizing AIX 7 operating system performance, Part 3

Tune with ioo, filemon, fileplace, JFS and JFS2


About this series

This three-part series (see Related topics) on the AIX® disk and I/O subsystem focuses on the challenges of optimizing disk I/O performance. While disk tuning is arguably less exciting than CPU or memory tuning, it is a crucial component of optimizing server performance. In fact, because disk I/O is often your weakest subsystem link, there is more you can do to improve disk I/O performance than for any other subsystem.

The first and second installments of this series discussed the importance of architecting your systems, the impact architecture can have on overall system performance, and a new I/O tuning tool, lvmo, which you can use to tune logical volumes. In this installment, you will examine how to tune your systems using the ioo command, which configures the majority of all I/O tuning parameters and displays the current or next-boot values for all I/O tuning parameters. You will also learn how and when to use the filemon and fileplace tools. Because Enhanced JFS (JFS2) is the default file system within AIX, improving your overall file system performance, tuning your file systems, and getting the best out of JFS2 are all important parts of your tuning toolkit. You'll also examine some file access patterns, such as sequential and random access, which can affect performance.

File system overview

This section discusses JFS2, file system performance, and specific performance improvements over JFS. As you know, there are two types of kernels in AIX: a 32-bit kernel and a 64-bit kernel. While they share some common libraries and most commands and utilities, it is important to understand their differences and how the kernels relate to overall performance tuning. JFS2 is optimized for the 64-bit kernel, while JFS is optimized for the 32-bit kernel. Journaling file systems, while much more secure, have historically been associated with performance overhead. In a performance-rules shop (at the expense of availability), you would disable metadata logging to increase performance with JFS. With JFS2, you can also disable logging (in AIX 6.1 and higher) to help increase performance. You can disable logging at the point of mounting the filesystem, which means that you don't need to worry about changing or reconfiguring the filesystem; you can instead just modify your mount options. For example, to disable logging on a filesystem you would use the following: mount -o log=NULL /database.

Although JFS2 was optimized to improve the performance of metadata operations, that is, those normally handled by the logging framework, switching logging off can have a significant performance benefit for filesystems with a high proportion of file changes and newly created or deleted files. For example, filesystems used for development may see an increase in performance. For databases where the files used are static, the performance improvement may be less significant.

However, you should be careful when making use of compression. Although compression can save disk space (and disk reads and writes, since less data is physically read from or written to the disk), the overhead on systems with a heavy CPU load can actually slow performance down.

JFS2 uses a binary tree representation when performing inode searches, which is a much better method than the linear method used by JFS. Furthermore, you no longer need to assign inodes when creating file systems, as they are now dynamically allocated by JFS2 (meaning you won't run out of them).

While concurrent I/O was covered in the first installment of the series, it's worth another mention here. Concurrent I/O allows multiple threads to read and write data concurrently to the same file. This is possible because of the way JFS2 implements write-exclusive inode locks, which lets multiple users read the same file simultaneously and dramatically increases performance when many users read from the same data file. To turn concurrent I/O on, you just need to mount the file system with the appropriate flags (see Listing 1). We recommend that you look at using concurrent I/O with databases such as Oracle.

Listing 1. Turning on concurrent I/O
 root@lpar29p682e_pub[/] > mount -o cio /test
 root@lpar29p682e_pub[/] > df -k /test
 Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
 /dev/fslv00        131072    130724    1%        4     1% /test

Table 1 illustrates the various enhancements of JFS2 and how they relate to systems performance. It's also important to understand that when tuning your I/O systems, many of the tunables themselves (you'll get into that later) differ, depending on whether you are using JFS or JFS2.

Table 1. Enhancements of JFS2
Function                   JFS                       JFS2
Deferred update            Yes                       No
Direct I/O support         Yes                       Yes
Max file system size       1 terabyte                4 petabytes
Max file size              64 gigabytes              4 petabytes
Number of inodes           Fixed when creating f/s   Dynamic
Large file support         As mount option           Default
On-line defragmentation    Yes                       Yes

Filemon and fileplace

This section introduces two important I/O tools, filemon and fileplace, and discusses how you can use them during systems administration each day.

Filemon uses a trace facility to report on the I/O activity of physical and logical storage, including your actual files. The I/O activity monitored is based on the time interval specified when running the trace. It reports on all layers of file system utilization, including the Logical Volume Manager (LVM), virtual memory, and physical disk layers. Without any flags, it runs in the background while application programs or system commands are run and monitored. The trace starts automatically and continues until it is stopped with trcstop, at which time the command generates an I/O activity report and exits. It can also process a trace file that has been recorded previously by the trace facility, and reports can then be generated from that file. Because reports generated to standard output usually scroll past your screen, it's recommended that you use the -o option to write the output to a file (see Listing 2).

Listing 2. Using filemon with the -o option
 l488pp065_pub[/] > filemon -o dbmon.out -O all
 Run trcstop command to signal end of trace.
 Thu Aug 12 09:07:06 2010
 System: AIX 7.1 Node: l488pp065_pub Machine: 00F604884C00
 l488pp065_pub[/] > trcstop
 l488pp065_pub[/] > cat dbmon.out
 Thu Aug 12 09:10:09 2010
 System: AIX 7.1 Node: l488pp065_pub Machine: 00F604884C00

 Cpu utilization:  72.8%
 Cpu allocation:  100.0%

 21947755 events were lost.  Reported data may have inconsistencies or errors.

 Most Active Files
 ------------------------------------------------------------------------
   #MBs  #opns   #rds   #wrs  file                     volume:inode
 ------------------------------------------------------------------------
    0.4      1    101      0  unix                     /dev/hd2:82241
    0.0      9     10      0  vfs                      /dev/hd4:9641
    0.0      4      6      1  db.sql
    0.0      3      6      2                           /dev/hd2:111192
    0.0      1      2      0                           /dev/hd2:110757
    0.0     45      1      0  null
    0.0      1      1      0                           /dev/hd2:110827
    0.0      9      2      0  SWservAt                 /dev/hd4:9156
    0.0      1      0      3  db2.sql
    0.0      9      2      0                           /dev/hd4:9157

 Most Active Segments
 ------------------------------------------------------------------------
   #MBs  #rpgs  #wpgs  segid   segtype                 volume:inode
 ------------------------------------------------------------------------
    0.1      2     13  8359ba  client

 Most Active Logical Volumes
 ------------------------------------------------------------------------
   util  #rblk  #wblk   KB/s  volume                   description
 ------------------------------------------------------------------------
   0.04      0     32    0.3  /dev/hd9var              /var
   0.00      0     48    0.5  /dev/hd8                 jfs2log
   0.00      0      8    0.1  /dev/hd4                 /

 Most Active Physical Volumes
 ------------------------------------------------------------------------
   util  #rblk  #wblk   KB/s  volume                   description
 ------------------------------------------------------------------------
   0.00      0     72    0.7  /dev/hdisk0              N/A

 Most Active Files Process-Wise
 ------------------------------------------------------------------------
   #MBs  #opns   #rds   #wrs  file                 PID(Process:TID)
 ------------------------------------------------------------------------
    0.0      3      6      0  db.sql               7667828(ksh:9437345)
    0.0      1      2      0                       7667828(ksh:9437345)
    0.0      1      0      3  db2.sql              7667828(ksh:9437345)
    0.0      1      0      1  db.sql               7733344(ksh:7405633)
    0.4      1    101      0  unix                 7667830(ksh:9437347)
    0.0      1      2      0                       7667830(ksh:9437347)
    0.0      1      2      0                       7667830(ksh:9437347)
    0.0      9      2      0  SWservAt             7667830(ksh:9437347)
    0.0      9      2      0                       7667830(ksh:9437347)
    0.0      1      0      0  systrctl             7667830(ksh:9437347)
    0.0     44      0     44  null                 4325546(slp_srvreg:8585241)
    0.0      1      2      2                       7667826(ksh:23527615)
    0.0      1      1      0                       7667826(ksh:23527615)
    0.0      1      1      0  null                 7667826(ksh:23527615)
    0.0      1      0      0  test                 7667826(ksh:23527615)
    0.0      8      8      0  vfs                  3473482(topasrec:13566119)
    0.0      1      0      0                       3473482(topasrec:13566119)
    0.0      1      0      0  CuAt                 3473482(topasrec:13566119)
    0.0      1      2      0  vfs                  2097252(syncd:2490503)
    0.0      1      0      0  installable          4260046(java:15073489)

 Most Active Files Thread-Wise
 ------------------------------------------------------------------------
   #MBs  #opns   #rds   #wrs  file                 TID(Process:PID)
 ------------------------------------------------------------------------
    0.0      3      6      0  db.sql               9437345(ksh:7667828)
    0.0      1      2      0                       9437345(ksh:7667828)
    0.0      1      0      3  db2.sql              9437345(ksh:7667828)
    0.0      1      0      1  db.sql               7405633(ksh:7733344)
    0.4      1    101      0  unix                 9437347(ksh:7667830)
    0.0      1      2      0                       9437347(ksh:7667830)
    0.0      1      2      0                       9437347(ksh:7667830)
    0.0      9      2      0  SWservAt             9437347(ksh:7667830)
    0.0      9      2      0                       9437347(ksh:7667830)
    0.0      1      0      0  systrctl             9437347(ksh:7667830)
    0.0     44      0     44  null                 8585241(slp_srvreg:4325546)
    0.0      1      2      2                       23527615(ksh:7667826)
    0.0      1      1      0                       23527615(ksh:7667826)
    0.0      1      1      0  null                 23527615(ksh:7667826)
    0.0      1      0      0  test                 23527615(ksh:7667826)
    0.0      8      8      0  vfs                  13566119(topasrec:3473482)
    0.0      1      0      0                       13566119(topasrec:3473482)
    0.0      1      0      0  CuAt                 13566119(topasrec:3473482)
    0.0      1      2      0  vfs                  2490503(syncd:2097252)
    0.0      1      0      0  installable          15073489(java:4260046)

 dbmon.out: END

Look for long seek times, as they can result in decreased application performance. By looking at the read and write sequence counts in detail, you can further determine whether the access is sequential or random. This helps when it is time to do your I/O tuning. The output above shows no visible I/O bottleneck. Filemon provides a tremendous amount of information and, truthfully, we've found it is too much information at times. Further, there can be a performance hit to using filemon, depending on how much general file activity there is while filemon is running. Let's look at the topas results while running filemon (see Figure 1).
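Because the report is so verbose, it can help to pull out just the "Most Active Files" table programmatically. The sketch below parses a report in the layout shown in Listing 2 and ranks files by megabytes transferred; the column order (#MBs, #opns, #rds, #wrs, file) is taken from that listing, so adjust the field indexes if your filemon version formats its report differently.

```python
# Sketch: extract the "Most Active Files" section from a filemon report
# (such as dbmon.out) and rank the files by megabytes transferred.
def most_active_files(report):
    rows = []
    in_section = False
    for line in report.splitlines():
        stripped = line.strip()
        if stripped == "Most Active Files":
            in_section = True
            continue
        if not in_section:
            continue
        if stripped.startswith("Most Active"):
            break  # reached the next report section
        fields = stripped.split()
        # data rows begin with a numeric #MBs column; skip headers/rules
        if fields and fields[0].replace(".", "", 1).isdigit():
            mbs, opns, rds, wrs = fields[:4]
            name = fields[4] if len(fields) > 4 else ""
            rows.append((float(mbs), int(rds), int(wrs), name))
    return sorted(rows, reverse=True)

sample = """Most Active Files
------------------------------------------------------------------------
  #MBs  #opns   #rds   #wrs  file                     volume:inode
------------------------------------------------------------------------
   0.4      1    101      0  unix                     /dev/hd2:82241
   0.0      9     10      0  vfs                      /dev/hd4:9641

Most Active Segments
"""
print(most_active_files(sample)[0])  # hottest file first
```

Feeding it the dbmon.out from Listing 2 would put unix (0.4MB, 101 reads) at the top of the ranking.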

Figure 1. topas results while running filemon

In the figure above, filemon is taking up almost 60 percent of the CPU! This is actually less than in previous AIX versions, but it still has a significant impact on your overall system performance. We don't typically like to recommend performance tools that have such a substantial overhead, so we'll reiterate that while filemon certainly has a purpose, you need to be very careful when using it.

What about fileplace? Fileplace reports the placement of a file's blocks within a file system. It is commonly used to examine and assess the efficiency of a file's placement on disk. For what purposes do you use it? One reason would be to help determine if some of the heavily utilized files are substantially fragmented. It can also help you determine the physical volume with the highest utilization and whether or not the drive or I/O adapter is causing the bottleneck.

Let's look at an example of a frequently accessed file in Listing 3.

Listing 3. Frequently accessed file
 fileplace -pv /tmp/logfile

 File: /tmp/logfile  Size: 63801540 bytes  Vol: /dev/hd3
 Blk Size: 4096  Frag Size: 4096  Nfrags: 15604
 Inode: 7  Mode: -rw-rw-rw-  Owner: root  Group: system

   Physical Addresses (mirror copy 1)                          Logical Extent
   ----------------------------------                          ----------------
   02884352-02884511  hdisk0    160 frags    655360 Bytes,  1.0%  00000224-00000383
   02884544-02899987  hdisk0  15444 frags  63258624 Bytes, 99.0%  00000416-00015859

   unallocated        -27 frags  -110592 Bytes   0.0%

   15604 frags over space of 15636 frags:  space efficiency = 99.8%
   2 extents out of 15604 possible:  sequentiality = 100.0%

You should be interested in space efficiency and sequentiality here. Higher space efficiency means files are less fragmented and provide better sequential file access. Higher sequentiality tells you that the files are more contiguously allocated, which is also better for sequential file access. In the example here, both space efficiency (99.8 percent) and sequentiality (100 percent) are quite high. If space efficiency and sequentiality are too low, you might want to consider file system reorganization. You would do this with the reorgvg command, which can improve logical volume utilization and efficiency. You may also want to consider using the defragfs command, which helps ensure that the free space in your filesystem is contiguous and so helps with future writes and file creates. Defragmentation can occur in the background while you are using your system.
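The two summary percentages in Listing 3 can be reproduced from the fragment counts the report prints. The formulas below are inferred from that output and from IBM's description of the metrics, so treat them as an approximation rather than the tool's exact source:

```python
# Sketch: recompute fileplace's two summary metrics from a file's
# fragment count, the span those fragments occupy, and its extent count.
def space_efficiency(nfrags, frags_spanned):
    # fragments actually holding file data vs. the span they sit in
    return 100.0 * nfrags / frags_spanned

def sequentiality(nfrags, extents):
    # fraction of fragments that directly follow their predecessor
    return 100.0 * (nfrags - extents + 1) / nfrags

# Values from Listing 3: 15604 frags over a span of 15636, in 2 extents
print(round(space_efficiency(15604, 15636), 1))  # 99.8
print(round(sequentiality(15604, 2), 1))         # 100.0
```

A badly fragmented file shows up immediately: the same 15604 fragments scattered across 1000 extents would score only about 93.6 percent sequentiality.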

Tuning with ioo

This section discusses the use of the ioo command, which is used for virtually all I/O-related tuning parameters.

As with vmo, you need to be extremely careful when changing ioo parameters, as changing parameters on the fly can cause severe performance degradation. Table 2 details specific tuning parameters that you will use often for JFS and JFS2 file systems. As you can see, the majority of the I/O tuning commands use the ioo utility.

Table 2. Specific tuning parameters
Function                                                                                          | JFS tuning parameter           | Enhanced JFS tuning parameter
Sets max amount of memory for caching files                                                       | vmo -o maxperm=value           | vmo -o maxclient=value (< or = maxperm)
Sets min amount of memory for caching                                                             | vmo -o minperm=value           | n/a
Sets a hard limit on memory for caching                                                           | vmo -o strict_maxperm          | vmo -o maxclient (hard limit)
Sets max pages used for sequential read ahead                                                     | ioo -o maxpgahead=value        | ioo -o j2_maxPageReadAhead=value
Sets min pages used for sequential read ahead                                                     | ioo -o minpgahead=value        | ioo -o j2_minPageReadAhead=value
Sets max number of pending write I/Os to a file                                                   | chdev -l sys0 -a maxpout=value | chdev -l sys0 -a maxpout=value
Sets min number of pending write I/Os to a file at which programs blocked by maxpout may proceed  | chdev -l sys0 -a minpout=value | chdev -l sys0 -a minpout=value
Sets the amount of modified data cache for a file with random writes                              | ioo -o maxrandwrt=value        | ioo -o j2_maxRandomWrite, ioo -o j2_nRandomCluster
Controls gathering of I/Os for sequential write behind                                            | ioo -o numclust=value          | ioo -o j2_nPagesPerWriteBehindCluster=value
Sets the number of f/s bufstructs                                                                 | ioo -o numfsbufs=value         | ioo -o j2_nBufferPerPagerDevice=value

Let's further discuss some of the more important parameters below, as we've already discussed all the vmo tuning parameters in the memory tuning series (see Related topics).

There are several ways you can determine the existing ioo values on your system. The long listing for ioo clearly gives you the most information (see Listing 4). It lists the current value, reboot value, range, unit, type, and dependencies of all tunable parameters managed by ioo.

Listing 4. Display for ioo
 root@lpar29p682e_pub[/] > ioo -L
 NAME                      CUR    DEF    BOOT   MIN    MAX    UNIT           TYPE  DEPENDENCIES
 --------------------------------------------------------------------------------
 j2_atimeUpdateSymlink     0      0      0      0      1      boolean        D
 j2_dynamicBufferPreallo   16     16     16     0      256    16K slabs      D
 j2_inodeCacheSize         400    400    400    1      1000                  D
 j2_maxPageReadAhead       128    128    128    0      64K    4KB pages      D
 j2_maxRandomWrite         0      0      0      0      64K    4KB pages      D
 j2_maxUsableMaxTransfer   512    512    512    1      4K     pages          M
 j2_metadataCacheSize      400    400    400    1      1000                  D
 j2_minPageReadAhead       2      2      2      0      64K    4KB pages      D
 j2_nBufferPerPagerDevice  512    512    512    512    256K                  M
 j2_nPagesPerWriteBehindC  32     32     32     0      64K                   D
 j2_nRandomCluster         0      0      0      0      64K    16KB clusters  D
 j2_nonFatalCrashesSystem  0      0      0      0      1      boolean        D
 j2_syncModifiedMapped     1      1      1      0      1      boolean        D
 j2_syncdLogSyncInterval   1      1      1      0      4K     iterations     D
 jfs_clread_enabled        0      0      0      0      1      boolean        D
 jfs_use_read_lock         1      1      1      0      1      boolean        D
 lvm_bufcnt                9      9      9      1      64     128KB/buffer   D
 maxpgahead                8      8      8      0      4K     4KB pages      D     minpgahead
 maxrandwrt                0      0      0      0      512K   4KB pages      D
 memory_frames             512K          512K                 4KB pages      S
 minpgahead                2      2      2      0      4K     4KB pages      D     maxpgahead
 numclust                  1      1      1      0      2G-1   16KB/cluster   D
 numfsbufs                 196    196    196    1      2G-1                  M
 pd_npages                 64K    64K    64K    1      512K   4KB pages      D
 pgahd_scale_thresh        0      0      0      0      419430 4KB pages      D
 pv_min_pbuf               512    512    512    512    2G-1                  D
 sync_release_ilock        0      0      0      0      1      boolean        D

 n/a means parameter not supported by the current platform or kernel

 Parameter types:
     S = Static: cannot be changed
     D = Dynamic: can be freely changed
     B = Bosboot: can only be changed using bosboot and reboot
     R = Reboot: can only be changed during reboot
     C = Connect: changes are only effective for future socket connections
     M = Mount: changes are only effective for future mountings
     I = Incremental: can only be incremented
     d = deprecated: deprecated and cannot be changed

Listing 5 below shows you how to change a tunable.

Listing 5. Changing a tunable
 root@lpar29p682e_pub[/] > ioo -o maxpgahead=32
 Setting maxpgahead to 32
 root@lpar29p682e_pub[/] >
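Before changing tunables like this, it is worth snapshotting the current values so you can roll back. The sketch below assumes the "tunable = value" line format that ioo prints when displaying parameters (as in the `Setting maxpgahead to 32` workflow above; capture `ioo -a` output to a file and parse it):

```python
# Sketch: turn "tunable = value" lines (the style ioo uses when
# displaying parameters) into a dict for snapshotting before changes.
def parse_ioo(output):
    vals = {}
    for line in output.splitlines():
        if "=" in line:
            name, _, value = line.partition("=")
            vals[name.strip()] = value.strip()
    return vals

# Hypothetical captured output for illustration
sample = """maxpgahead = 32
minpgahead = 2
numfsbufs = 196"""

print(parse_ioo(sample)["maxpgahead"])  # 32
```

Diffing two such snapshots taken before and after a tuning session gives you an audit trail of exactly which parameters moved.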

The maxpgahead parameter is used for JFS only. For JFS2, there are additional file system performance enhancements, including sequential page read ahead and sequential and random write behind. The Virtual Memory Manager (VMM) of AIX anticipates page requirements by observing the patterns in which files are accessed. When a program accesses two successive pages of a file, VMM assumes that the program will keep accessing the file sequentially. The number of pages to be read ahead can be configured using VMM thresholds. With JFS2, make note of these two important parameters:

  • j2_minPageReadAhead: Determines the number of pages read ahead when VMM first detects a sequential access pattern.
  • j2_maxPageReadAhead: Determines the maximum number of pages that VMM can read ahead in a sequential file.
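The interaction of the two parameters can be pictured as a ramp: VMM starts at the minimum and doubles the read-ahead on each further sequential access until the maximum is reached. This is a simplified model of that documented doubling behavior, not the kernel's actual code:

```python
# Sketch: the read-ahead sizes VMM steps through, from
# j2_minPageReadAhead (default 2) up to j2_maxPageReadAhead (default 128),
# doubling on each detected sequential access.
def readahead_schedule(min_pages=2, max_pages=128):
    pages, schedule = min_pages, []
    while pages < max_pages:
        schedule.append(pages)
        pages *= 2
    schedule.append(max_pages)
    return schedule

print(readahead_schedule())  # [2, 4, 8, 16, 32, 64, 128]
```

With the defaults, VMM works up to 128-page (512KB) read-ahead requests within seven sequential accesses, which is why raising j2_maxPageReadAhead mainly helps long sequential scans rather than short ones.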

Sequential and random write behind relate to writing modified pages in memory to disk once a certain threshold is reached, rather than waiting for syncd to flush the pages. The aim is to limit the number of dirty pages in memory, which in turn reduces I/O overhead and disk fragmentation. The two types of write behind are sequential and random. With sequential write behind, dirty pages are written out in clusters instead of staying in memory until the syncd daemon runs, which can cause real bottlenecks. With random write behind, once the number of dirty pages of a file in memory exceeds a specified amount, all subsequent dirty pages are written to disk.

For sequential write behind, the j2_nPagesPerWriteBehindCluster parameter specifies the number of pages to be scheduled for writing at a time. By default the value is 32 (that is, 128KB). For modern disks and high-write environments, such as databases, you may want to increase this parameter so that more data is written in a single operation when the data needs to be synced to disk.
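The arithmetic behind that 128KB figure is simply the cluster page count multiplied by the 4KB page size, which makes it easy to check what a candidate setting would write per operation:

```python
# Sketch: sequential write-behind cluster size in KB is
# j2_nPagesPerWriteBehindCluster times the 4KB page size.
def cluster_kb(pages_per_cluster, page_kb=4):
    return pages_per_cluster * page_kb

print(cluster_kb(32))   # 128 KB, the default
print(cluster_kb(128))  # 512 KB, a larger cluster for heavy-write loads
```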

Random write behind can be configured by changing the values of j2_nRandomCluster and j2_maxRandomWrite. The j2_maxRandomWrite parameter specifies the number of dirty pages of a file that are allowed to stay in memory. The default is 0, meaning that information is written out as quickly as possible, which helps ensure data integrity. If you are willing to sacrifice some integrity in the event of a system failure, you can increase these values for better write performance; pages stay in cache longer, so after a system failure some data may not have been written to disk properly. The j2_nRandomCluster parameter defines how many clusters apart two writes must be before they are considered random. Increasing this value can lower the write frequency if you have a high number of files being modified at the same time.
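The trigger logic described above can be sketched as a simple threshold check. This is a simplified model of when random write behind starts queuing pages to disk, not the kernel's actual algorithm:

```python
# Sketch: dirty pages of a file beyond the j2_maxRandomWrite threshold
# are written to disk instead of waiting for syncd. A threshold of 0
# (the default) means everything is written out as soon as possible.
def pages_flushed(dirty_pages, max_random_write):
    if max_random_write == 0:
        return dirty_pages
    return max(0, dirty_pages - max_random_write)

print(pages_flushed(100, 0))   # default: all 100 dirty pages written out
print(pages_flushed(100, 64))  # threshold 64: only 36 pushed to disk early
print(pages_flushed(10, 64))   # under the threshold: nothing forced out
```

The trade-off is visible in the numbers: a higher threshold keeps more dirty pages in cache (better write performance, more data at risk on a crash), which is exactly the integrity-versus-speed choice the text describes.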

Another important area worth mentioning is large sequential I/O processing. When there is too much simultaneous I/O to your file systems, the I/O can bottleneck at the file system level. In this case, you should increase the j2_nBufferPerPagerDevice parameter (numfsbufs with JFS). If you use raw I/O as opposed to file systems, this same type of bottleneck can occur at the LVM level. Here is where you might want to tune the lvm_bufcnt parameter.
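Since the ioo -L output in Listing 4 shows lvm_bufcnt in units of 128KB per buffer, the total raw-I/O buffering a setting provides is a one-line calculation:

```python
# Sketch: total LVM raw-I/O buffering is lvm_bufcnt times the 128KB
# per-buffer size reported in the ioo -L UNIT column.
def lvm_buffer_kb(lvm_bufcnt, buf_kb=128):
    return lvm_bufcnt * buf_kb

print(lvm_buffer_kb(9))   # 1152 KB with the default of 9 buffers
print(lvm_buffer_kb(64))  # 8192 KB at the documented maximum
```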


Summary

This article focused on file system performance. You examined the enhancements in JFS2 and why it would be the preferred file system. Further, you used tools such as filemon and fileplace to gather more detailed information about the actual file structures and how they relate to I/O performance. Finally, you tuned your I/O subsystem by using the ioo command. You learned about the j2_minPageReadAhead and j2_maxPageReadAhead parameters in an effort to increase performance when encountering sequential I/O.

During this three-part series on I/O you learned that, perhaps more so than for any other subsystem, your tuning must start prior to stress testing your systems. Architecting the systems properly can do more to increase performance than anything you can do with tuning I/O parameters. This includes strategic disk placement and making sure you have enough adapters to handle the throughput of your disks. Further, while this series focused on I/O, understand that the VMM is also very tightly linked with I/O performance and must also be tuned to achieve optimum I/O performance.

Related topics

  • IBM Redbooks: Database Performance Tuning on AIX is designed to help system designers, system administrators, and database administrators design, size, implement, maintain, monitor, and tune a Relational Database Management System (RDBMS) for optimal performance on AIX.
  • "Processor affinity on AIX" (developerWorks, November 2006): Using process affinity settings to bind or unbind threads can help you find the root cause of troublesome hang or deadlock problems. Read this article to learn how to use processor affinity to restrict a process and run it only on a specified central processing unit (CPU).
  • The AIX 7.1 Information Center is your source for technical information about the AIX operating system.
  • The IBM AIX Version 6.1 Differences Guide can be a useful resource for understanding changes in AIX 6.1.
  • Operating System and Device Management: This document from IBM provides users and system administrators with complete information that can affect your selection of options when performing such tasks as backing up and restoring the system, managing physical and logical storage, and sizing appropriate paging space.
  • IBM Redbooks: For help in obtaining IBM certification for AIX 5L and the eServer® pSeries®, read IBM Certification Study Guide for eServer p5 and pSeries Administration and Support for AIX 5L Version 5.3.
  • AIX and UNIX: The AIX and UNIX developerWorks zone provides a wealth of information relating to all aspects of AIX systems administration and expanding your UNIX skills.
  • IBM trial software: Build your next development project with software for download directly from developerWorks.