Optimizing AIX 5L performance: Tuning disk performance, Part 3

Part 3 of this series covers how to improve overall file system performance, how to tune your systems with the ioo command, and how to use the filemon and fileplace utilities.

Ken Milberg, Future Tech UNIX Consultant, Technology Writer, and Site Expert, Future Tech

Ken Milberg is a Technology Writer and Site Expert for techtarget.com and provides Linux technical information and support at searchopensource.com. He is also a writer and technical editor for IBM Systems Magazine, Open Edition. Ken holds a bachelor's degree in computer and information science and a master's degree in technology management from the University of Maryland. He is the founder and group leader of the NY Metro POWER-AIX/Linux Users Group. Through the years, he has worked for both large and small organizations and has held diverse positions from CIO to Senior AIX Engineer. Today, he works for Future Tech, a Long Island-based IBM business partner. Ken is a PMI certified Project Management Professional (PMP), an IBM Certified Advanced Technical Expert (CATE, IBM System p5 2006), and a Solaris Certified Network Administrator (SCNA). You can contact him at kmilberg@gmail.com.



09 October 2007


About this series

This three-part series (see Resources) on the AIX® disk and I/O subsystem focuses on the challenges of optimizing disk I/O performance. While disk tuning is arguably less exciting than CPU or memory tuning, it is a crucial component in optimizing server performance. In fact, because disk I/O is often the weakest link among your subsystems, there is more you can do to improve disk I/O performance than for any other subsystem.

Introduction

The first and second installments of this series discussed the importance of architecting your systems, the impact architecture can have on overall system performance, and a new I/O tuning tool, lvmo, which you can use to tune logical volumes. In this installment, examine how to tune your systems using the ioo command, which configures the majority of I/O tuning parameters and displays the current or next-boot value of each. Also learn how and when to use the filemon and fileplace tools (these specific AIX tools should be an important part of your repertoire), how to improve your overall file system performance, how to tune your file systems, and how the enhanced journaled file system (JFS2) compares with the journaled file system (JFS). You'll even examine some file system access patterns, such as sequential and random access, which can affect performance.

File system overview

This section discusses JFS2, file system performance, and specific performance improvements over JFS. As you know, there are two types of kernels in AIX: a 32-bit kernel and a 64-bit kernel. While they share some common libraries and most commands and utilities, it is important to understand their differences and how the kernels relate to overall performance tuning. JFS2 is optimized for the 64-bit kernel, while JFS is optimized for the 32-bit kernel.

Journaling file systems, while much more secure, have historically been associated with performance overhead. In a shop where performance rules (at the expense of availability), you could disable metadata logging with JFS in an effort to increase performance. With JFS2, that is no longer possible, or even necessary, because JFS2 was tuned to handle metadata-intensive applications much more efficiently.

More importantly, the key advantage of JFS2 lies in its ability to scale. With JFS, a file is limited to 64GB; with JFS2, you can have a file as large as 16TB. Another important change is the directory organization: JFS2 uses a binary tree representation when performing inode searches, a much better method than the linear search used by JFS. Further, you no longer need to assign inodes when creating file systems, as JFS2 allocates them dynamically, meaning you won't run out of them.

While concurrent I/O was covered in the first installment of the series (see Resources), it's worth another mention here. JFS2 is implemented with read-shared, write-exclusive inode locks, which allow multiple users to read the same file simultaneously; this increases performance dramatically when multiple users read from the same data file. Concurrent I/O goes a step further and allows multiple threads to both read and write data concurrently to the same file. To turn on concurrent I/O, you just need to mount the file system with the appropriate flag (see Listing 1). I recommend that you look into concurrent I/O when using databases such as Oracle.

Listing 1. Turning on concurrent I/O
root@lpar29p682e_pub[/] > mount -o cio /test
root@lpar29p682e_pub[/] > df -k /test
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/fslv00        131072    130724    1%        4     1% /test
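
The mount shown in Listing 1 lasts only until the file system is unmounted. To make concurrent I/O persistent across remounts, you can record it as a mount option in /etc/filesystems. Here is a minimal sketch, assuming /test is a JFS2 file system that can be briefly unmounted:

# Record cio as a default mount option for /test in /etc/filesystems
chfs -a options=cio /test
# Remount so the new option takes effect
umount /test
mount /test
# Verify that the cio option is now active
mount | grep test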

Table 1 illustrates the various enhancements of JFS2 and how they relate to systems performance. It's also important to understand that when tuning your I/O systems, many of the tunables themselves (you'll get into that later) differ, depending on whether you are using JFS or JFS2.

Table 1. Enhancements of JFS2
Function                  JFS                  JFS2
Compression               Yes                  No
Quotas                    Yes                  Yes
Deferred update           Yes                  No
Direct I/O support        Yes                  Yes
Optimization              32-bit               64-bit
Max file system size      1 terabyte           4 petabytes
Max file size             64 gigabytes         4 petabytes
Number of inodes          Fixed at creation    Dynamic
Large file support        As mount option      Default
Online defragmentation    Yes                  Yes
Namefs                    Yes                  Yes
DMAPI                     No                   Yes

Filemon and fileplace

This section introduces two important I/O tools, filemon and fileplace, and discusses how you can use them in your day-to-day systems administration.

The filemon command uses the trace facility to report on the I/O activity of physical and logical storage, including your actual files. The I/O activity monitored is based on the time interval specified when running the trace. It reports on all layers of file system utilization, including the Logical Volume Manager (LVM), virtual memory, and physical disk layers. Run without flags, it traces in the background while application programs or system commands are being run and monitored. Tracing starts immediately and runs until you stop it with the trcstop command, at which point filemon generates an I/O activity report and exits. It can also process a trace file previously recorded by the trace facility and generate reports from that file. Because reports written to standard output usually scroll past your screen, I recommend using the -o option to write the output to a file (see Listing 2).

Listing 2. Using filemon with the -o option
root@lpar29p682e_pub[/] > filemon -o dbmon.out -O all

Run trcstop command to signal end of trace.
Sun Aug 19 17:47:34 2007
System: AIX 5.3 Node: lpar29p682e_pub Machine: 00CED82E4C00

root@lpar29p682e_pub[/] > trcstop
[filemon: Reporting started]
root@lpar29p682e_pub[/] > [filemon: Reporting completed]

[filemon: 73.906 secs in measured interval]

root@lpar29p682e_pub[/] >
When we look at the dbmon.out file, here is what we see:
Sun Aug 19 17:50:45 2007
System: AIX 5.3 Node: lpar29p682e_pub Machine: 00CED82E4C00
Cpu utilization:  68.2%
Cpu allocation:   77.1%
130582780 events were lost.  Reported data may have inconsistencies or errors.
Most Active Files
------------------------------------------------------------------------
  #MBs  #opns   #rds   #wrs  file                     volume:inode
------------------------------------------------------------------------
   0.3      1     70      0  unix                     /dev/hd2:38608
   0.0      9     11      0  vfs                      /dev/hd4:949
   0.0      2      4      0  ksh.cat                  /dev/hd2:58491
Most Active Segments
------------------------------------------------------------------------
  #MBs  #rpgs  #wpgs  segid  segtype                  volume:inode
------------------------------------------------------------------------
   0.6      0    162   223b9  client
   
Most Active Logical Volumes
------------------------------------------------------------------------
  util  #rblk  #wblk   KB/s  volume                   description
------------------------------------------------------------------------
  0.25      0    120    0.2  /dev/hd8                 jfs2log
  0.00      0   1304    2.7  /dev/hd4                 /
  
------------------------------------------------------------------------
Detailed File Stats
------------------------------------------------------------------------

FILE: /unix  volume: /dev/hd2  inode: 38608
opens:                  1
total bytes xfrd:       286720
reads:                  70      (0 errs)
  read sizes (bytes):   avg  4096.0 min    4096 max    4096 sdev     0.0
  read times (msec):    avg   0.003 min   0.002 max   0.005 sdev   0.001
lseeks:                 130
------------------------------------------------------------------------
Detailed VM Segment Stats   (4096 byte pages)
------------------------------------------------------------------------

SEGMENT: 223b9  segtype: client
segment flags:          clnt
writes:                 162     (0 errs)
  write times (msec):   avg   1.317 min   0.369 max   1.488 sdev   0.219
  write sequences:      5
  write seq. lengths:   avg    32.4 min       1 max      64 sdev    20.8

------------------------------------------------------------------------
Detailed Logical Volume Stats   (512 byte blocks)
------------------------------------------------------------------------

VOLUME: /dev/hd8  description: jfs2log
writes:                 15      (0 errs)
  write sizes (blks):   avg     8.0 min       8 max       8 sdev     0.0
  write times (msec):   avg   0.389 min   0.287 max   1.277 sdev   0.250
  write sequences:      11
  write seq. lengths:   avg    10.9 min       8 max      24 sdev     5.1
seeks:                  11      (73.3%)

Detailed Physical Volume Stats   (512 byte blocks)
------------------------------------------------------------------------

VOLUME: /dev/hdisk0  description: Virtual SCSI Disk Drive
writes:                 33      (0 errs)
  write sizes (blks):   avg    45.3 min       8 max     256 sdev    82.7
  write times (msec):   avg   0.544 min   0.267 max   1.378 sdev   0.370
  write sequences:      26
  write seq. lengths:   avg    57.5 min       8 max     512 sdev   122.6
seeks:                  26      (78.8%)
  seek dist (blks):     init 17091584,
                        avg 913560.3 min       8 max 3940256 sdev 1431025.7
  seek dist (%tot blks):init 40.74951,
                        avg 2.17810 min 0.00002 max 9.39430 sdev 3.41183
time to next req(msec): avg 6369.624 min   0.051 max 120046.794 sdev 23589.450
throughput:             3.1 KB/sec
utilization:            0.00

Look for long seek times, as they can result in decreased application performance. By examining the read and write sequence counts in detail, you can further determine whether access is sequential or random. This helps you when it is time to do your I/O tuning. This output clearly illustrates that there is no I/O bottleneck to speak of. Filemon provides a tremendous amount of information and, truthfully, I've found it can be too much information at times. Further, running filemon can exact a large performance hit. Let's look at the topas results while running filemon (see Figure 1).

Figure 1. topas results while running filemon

In this example, filemon is taking up almost 96 percent of the CPU! I don't typically like to recommend performance tools that have such a substantial overhead, so I'll reiterate that while filemon certainly has a purpose, you need to be very careful when using it.
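
One way to limit that overhead is to keep the trace window short and restrict the report to the layers you actually need. Here is a minimal sketch; the 30-second window and the lv,pv report classes are illustrative choices, not requirements:

# Trace only the logical volume and physical volume layers
filemon -o fmon.out -O lv,pv
# Let the workload run for roughly 30 seconds, then stop the trace
sleep 30
trcstop
# Review the generated report
more fmon.out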

What about fileplace? The fileplace command reports the placement of a file's blocks within a file system. It is commonly used to examine and assess the efficiency of a file's placement on disk. What would you use it for? One reason is to determine whether heavily used files are substantially fragmented. It can also help you identify the physical volume with the highest utilization and determine whether the drive or the I/O adapter is causing the bottleneck.

Let's look at an example of a frequently accessed file in Listing 3.

Listing 3. Frequently accessed file
root@lpar29p682e_pub[/] > fileplace -pv dbfile

File: dbfile  Size: 5374622 bytes  Vol: /dev/hd4
Blk Size: 4096  Frag Size: 4096  Nfrags: 1313
Inode: 21  Mode: -rw-r--r--  Owner: root  Group: system

  Physical Addresses (mirror copy 1)                                 Logical Extent
  ----------------------------------                                 ----------------
  02134816-02134943  hdisk0      128 frags    524288 Bytes,   9.7%    00004352-00004479
  02135680-02136864  hdisk0      1185 frags   4853760 Bytes,  90.3%   00005216-00006400

  1313 frags over space of 2049 frags:   space efficiency = 64.1%
  2 extents out of 1313 possible:   sequentiality = 99.9%

You should be interested in space efficiency and sequentiality here. Higher space efficiency means files are less fragmented, which provides better sequential file access. Higher sequentiality tells you that the file's blocks are allocated more contiguously, which is also better for sequential file access. In this case, space efficiency could be better, while sequentiality is quite high. If space efficiency and sequentiality are too low, you might want to consider reorganizing the file system's data. You can do this with the defragfs command, which reduces fragmentation within a file system, or at the logical volume level with the reorgvg command, which can improve logical volume utilization and efficiency.
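
As a sketch of what that reorganization might look like, assume /test is the fragmented JFS2 file system and datavg is its volume group (both names are illustrative); defragfs works within a file system, while reorgvg repositions logical volumes within the volume group:

# Report the current fragmentation state without changing anything
defragfs -q /test
# Coalesce free space to improve contiguity
defragfs /test
# Reorganize the logical volumes in the volume group
reorgvg datavg
# Re-check the file's placement afterward
fileplace -pv dbfile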

Tuning with ioo

This section discusses the use of the ioo command, which is used for virtually all I/O-related tuning parameters.

As with vmo, you need to be extremely careful when changing ioo parameters, as changing parameters on the fly can cause severe performance degradation. Table 2 details specific tuning parameters that you will use often for JFS and JFS2 file systems. As you can clearly see, the majority of I/O tuning is done with the ioo utility.

Table 2. Specific tuning parameters
Function                                                 JFS tuning parameter              Enhanced JFS (JFS2) tuning parameter
Sets maximum amount of memory for caching files          vmo -o maxperm=value              vmo -o maxclient=value (<= maxperm)
Sets minimum amount of memory for caching                vmo -o minperm=value              n/a
Sets a hard limit on memory for caching                  vmo -o strict_maxperm             vmo -o maxclient (hard limit)
Sets maximum pages used for sequential read ahead        ioo -o maxpgahead=value           ioo -o j2_maxPageReadAhead=value
Sets minimum pages used for sequential read ahead        ioo -o minpgahead=value           ioo -o j2_minPageReadAhead=value
Sets maximum number of pending write I/Os to a file      chdev -l sys0 -a maxpout=value    chdev -l sys0 -a maxpout=value
Sets minimum number of pending write I/Os to a file
  at which programs blocked by maxpout may proceed       chdev -l sys0 -a minpout=value    chdev -l sys0 -a minpout=value
Sets the amount of modified data cache for a file
  with random writes                                     ioo -o maxrandwrt=value           ioo -o j2_maxRandomWrite=value,
                                                                                           ioo -o j2_nRandomCluster=value
Controls gathering of I/Os for sequential write behind   ioo -o numclust=value             ioo -o j2_nPagesPerWriteBehindCluster=value
Sets the number of file system bufstructs                ioo -o numfsbufs=value            ioo -o j2_nBufferPerPagerDevice=value
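
Given that risk, it's prudent to snapshot your current tunables before experimenting so you can roll back. Here is a minimal sketch using the AIX 5.3 tunsave and tunrestore commands; the file name mytunables is just an example, and the file is kept under /etc/tunables:

# Save the current values of all tunables to /etc/tunables/mytunables
tunsave -f mytunables
# ...experiment with ioo changes...
# Restore the saved values if performance degrades
tunrestore -f mytunables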

Let's take a closer look at some of the more important parameters below; the vmo tuning parameters were already covered in the memory tuning series (see Resources).

There are several ways you can determine the existing ioo values on your system. The long listing for ioo clearly gives you the most information (see Listing 4). It lists the current value, default, boot value, range, unit, type, and dependencies of every tunable parameter managed by ioo.

Listing 4. Display for ioo
root@lpar29p682e_pub[/] > ioo -L
NAME                      CUR    DEF    BOOT   MIN    MAX    UNIT           TYPE
     DEPENDENCIES

j2_atimeUpdateSymlink     0      0      0      0      1      boolean           D
j2_dynamicBufferPreallo   16     16     16     0      256    16K slabs         D
j2_inodeCacheSize         400    400    400    1      1000                     D
j2_maxPageReadAhead       128    128    128    0      64K    4KB pages         D
j2_maxRandomWrite         0      0      0      0      64K    4KB pages         D
j2_maxUsableMaxTransfer   512    512    512    1      4K     pages             M
j2_metadataCacheSize      400    400    400    1      1000                     D
j2_minPageReadAhead       2      2      2      0      64K    4KB pages         D
j2_nBufferPerPagerDevice  512    512    512    512    256K                     M
j2_nPagesPerWriteBehindC  32     32     32     0      64K                      D
j2_nRandomCluster         0      0      0      0      64K    16KB clusters     D
j2_nonFatalCrashesSystem  0      0      0      0      1      boolean           D
j2_syncModifiedMapped     1      1      1      0      1      boolean           D
j2_syncdLogSyncInterval   1      1      1      0      4K     iterations        D
jfs_clread_enabled        0      0      0      0      1      boolean           D
jfs_use_read_lock         1      1      1      0      1      boolean           D
lvm_bufcnt                9      9      9      1      64     128KB/buffer      D
maxpgahead                8      8      8      0      4K     4KB pages         D
     minpgahead
maxrandwrt                0      0      0      0      512K   4KB pages         D
memory_frames             512K          512K                 4KB pages         S
minpgahead                2      2      2      0      4K     4KB pages         D
     maxpgahead
numclust                  1      1      1      0      2G-1   16KB/cluster      D
numfsbufs                 196    196    196    1      2G-1                     M
pd_npages                 64K    64K    64K    1      512K   4KB pages         D
pgahd_scale_thresh        0      0      0      0      419430 4KB pages         D
pv_min_pbuf               512    512    512    512    2G-1                     D
sync_release_ilock        0      0      0      0      1      boolean           D

n/a means parameter not supported by the current platform or kernel

Parameter types:
    S = Static: cannot be changed
    D = Dynamic: can be freely changed
    B = Bosboot: can only be changed using bosboot and reboot
    R = Reboot: can only be changed during reboot
    C = Connect: changes are only effective for future socket connections
    M = Mount: changes are only effective for future mountings
    I = Incremental: can only be incremented
    d = deprecated: deprecated and cannot be changed

Listing 5 shows you how to change a tunable.

Listing 5. Changing a tunable
root@lpar29p682e_pub[/] > ioo -o maxpgahead=32
Setting maxpgahead to 32
root@lpar29p682e_pub[/] >
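
Note that a change made this way lasts only until the next reboot. Here is a minimal sketch of making a change persistent (AIX 5.3 syntax; verify the flags on your level):

# Change the current value and record it in /etc/tunables/nextboot
ioo -p -o maxpgahead=32
# Or set only the value to be used at the next reboot
ioo -r -o maxpgahead=32
# Confirm the current and boot values
ioo -L maxpgahead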

The maxpgahead parameter applies to JFS only; the following section looks at JFS2.

Some important JFS2-specific file system performance enhancements include sequential page read ahead and sequential and random write behind. The Virtual Memory Manager (VMM) of AIX anticipates page requirements by observing the patterns in which files are accessed. When a program accesses two successive pages of a file, VMM assumes that the program will continue to access the file sequentially. The number of pages to read ahead can be configured using VMM thresholds. With JFS2, make note of these two important parameters (a tuning sketch follows the list):

  • j2_minPageReadAhead: Determines the number of pages read ahead when VMM initially detects a sequential pattern.
  • j2_maxPageReadAhead: Determines the maximum number of pages that VMM can read ahead in a sequential file.
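
Here is a minimal sketch of inspecting and raising the JFS2 read-ahead ceiling; the value 128 (that is, 512KB per sequential stream in 4KB pages) is illustrative and mostly helps workloads dominated by large sequential reads:

# Show current, default, and boot values for both read-ahead tunables
ioo -L j2_minPageReadAhead
ioo -L j2_maxPageReadAhead
# Allow read ahead to ramp up to 128 4KB pages per sequential stream
ioo -o j2_maxPageReadAhead=128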

Sequential and random write behind relates to writing modified pages in memory to disk after a certain threshold is reached, rather than waiting for syncd to flush the pages out. The goal is to limit the number of dirty pages in memory, which reduces I/O bursts and disk fragmentation. There are two types of write behind: sequential and random. With sequential write behind, modified pages are written out in clusters instead of sitting in memory until the syncd daemon runs, which can otherwise cause real bottlenecks. With random write behind, once the number of modified pages for a file in memory exceeds a specified threshold, all subsequent pages are written to disk. Another important area worth mentioning is large sequential I/O processing. When there is too much simultaneous I/O to your file systems, the I/O can bottleneck at the file system level. In this case, you should increase the j2_nBufferPerPagerDevice parameter (numfsbufs with JFS). If you use raw I/O rather than file systems, this same type of bottleneck can occur in the LVM layer; here is where you might want to tune the lvm_bufcnt parameter.
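
Before raising these values, it's worth confirming that a buffer shortage is actually occurring. A minimal sketch follows; the counters shown by vmstat -v are cumulative since boot, so compare two samples taken while the workload runs:

# Look for I/Os blocked due to buffer shortages
vmstat -v | grep -i blocked
# If the fsbuf counters keep climbing on a JFS2 system, raise the buffer
# count per pager device (effective for future mounts only)
ioo -o j2_nBufferPerPagerDevice=1024
# For raw LVM I/O bottlenecks, lvm_bufcnt is the analogous knob
ioo -o lvm_bufcnt=16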

Summary

This article focused on file system performance. You examined the enhancements in JFS2 and why it is the preferred file system. Further, you used tools such as filemon and fileplace to gather more detailed information about the actual file structures and how they relate to I/O performance. Finally, you tuned your I/O subsystem by using the ioo command, and you learned about the j2_minPageReadAhead and j2_maxPageReadAhead parameters in an effort to increase performance when encountering sequential I/O.

During this three-part series on I/O, you learned that, perhaps more so than with any other subsystem, your tuning must start before you stress test your systems. Architecting the systems properly can do more to increase performance than anything you can do by tuning I/O parameters. This includes strategic disk placement and making sure you have enough adapters to handle the throughput of your disks. Further, while this series focused on I/O, understand that the VMM is tightly linked with I/O performance and must also be tuned to achieve optimum I/O performance.

Resources

Get products and technologies

  • You can download the nmon analyzer from here.
  • IBM trial software: Build your next development project with software for download directly from developerWorks.
