z/OS Distributed File Service zFS Administration
SC23-6887-00


LFS


The LFS report provides detailed file system statistics. The following sample shows an example of the report content; each part of the report is described in Table 1.

 F ZFS,QUERY,LFS
 IOEZ00438I Starting Query Command LFS. 421
                      zFS Vnode Op Counts
  
 Vnode Op               Count    Vnode Op               Count
 ----------------- ----------    ----------------- ----------
 efs_hold                   0    efs_readdir            67997
 efs_rele                   0    efs_create           1569039
 efs_inactive               0    efs_remove           1945874
 efsvn_getattr        9856523    efs_rename            235320
 efs_setattr               40    efs_mkdir             237359
 efs_access           1656502    efs_rmdir             238004
 efs_lookup          21545682    efs_link              237318
 efs_getvolume              0    efs_symlink           237318
 efs_getlength              0    efs_readlink               0
 efs_afsfid                 0    efs_rdwr                   0
 efs_fid                    0    efs_fsync                  0
 efs_vmread                 0    efs_waitIO                 9
 efs_vmwrite                0    efs_cancelIO               0
 efs_clrsetid               0    efs_audit               5425
 efs_getanode           16640    efs_vmblkinfo              0
 efs_readdir_raw            0    efs_convert                0
  
 Average number of names per convert                        0
 Number of version5 directory splits                      126
 Number of version5 directory merges                       63
 
 Total zFS Vnode Ops                                 37849050
  
                           zFS Vnode Cache Statistics
  
 Vnodes      Requests     Hits    Ratio  Allocates   Deletes
 ---------- ---------- ---------- ----- ---------- ----------
     200000   25908218   22431383  86.580%          0          1
  
 zFS Vnode structure size: 224 bytes
 zFS extended vnodes: 200000, extension size 708 bytes (minimum)
 Held zFS vnodes:       2914 (high      29002) Open zFS vnodes:
 0 (high         10) Reusable: 197085
  
 Total osi_getvnode Calls:    3886774 (high resp          0) Avg. Call
 Time:         0.069 (msecs)
 Total SAF Calls:            11050540 (high resp          1) Avg. Call
 Time:         0.008 (msecs)
  
                           zFS Fast Lookup Statistics
  
 Buffers     Lookups      Hits    Ratio  Neg. Hits   Updates
 ---------- ---------- ---------- ----- ---------- ----------
       1000          0          0   0.0%          0          0
  
                          Metadata Caching Statistics
  
 Buffers   (K bytes)  Requests     Hits    Ratio   Updates   PartialWrt
 --------- --------- ---------- ---------- ------ ---------- ----------
     32768    262144   77813570   77529130  99.6%   27943073     423524
  
  
Metadata Backing Caching Statistics    

Buffers    (K bytes)  Requests     Hits    Ratio  Discards  
---------- --------- ---------- ---------- ----- ---------- 
    131072   1048576      24303        377   1.5%          0

                         Transaction Cache Statistics
  
 Transactions started:    7152165    Lookups on tran:    8032713    EC  Merges:     516363
 Allocated Transactions:       8034  (Act=         0, Pend=         0,  Comp=      4846, Free=      3188)
  
  
                   I/O Summary By Type
                   -------------------
  
 Count       Waits       Cancels     Merges      Type
 ----------  ----------  ----------  ----------  ----------
      33006        7701           0           0  File System Metadata
     680516        1020           0       56366  Log File
         11           1           0           0  User File Data
  
                   I/O Summary By Circumstance
                   ---------------------------
  
 Count       Waits       Cancels     Merges      Circumstance
 ----------  ----------  ----------  ----------  ------------
       7213        6553           0           0  Metadata cache read
          1           1           0           0  User file cache direct read
          4           4           0           0  Log file read
          0           0           0           0  Metadata cache async delete write
          0           0           0           0  Metadata cache async write
          0           0           0           0  Metadata cache lazy write
          0           0           0           0  Metadata cache sync delete write
          0           0           0           0  Metadata cache sync write
         10           0           0           0  User File cache direct write
          1           1           0           0  Metadata cache file sync write
      16981         861           0           0  Metadata cache sync daemon write
          0           0           0           0  Metadata cache aggregate detach write
          0           0           0           0  Metadata cache buffer block reclaim write
          0           0           0           0  Metadata cache buffer allocation write
          0           0           0           0  Metadata cache file system quiesce write
       8811         286           0           0  Metadata cache log file full write
     680512        1016           0       56366  Log file write
          0           0           0           0  Metadata cache shutdown write
          0           0           0           0  Format, grow write
  
                      zFS I/O by Currently Attached Aggregate
  
 DASD   PAV
 VOLSER IOs Mode  Reads       K bytes     Writes      K bytes
 Dataset Name
 ------ --- ----  ----------  ----------  ----------  ----------
 ------------
 ZFSD18   1  R/W          44         344        1831       17224
 ZFSAGGR.BIGZFS.DHH.FS14.EXTATTR
 ZFS121   1  R/W        6509       52056      648750    10276788
 ZFSAGGR.BIGZFS.DHH.FS1.EXTATTR
 ------           ----------  ----------  ----------  ----------
      2                 6553       52400      650581    10294012
 *TOTALS*
  
  
 Total number of waits for I/O:       8722
 Average I/O wait time:               115.334 (msecs)
 IOEZ00025I zFS kernel: MODIFY command - QUERY,LFS completed successfully
Table 1. LFS report sections
Field name Contents
zFS Vnode Op Counts: Shows the number of calls to the lower-layer zFS components. One request from z/OS® UNIX typically requires more than one lower-layer call. Note that the output of this report wraps.
zFS Vnode Cache Statistics: Shows the zFS vnode cache statistics, including the number of currently allocated vnodes and the vnode hit ratio. Allocates and Deletes show requests to create new vnodes (for operations such as create or mkdir) and to delete vnodes (for operations such as remove, or failed creates or mkdirs). The size of this cache is controlled by the vnode_cache_size parameter and by the demand for zFS vnodes placed by z/OS UNIX. In general, zFS tries to honor the setting of the vnode_cache_size parameter and to recycle vnode structures to represent different files.

However, if z/OS UNIX requests more vnodes than zFS has allocated, zFS must allocate additional vnodes to avoid application failures. Held zFS vnodes is the number of vnodes that z/OS UNIX currently requires zFS to access; high is the largest number of vnodes that z/OS UNIX required zFS to access at any one time (the peak). z/OS UNIX also determines when files are opened and closed. Open zFS vnodes is the number of vnodes that represent currently open files; high is the largest number of files open at the same time. Generally, a good hit ratio for this cache is preferable because a miss means the data structures must be initialized, and initialization requires a read of the object's status from disk. Often that status is in the metadata cache, but it is not guaranteed; consequently, a vnode cache lookup miss might require an I/O wait.

The vnode structure size is shown; however, additional data structures anchored from the vnode also take space. Everything added together yields over 1 K of storage per vnode. Consider this when planning the size of this cache. Also note that initializing a vnode does not require an I/O if the object's status information is in the metadata cache; thus, a good-sized metadata cache can be as useful as, and often more useful than, an extremely large vnode cache.
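
As a rough planning aid, the vnode cache storage can be estimated from the structure sizes shown in the report. The following Python sketch is only an illustration; the allowance for structures anchored from the vnode is an assumption (the report states only that the total exceeds 1 K per vnode).

# Rough estimate of vnode cache storage, using the sizes from the sample
# LFS report. The anchored-structure allowance is an assumed value; the
# report states only that the total per vnode exceeds 1 K.
def estimate_vnode_cache_bytes(vnode_cache_size,
                               vnode_struct_bytes=224,
                               extension_bytes=708,
                               anchored_overhead_bytes=256):
    per_vnode = vnode_struct_bytes + extension_bytes + anchored_overhead_bytes
    return vnode_cache_size * per_vnode

# Example: the sample report's 200000 vnodes imply well over 200 MB.
print(estimate_vnode_cache_bytes(200000) / (1024 * 1024))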

Total osi_getvnode Calls is the number of times zFS called the osi_getvnode interface of z/OS UNIX to get a z/OS UNIX vnode to correspond to a new zFS vnode. Its high resp is the number of calls that took longer than a second to complete. Avg. Call Time is the average number of milliseconds each call took to complete.

Total SAF Calls is the number of calls zFS made to the security product via the SAF interface. high resp is the number of these security calls that took longer than a second to complete. Avg. Call Time is the average number of milliseconds each call took to complete.

zFS Fast Lookup Statistics: Shows the basic performance characteristics of the zFS fast lookup cache. The fast lookup cache is used on the owning system of a zFS sysplex-aware file system to improve the performance of the lookup operation. There are no externals for this cache (other than this display). The statistics show the total number of buffers (each is 8 K in size), the total number of lookups, the cache hits for lookups, and the hit ratio. The higher the hit ratio, the better the performance.
Metadata Caching Statistics: Shows the basic performance characteristics of the metadata cache. The metadata cache holds the disk blocks that contain metadata, and also the file data for files smaller than 7 K. For files smaller than 7 K, zFS places multiple files in one disk block (for zFS, a disk block is 8 K bytes). Only the lower metadata management layers have the block fragmentation information, so user file I/O for small files is performed directly through this cache rather than through the user file cache.

The statistics show the total number of buffers (each is 8 K in size), the total kilobytes, the number of requests, the hit ratio of the cache, Updates (the number of times an update was made to a metadata block), and Partial writes (the number of times that only half of an 8 K metadata block needed to be written). The higher the hit ratio, the better the performance. Metadata is accessed frequently in zFS and, for the most part, is contained only in the metadata cache; therefore, a hit ratio of 80% or more is typically sufficient.
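
The size and hit ratio shown in this section can be derived directly from the Buffers, Requests, and Hits columns. A minimal Python sketch of that arithmetic (the 80% threshold is the guideline mentioned above):

# Derive the metadata cache size and hit ratio from the report columns.
# Each metadata cache buffer holds one 8 K disk block.
def metadata_cache_summary(buffers, requests, hits):
    size_kbytes = buffers * 8                      # matches the "(K bytes)" column
    hit_ratio = hits / requests if requests else 0.0
    adequate = hit_ratio >= 0.80                   # 80% or more is typically sufficient
    return size_kbytes, hit_ratio, adequate

# Values from the sample report: 32768 buffers, 77813570 requests, 77529130 hits.
print(metadata_cache_summary(32768, 77813570, 77529130))   # (262144, ~0.996, True)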

Metadata Backing Cache Statistics: Describes the performance of the extension to the metadata cache. The size of this extension is controlled by the metaback_cache_size configuration option. The backing cache is stored in a data space and is used only to avoid metadata reads from disk. All metadata updates and write I/O are performed from the primary metadata cache. Statistics similar to those for the metadata cache are shown for this cache. Every hit in this cache avoids one disk read, but the metadata backing cache is not needed except for workloads that have many small user files or that are storage-constrained in the zFS primary address space (possibly because of a large demand for zFS vnodes by z/OS UNIX and its applications). Thus, if the zFS address space has primary space available, that space should be given to the primary metadata cache. In the preceding report example, the metadata backing cache provides little performance benefit (as shown by its small hit ratio). The backing cache can be created only by specifying the metaback_cache_size configuration option in the IOEFSPRM file or with the zfsadm config command.
Transaction Cache Statistics: zFS updates metadata on disk by first writing the metadata changes to a log file. Each operation creates one or more transactions, writes the updates to the log file for those transactions, and then ends the transactions. Each transaction has an associated state, which is described as follows:
Active
Records are still being written to the log file to describe the updates being made by this transaction; hence, the transaction was started but has not yet ended. Shown as "Act" in the report.
Complete
The transaction has ended, all updates were written to the log file, and the end transaction record is also written to the log for that transaction. Shown as Comp in the report.
Committed
The transaction has ended, all updates are written to the log file, and all log file pages that contain information about this transaction are on disk. At this point, the transaction is guaranteed; its updates will be present even if the system stops. (Statistics for this state are not shown in the sample report. As soon as a transaction is committed, the structure representing it is free for reuse by another transaction.)
Equivalence Classes
zFS does not use the common techniques known as two-phase locking and two-phase commit. Rather, transactions that are related are grouped into equivalence classes. zFS decides when a transaction is related to, or dependent on, another transaction. When this determination is made, the transactions are grouped into an equivalence class. Any transactions in the same equivalence class are committed together or backed out together in the event of a system failure. By using equivalence classes, threads running related transactions can run in parallel without added serialization between them (other than locks when they touch common structures) while adding their transactions to the same class; this increases throughput. An equivalence class merge occurs when two transactions that need to be made equivalent are each already in an equivalence class. In this case, the two classes are merged, which is shown as EC Merges in the report.
Pending
A transaction is pending when all of its updates are written to the log file but other transactions in the same equivalence class have not yet ended. Shown as "Pend" in the report.

By default, the transaction cache size is 2000 transactions; it can be changed with the tran_cache_size configuration option. In general, zFS increases the size of the cache, to improve performance, if it determines that too many I/O waits are occurring to sync log file pages so that transactions can be committed and their structures freed. Also, if you use the zfsadm config command to set tran_cache_size, zFS does not shrink the transaction cache so small that it would cause excessive log file syncs; an attempt to set the cache too small fails. As a rule of thumb, the default is fine for most customers. If zFS determines that more transactions are needed for performance, it allocates more, although it is conservative about adding transaction structures. You might get a small performance boost by starting with a larger transaction cache size so that zFS does not need to check whether it can increase the size or sync log file pages.
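
One way to read the Allocated Transactions line is to compare it with the configured (or default) tran_cache_size. The following Python sketch is an interpretation of the guidance above, not a zFS rule; it only flags that zFS has already grown the cache on its own.

# Compare the report's Allocated Transactions with the configured
# tran_cache_size (default 2000). A larger allocated count means zFS has
# already grown the cache to avoid log file sync waits, which can be a hint
# to start with a larger tran_cache_size value.
def tran_cache_was_grown(allocated_transactions, tran_cache_size=2000):
    return allocated_transactions > tran_cache_size

# Sample report: 8034 allocated versus the 2000 default.
print(tran_cache_was_grown(8034))   # True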

zFS I/O by Currently Attached Aggregate: The zFS I/O driver is essentially an I/O queue manager (one I/O queue per DASD). It uses Media Manager to issue I/O to VSAM data sets. It generally sends no more than one I/O per DASD volume to disk at one time. The exception is parallel access volume (PAV) DASD, which often has multiple paths and can perform multiple I/Os in parallel. In this case, zFS divides the number of access paths by two and rounds any fraction up. (For example, for a PAV DASD with five paths, zFS issues at most three I/Os at one time to Media Manager.)
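
The stated rule (half the access paths, rounded up) can be expressed as a one-line calculation; the following Python sketch only restates that rule.

# Maximum number of I/Os zFS sends to Media Manager at one time for a volume:
# half the number of access paths, with any fraction rounded up.
def max_parallel_ios(access_paths):
    return (access_paths + 1) // 2

print(max_parallel_ios(5))   # 3, as in the five-path PAV example above
print(max_parallel_ios(1))   # 1, as for non-PAV DASD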

zFS limits the I/O because it uses a dynamic reordering and prioritization scheme to improve performance by reordering the I/O queue on demand. Thus, high-priority I/Os (for example, I/Os that are currently being waited on) are placed up front. An I/O can be made high priority at any time during its life. This reordering has been proven to provide the best performance; for PAV DASD, performance tests have shown that not sending quite as many I/Os as there are available paths allows zFS to reorder I/Os and leave paths available for I/Os that become high priority.

Another feature of the zFS I/O driver is that, because it queues I/Os, it can cancel I/Os; this is done, for example, when a file is written and then immediately deleted. Finally, the zFS I/O driver merges adjacent I/Os into one larger I/O to reduce I/O scheduling overhead. This is often done with log file I/Os because multiple log file I/Os are frequently in the queue at one time and the log file blocks are contiguous on disk. Merging allows log file pages to be written aggressively (making it less likely that users lose data in a failure) and yet batched together for performance when the disk has a high load.
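
The merging behavior can be pictured as coalescing queued requests whose disk ranges are contiguous. The following Python sketch is only an illustration of the idea; the request representation is hypothetical and is not the zFS implementation.

# Simplified illustration of merging adjacent queued I/Os into larger ones.
# Each request is a (start_block, block_count) pair; requests whose ranges
# are contiguous are combined so that fewer, larger I/Os are scheduled.
def merge_adjacent(requests):
    merged = []
    for start, count in sorted(requests):
        if merged and merged[-1][0] + merged[-1][1] == start:
            merged[-1] = (merged[-1][0], merged[-1][1] + count)
        else:
            merged.append((start, count))
    return merged

# Three contiguous log file writes become one larger write.
print(merge_adjacent([(100, 2), (102, 2), (104, 4)]))   # [(100, 8)]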

This section contains the following information:
  • PAV IOs, which shows how many I/Os zFS sends in parallel to Media Manager; non-PAV DASD always shows the value 1.
  • The DASD VOLSER of the primary extent of each aggregate, and the total number of I/Os and kilobytes read and written.
  • The number of times a thread processing a request had to wait on I/O, and the average wait time in milliseconds.

By using this information with the KN report, you can break down zFS response time into the percentage of the response time that is spent waiting on I/O. To reduce I/O waits, you can run with larger cache sizes. Small log files (small aggregates) that are heavily updated might result in I/Os to sync metadata in order to reclaim log file pages, resulting in additional I/O waits. Note that this number is not DASD response time; it is affected by DASD response time, but it is not the same. If a thread does not have to wait for an I/O, it has no I/O wait; if a thread has to wait for an I/O while other I/Os are being processed, it might actually wait for more than one I/O (the time in the queue plus the time for the I/O).
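
A minimal sketch of that breakdown, assuming the operation count and average response time are taken from a matching KN report and the wait count and average wait time from this report:

# Estimate what percentage of overall zFS response time is I/O wait.
# total_ops and avg_response_msecs would come from the KN report;
# io_waits and avg_io_wait_msecs come from this LFS report's I/O summary.
def io_wait_percentage(total_ops, avg_response_msecs, io_waits, avg_io_wait_msecs):
    total_response = total_ops * avg_response_msecs
    total_io_wait = io_waits * avg_io_wait_msecs
    return 100.0 * total_io_wait / total_response if total_response else 0.0

# The sample LFS report shows 8722 waits at an average of 115.334 msecs;
# the total_ops and avg_response_msecs values would come from the KN report.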

This report, along with RMF™ DASD reports and the zFS FILE report, can also be used to balance zFS aggregates among DASD volumes to ensure an even I/O spread.

 zFS Large Fast Lookup Statistics
 --------------------------------
Number of Large FLC Buffers allocated             60
Number of Large FLC Buffers stolen                18
Number of Large FLC Buffers assigned             326
Number of Large FLC Buffers requests         5989235
Number of Large FLC Buffers in use                 0
Number of Large FLC Hash Table Slots            4096
Number of Pieces in Large FLC                  32512

If you define the IOEPRMxx configuration option flc, the LFS query report includes statistics for the Large FLC buffers. This information includes the total number of Large FLC buffers allocated, the number of times a buffer is stolen from a directory, the number of times a buffer is assigned to a directory (and populated by reading in the entire contents of the directory), the total number of operations (requests such as lookup, create, or remove) performed on directories that have a large buffer assigned, the size of the hash table in each buffer, and how many name pieces each buffer contains.

The number of buffers allocated is the number of large buffers requested by the IOEFSPRM flc bufnum setting.

The number of buffers stolen is the number of times a buffer is stolen from a large directory. A steal can occur if z/OS UNIX inactivates the directory, zFS steals the vnode extension from the directory vnode, the containing file system is unmounted, the large directory itself is deleted, or the large directory has not been accessed in the number of seconds specified by the IOEPRMxx flc inactive setting. If this number seems high, it might be because the directory is not being accessed frequently. If it is being accessed frequently, try increasing the vnode cache size or increasing the IOEPRMxx flc inactive value.

The number of buffers assigned is the number of times a directory that has a size greater than or equal to the IOEPRMxx flc mindirsize value, but does not currently have a Large FLC buffer assigned to it, is accessed. Such an access causes a Large FLC buffer to be populated by reading the contents of the entire directory from disk. If this number seems high, it might be because the directory is not being accessed frequently. If it is being accessed frequently, try increasing the vnode cache size or increasing the IOEPRMxx flc inactive value. Another possible cause is that the IOEPRMxx flc mindirsize value is too small. This can occur if, over time, directories have grown in size and more of them are now considered large. If this has happened, increase the IOEPRMxx flc mindirsize value to a more appropriate value, or increase the IOEPRMxx flc bufnum value to allocate more Large FLC buffers. If this number seems low, you might need to decrease the IOEPRMxx flc inactive value to allow for more frequent Large FLC buffer reuse.

The number of buffer requests is the number of lookup, create, or remove directory operations done to a directory with a Large FLC buffer. This number can be used as an indicator of how frequently the large directories are being accessed.

The number of buffers in use is the number of Large FLC buffers that are currently assigned to a large directory. If you repeatedly find that this number is much lower than, or nearly the same as, the number of allocated buffers, verify that your IOEPRMxx flc bufnum setting is still appropriate. Directories that used to exist might have been deleted, or other (possibly new) directories might have grown to the IOEPRMxx flc mindirsize value and are now considered large.

Also included are the number of slots in the hash table and the number of pieces in the Large FLC buffers. Message IOEZ00821E indicates that a directory contains too many entries to have a Large FLC buffer assigned to it. Message IOEZ00819E indicates that a directory had a Large FLC buffer assigned to it, but a new entry was added that caused the directory to become too big. These messages indicate that the hash table and the number of pieces are not sufficient, and that the IOEPRMxx flc bufnum value needs to be increased. This can occur because, over time, the directories grew larger than previously expected. It is very important to reevaluate the IOEPRMxx flc settings periodically.
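
The tuning guidance above can be summarized as a few simple checks against the Large FLC counters. The following Python sketch is only an interpretation of that guidance; the thresholds are illustrative assumptions, not zFS-documented values.

# Rough health checks for the Large FLC statistics, following the guidance
# in the text. The thresholds used in the comparisons are assumptions.
def flc_hints(allocated, stolen, assigned, in_use):
    hints = []
    if assigned and stolen > assigned / 2:
        hints.append("Frequent steals: consider a larger vnode cache or a "
                     "larger flc inactive value.")
    if in_use <= allocated * 0.25 or in_use >= allocated * 0.9:
        hints.append("Verify the flc bufnum setting against the current "
                     "number and size of large directories.")
    return hints

# Sample report values: 60 allocated, 18 stolen, 326 assigned, 0 in use.
print(flc_hints(60, 18, 326, 0))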




