The LFS report provides detailed file system statistics. The following
sample shows an example of the report content; each part of the report
is described in Table 1, which follows the sample.
F ZFS,QUERY,LFS
IOEZ00438I Starting Query Command LFS. 421
zFS Vnode Op Counts
Vnode Op               Count Vnode Op               Count
----------------- ---------- ----------------- ----------
efs_hold                   0 efs_readdir            67997
efs_rele                   0 efs_create           1569039
efs_inactive               0 efs_remove           1945874
efsvn_getattr        9856523 efs_rename            235320
efs_setattr               40 efs_mkdir             237359
efs_access           1656502 efs_rmdir             238004
efs_lookup          21545682 efs_link              237318
efs_getvolume              0 efs_symlink           237318
efs_getlength              0 efs_readlink               0
efs_afsfid                 0 efs_rdwr                   0
efs_fid                    0 efs_fsync                  0
efs_vmread                 0 efs_waitIO                 9
efs_vmwrite                0 efs_cancelIO               0
efs_clrsetid               0 efs_audit               5425
efs_getanode           16640 efs_vmblkinfo              0
efs_readdir_raw            0 efs_convert                0
Average number of names per convert 0
Number of version5 directory splits 126
Number of version5 directory merges 63
Total zFS Vnode Ops 37849050
zFS Vnode Cache Statistics
    Vnodes   Requests       Hits    Ratio  Allocates    Deletes
---------- ---------- ---------- -------- ---------- ----------
    200000   25908218   22431383  86.580%          0          1
zFS Vnode structure size: 224 bytes
zFS extended vnodes: 200000, extension size 708 bytes (minimum)
Held zFS vnodes: 2914 (high 29002) Open zFS vnodes: 0 (high 10) Reusable: 197085
Total osi_getvnode Calls: 3886774 (high resp 0) Avg. Call Time: 0.069 (msecs)
Total SAF Calls: 11050540 (high resp 1) Avg. Call Time: 0.008 (msecs)
zFS Fast Lookup Statistics
   Buffers    Lookups       Hits  Ratio  Neg. Hits    Updates
---------- ---------- ---------- ------ ---------- ----------
      1000          0          0   0.0%          0          0
Metadata Caching Statistics
  Buffers (K bytes)   Requests       Hits  Ratio    Updates PartialWrt
--------- --------- ---------- ---------- ------ ---------- ----------
    32768    262144   77813570   77529130  99.6%   27943073     423524
Metadata Backing Caching Statistics
   Buffers (K bytes)   Requests       Hits Ratio   Discards
---------- --------- ---------- ---------- ----- ----------
    131072   1048576      24303        377  1.5%          0
Transaction Cache Statistics
Transactions started: 7152165 Lookups on tran: 8032713 EC Merges: 516363
Allocated Transactions: 8034 (Act= 0, Pend= 0, Comp= 4846, Free= 3188)
I/O Summary By Type
-------------------
     Count      Waits    Cancels     Merges  Type
---------- ---------- ---------- ----------  ----------
     33006       7701          0          0  File System Metadata
    680516       1020          0      56366  Log File
        11          1          0          0  User File Data
I/O Summary By Circumstance
---------------------------
     Count      Waits    Cancels     Merges  Circumstance
---------- ---------- ---------- ----------  ------------
      7213       6553          0          0  Metadata cache read
         1          1          0          0  User file cache direct read
         4          4          0          0  Log file read
         0          0          0          0  Metadata cache async delete write
         0          0          0          0  Metadata cache async write
         0          0          0          0  Metadata cache lazy write
         0          0          0          0  Metadata cache sync delete write
         0          0          0          0  Metadata cache sync write
        10          0          0          0  User File cache direct write
         1          1          0          0  Metadata cache file sync write
     16981        861          0          0  Metadata cache sync daemon write
         0          0          0          0  Metadata cache aggregate detach write
         0          0          0          0  Metadata cache buffer block reclaim write
         0          0          0          0  Metadata cache buffer allocation write
         0          0          0          0  Metadata cache file system quiesce write
      8811        286          0          0  Metadata cache log file full write
    680512       1016          0      56366  Log file write
         0          0          0          0  Metadata cache shutdown write
         0          0          0          0  Format, grow write
zFS I/O by Currently Attached Aggregate
DASD   PAV
VOLSER IOs Mode       Reads    K bytes     Writes    K bytes  Dataset Name
------ --- ----  ---------- ---------- ---------- ----------  ------------
ZFSD18   1 R/W           44        344       1831      17224
ZFSAGGR.BIGZFS.DHH.FS14.EXTATTR
ZFS121   1 R/W         6509      52056     648750   10276788
ZFSAGGR.BIGZFS.DHH.FS1.EXTATTR
------           ---------- ---------- ---------- ----------
     2                 6553      52400     650581   10294012
*TOTALS*
Total number of waits for I/O: 8722
Average I/O wait time: 115.334 (msecs)
IOEZ00025I zFS kernel: MODIFY command - QUERY,LFS completed successfully
Table 1. LFS report sections
zFS Vnode Op Counts:
Shows the number of calls to the lower-layer zFS components. One
request from z/OS® UNIX typically requires more than one lower-layer
call. Note that the output of this report wraps.
zFS Vnode Cache Statistics:
Shows the zFS vnode cache statistics: the number of currently
allocated vnodes and the vnode hit ratio. Allocates and Deletes show
requests to create new vnodes (for operations such as create or mkdir)
and to delete vnodes (for operations such as remove, or failed creates
or mkdirs). The size of this cache is controlled by the
vnode_cache_size parameter and by the demand for zFS vnodes placed by
z/OS UNIX. In general, zFS tries to honor the vnode_cache_size setting
and recycles vnode structures to represent different files. However,
if z/OS UNIX requests more vnodes than zFS has allocated, zFS must
allocate additional vnodes to keep applications from failing.
Held zFS vnodes is the number of vnodes that z/OS UNIX currently
requires zFS to keep available; high is the largest number of vnodes
that z/OS UNIX required at any one time (a peak value). z/OS UNIX also
determines when files are opened and closed. Open zFS vnodes is the
number of vnodes that represent currently open files; high is the
largest number of files open at the same time.
A good hit ratio for this cache is preferable because a miss means
initializing the vnode data structures, and initialization requires a
read of the object's status from disk. That status is often in the
metadata cache, but this is not guaranteed; consequently, a vnode
cache lookup miss might require an I/O wait. The ratio is the hits
divided by the requests: in the sample, 22431383 / 25908218, or
86.580%.
The vnode structure size is shown; however, additional data structures
anchored from the vnode also take space. Everything added together
yields over 1 K of storage per vnode (in the sample, 200000 vnodes at
224 + 708 bytes each is already about 178 MB before counting the other
anchored structures). Consider this when planning the size of this
cache. Also note that initializing a vnode does not require an I/O if
the object's status information is in the metadata cache, so a
good-sized metadata cache can be as useful as, and often more useful
than, an extremely large vnode cache.
Total osi_getvnode Calls is the number of times zFS called the
osi_getvnode interface of z/OS UNIX to get a z/OS UNIX vnode that
corresponds to a new zFS vnode. Its high resp is the number of calls
that took longer than a second to complete, and Avg. Call Time is the
average number of milliseconds each call took to complete.
Total SAF Calls is the number of calls zFS made to the security
product through the SAF interface. high resp is the number of these
security calls that took longer than a second to complete, and
Avg. Call Time is the average number of milliseconds each call took to
complete.
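If z/OS UNIX repeatedly demands more vnodes than are allocated, the
cache can be enlarged. As a minimal sketch (the value 400000 is only
illustrative; size the cache for your own workload), the setting can
be changed dynamically with zfsadm config and checked with zfsadm
configquery:
  zfsadm config -vnode_cache_size 400000
  zfsadm configquery -vnode_cache_size
To make the change permanent, code vnode_cache_size=400000 in the
IOEFSPRM file.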
zFS Fast Lookup Statistics:
Shows the basic performance characteristics of the zFS fast lookup
cache. The fast lookup cache is used on the owning system of a zFS
sysplex-aware file system to improve the performance of the lookup
operation. There are no externals for this cache (other than this
display). The statistics show the total number of buffers (each is 8 K
in size), the total number of lookups, the cache hits for lookups, and
the hit ratio. The higher the hit ratio, the better the performance.
Metadata Caching Statistics:
Shows the basic performance characteristics of the metadata cache. The
metadata cache holds all disk blocks that contain metadata, plus any
file data for files smaller than 7 K. For files smaller than 7 K, zFS
places multiple files in one disk block (for zFS, a disk block is 8 K
bytes). Only the lower metadata management layers have the block
fragmentation information, so user file I/O for small files is
performed directly through this cache rather than through the user
file cache. The statistics show the total number of buffers (each is
8 K in size, so the sample's 32768 buffers account for the 262144 K
bytes shown), the total bytes, the number of requests, the hits and
hit ratio of the cache, Updates (the number of times an update was
made to a metadata block), and PartialWrt (the number of times that
only half of an 8 K metadata block needed to be written). The higher
the hit ratio, the better the performance. Metadata is accessed
frequently in zFS and, for the most part, is contained only in the
metadata cache; therefore, a hit ratio of 80% or more is typically
sufficient.
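The size of this cache is controlled by the meta_cache_size
configuration option (the option itself is not shown in this report).
As an illustrative sketch only, with a placeholder value:
  zfsadm config -meta_cache_size 100M
  zfsadm configquery -meta_cache_size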
Metadata Backing Caching Statistics:
Describes the performance of the extension to the metadata cache. The
size of this extension is controlled by the metaback_cache_size
configuration option. The backing cache is stored in a data space and
is used only to avoid metadata reads from disk; all metadata updates
and write I/O are performed from the primary metadata cache.
Statistics similar to those of the metadata cache are shown for this
cache. Every hit in this cache avoids one disk read, but the metadata
backing cache is not needed except for workloads that have many small
user files or that are constrained in the zFS primary address space
(possibly because of a large demand for zFS vnodes made by z/OS UNIX
and its applications). Thus, if the zFS address space has primary
space available, that space should be given to the primary metadata
cache. In the preceding report example, the metadata backing cache
provides little performance benefit (as shown by its small hit
ratio). It can be created only by specifying the metaback_cache_size
configuration option in the IOEFSPRM file or with the zfsadm config
command.
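A minimal sketch of both methods follows; the 2048M value is only
illustrative:
  metaback_cache_size=2048M                 (IOEFSPRM configuration option)
  zfsadm config -metaback_cache_size 2048M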
Transaction Cache Statistics:
zFS updates metadata on disk by writing the changes to the metadata to
a log file. Each operation creates one or more transactions, writes
records describing the updates to the log file for each transaction,
and then ends the transaction. Each transaction has an associated
state, described as follows:
- Active: Records are still being written to the log file to describe
the updates being made by this transaction; the transaction was
started but has not yet ended. Shown as Act in the report.
- Complete: The transaction has ended; all updates were written to the
log file, and the end-transaction record for that transaction was also
written to the log. Shown as Comp in the report.
- Committed: The transaction has ended, all updates are written to the
log file, and all the log file pages that contain information about
this transaction are on disk. At this point the transaction is
guaranteed: the update will be present even if the system stops. (The
sample report shows no statistic for this count. As soon as a
transaction is committed, the structure representing the transaction
is free for reuse by another transaction.)
- Equivalence Classes: zFS does not use the common technique of
two-phase locking or two-phase commit. Rather, transactions that are
related are grouped into equivalence classes. zFS decides when a
transaction is related to, or dependent on, another transaction; when
that determination is made, the transactions are grouped into an
equivalence class. All transactions in the same equivalence class are
committed together, or backed out together in the event of a system
failure. By using equivalence classes, threads running related
transactions can run in parallel without added serialization between
them (other than locks, if they touch common structures) and add their
transactions to the same class; this increases throughput. A merge of
equivalence classes occurs when two transactions that need to be made
equivalent are both already in equivalence classes; in this case the
two classes are merged, which the report shows as EC Merges.
- Pending: A transaction is pending when all of its updates are
written to the log file but other transactions in its equivalence
class have not yet ended. Shown as Pend in the report.
The transaction cache size is, by default, 2000 transactions; it can
be changed with the tran_cache_size configuration option. In general,
zFS increases the size of the cache if it determines that too many I/O
waits are occurring while syncing log file pages to commit
transactions (so that their structures can be freed), thereby
improving performance. Also, if you use the zfsadm config command to
set tran_cache_size, the transaction cache will not be shrunk so small
as to cause excessive log file syncs; an attempt to set the cache too
small fails. As a rule of thumb, the default is fine for most
installations. If zFS determines that more transaction structures are
needed for performance, it allocates more, although it is conservative
about doing so. You might get a small performance boost by starting
with a larger transaction cache size, so that zFS does not need to
check whether it can increase the size or sync log file pages.
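If you do choose to start with a larger transaction cache, the option
is set like any other zFS configuration option; for example (the value
4000 is only illustrative):
  zfsadm config -tran_cache_size 4000
  zfsadm configquery -tran_cache_size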
zFS I/O by Currently Attached Aggregate:
The zFS I/O driver is essentially an I/O queue manager (one I/O queue
per DASD). It uses Media Manager to issue I/O to VSAM data sets, and
it generally sends no more than one I/O per DASD volume to disk at one
time. The exception is parallel access volume (PAV) DASD; these
devices often have multiple paths and can perform multiple I/Os in
parallel. In this case, zFS divides the number of access paths by two
and rounds any fraction up. (For example, for a PAV DASD with five
paths, zFS issues at most three I/Os at one time to Media Manager.)
zFS limits the I/O because it uses a dynamic reordering and
prioritization scheme to improve performance by reordering the I/O
queue on demand. Thus, high-priority I/Os (for example, I/Os that are
currently being waited on) are placed up front, and an I/O can be made
high priority at any time during its life. This reordering has been
proven to provide the best performance; for PAV DASD, performance
tests have shown that not sending quite as many I/Os as there are
available paths allows zFS to reorder I/Os and leave paths available
for I/Os that become high priority.
Another feature of the zFS I/O driver is that, by queueing I/Os, it
allows I/Os to be canceled; this is done, for example, when a file is
written and then immediately deleted. Finally, the zFS I/O driver
merges adjacent I/Os into one larger I/O to reduce I/O scheduling
overhead. This is often done with log file I/Os, because multiple log
file I/Os are often in the queue at one time and the log file blocks
are contiguous on disk. This allows log file pages to be written
aggressively (making it less likely that users lose data in a failure)
and yet batched together for performance when the disk has a high
load.
This section contains the following information:
- PAV IOs, which shows how many I/Os zFS sends in parallel to Media
Manager; non-PAV DASD always shows the value 1.
- The DASD VOLSER for the primary extent of each aggregate, and the
total number of I/Os and bytes read and written.
- The number of times a thread processing a request had to wait on
I/O, and the average wait time in milliseconds.
By using this information with the KN report, you can break down zFS
response time into the percentage of the response time that is I/O
wait. To reduce I/O waits, you can run with larger cache sizes. Small
log files (small aggregates) that are heavily updated might result in
I/Os to sync metadata to reclaim log file pages, resulting in
additional I/O waits. Note that this number is not DASD response time;
it is affected by DASD response time, but it is not the same. If a
thread does not have to wait for an I/O, then it has no I/O wait; if a
thread has to wait for an I/O while other I/Os are being processed, it
might actually wait for more than one I/O (the time in the queue plus
the time for the I/O).
This report, along with RMF™ DASD reports and the zFS FILE report, can
also be used to balance zFS aggregates among DASD volumes to ensure an
even I/O spread.
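The companion reports mentioned here are requested through the same
MODIFY interface as the LFS report; for example:
  F ZFS,QUERY,KN
  F ZFS,QUERY,FILE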
zFS Large Fast Lookup Statistics
--------------------------------
Number of Large FLC Buffers allocated         60
Number of Large FLC Buffers stolen            18
Number of Large FLC Buffers assigned         326
Number of Large FLC Buffers requests     5989235
Number of Large FLC Buffers in use             0
Number of Large FLC Hash Table Slots        4096
Number of Pieces in Large FLC              32512
If you define the IOEPRMxx configuration option flc, the LFS query
report includes statistics for the Large FLC buffers. It contains
information that includes the total number of Large FLC buffers
allocated, the number of times a buffer was stolen from a directory,
the number of times a buffer was assigned to a directory (and
populated by reading in the entire contents of the directory), the
total number of operations (requests such as lookup, create, or
remove) performed on directories that have a large buffer assigned,
the size of the hash table in the buffer, and how many name pieces
each buffer contains.
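The flc option and its bufnum, mindirsize, and inactive settings are
coded in IOEPRMxx. The following sketch uses placeholder syntax only;
verify the exact coding format for your z/OS release:
  flc=(bufnum,mindirsize,inactive)   (placeholders, not literal values)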
The number of buffers allocated is the number of large buffers
requested by the bufnum value of the IOEFSPRM flc setting.
The number of buffers stolen is the number of times a buffer was
stolen from a large directory. A steal can occur if z/OS UNIX
inactivates the directory, if zFS steals the vnode extension from the
directory vnode, if the containing file system is unmounted, if the
large directory itself is deleted, or if the large directory has not
been accessed in the number of seconds specified by the inactive value
of the IOEPRMxx flc setting. If this number seems high, it might be
because the directory is not being accessed frequently. If it is being
accessed frequently, try increasing the vnode cache size or increasing
the flc inactive value.
The number of buffers assigned is the number of times that a directory
whose size is greater than or equal to the flc mindirsize value was
accessed while it did not have a Large FLC buffer assigned to it. Such
an access causes a Large FLC buffer to be assigned and populated by
reading the contents of the entire directory from disk. If this number
seems high, it might be because the directory is not being accessed
frequently. If it is being accessed frequently, try increasing the
vnode cache size or increasing the flc inactive value. Another
possible cause is that the flc mindirsize value is too small; this can
occur if, over time, directories have grown in size and more of them
are now considered to be large. If this has happened, increase the flc
mindirsize value to a more appropriate value, or increase the flc
bufnum value to allocate more Large FLC buffers. If this number seems
low, you might need to decrease the flc inactive value to allow more
frequent Large FLC buffer reuse.
The number of buffer requests is the number of lookup, create, or
remove directory operations done to directories that have a Large FLC
buffer. This number can be used as an indicator of how frequently the
large directories are being accessed.
The number of buffers in use is the number of Large FLC buffers that
are currently assigned to a large directory. If you repeatedly find
that this number is much lower than, or nearly the same as, the number
of allocated buffers, verify that your flc bufnum setting is
correct. Directories that used to exist might have been deleted, or
other (possibly new) directories might have grown to the flc
mindirsize value and are now considered large.
Also included are the number of slots in the hash table and the number
of pieces in the Large FLC buffers. Message IOEZ00821E indicates that
a directory contains too many entries to have a Large FLC buffer
assigned to it. Message IOEZ00819E indicates that a directory had a
Large FLC buffer assigned to it, but a new entry was added that caused
the directory to become too big. These messages indicate that the hash
table and the number of pieces are not sufficient, and that the flc
bufnum value needs to be increased. This can happen because, over
time, the directories grew larger than previously expected; it is very
important to reevaluate the IOEPRMxx flc settings periodically.