zFS Version 1.5 Aggregate and Extended (v5) Directory
Beginning in z/OS V2R1, zFS supports a new version aggregate, the version 1.5 aggregate. Version 1.5 aggregates support extended (v5) directories. Extended (v5) directories provide the following benefits:
- They can support larger directories while maintaining good performance.
- They store names more efficiently than v4 directories.
- When names are removed from extended (v5) directories, the space is reclaimed, when possible, unlike v4 directories where space is not reclaimed until the directory is removed.
Version 1.5 aggregates have a larger architected maximum size than version 1.4 aggregates (approximately 16 TB versus approximately 4 TB). Extended (v5) directories can support more subdirectories than v4 directories (4G-1 versus 64K-1). Because version 1.5 aggregates can only be accessed locally on z/OS V2R1 or later, you are encouraged to use this function only after all of your systems have been migrated to z/OS V2R1 or a later z/OS release. Version 1.5 aggregates can contain both extended (v5) directories and v4 directories, and either can be a subdirectory of the other. zFS version 1.4 aggregates cannot contain extended (v5) directories. Version 1.5 aggregates can be mounted on directories contained in version 1.4 aggregates, and the reverse is also allowed.
Only zFS version 1.5 aggregates can have extended (v5) directories. An existing zFS version 1.4 aggregate can be converted on z/OS V2R1 to a version 1.5 aggregate, which allows it to contain both v4 directories (if they existed before the conversion) and extended (v5) directories. Individual directories in a zFS version 1.4 aggregate can also be converted from v4 to extended (v5) directories; this causes an automatic conversion of the aggregate from version 1.4 to version 1.5 if it is not already at the version 1.5 format, and can only be performed on z/OS V2R1 or later releases. Any new directory created in a version 1.5 aggregate will be an extended (v5) directory.
Here are some ways existing v4 directories can be converted to extended (v5) directories:
- Explicitly, one at a time, for a mounted aggregate using the zfsadm convert -path command, or
- Automatically, as they are accessed, for a mounted aggregate when the aggregate has the converttov5 attribute, or
- Offline, converting all directories using the IOEFSUTL converttov5 batch utility.
Existing directories in a version 1.5 aggregate are not automatically converted if the NOCONVERTTOV5 MOUNT PARM is specified. Explicit and offline directory conversion will change the aggregate from version 1.4 to version 1.5, if necessary.
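The automatic method is enabled through the mount. As a hedged illustration, a BPXPRMxx MOUNT statement for the aggregate used later in this article might specify the CONVERTTOV5 parameter (the data set name and mount point come from this article's environment; verify the exact syntax against the documentation for your release):

```
MOUNT FILESYSTEM('OMVSSPN.VER15.ZFS')
      TYPE(ZFS) MODE(RDWR)
      MOUNTPOINT('/zfspetmounts/zfsver15A')
      PARM('CONVERTTOV5')
```

Specifying PARM('NOCONVERTTOV5') instead suppresses automatic conversion of existing directories.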
Note: Converting a directory from v4 to an extended (v5) directory requires both versions of the directory to exist on disk at the same time, temporarily. If the aggregate becomes full during the allocation of the new directory, a dynamic grow will be attempted.
Some suggested guidelines for v4 to v5 conversions are:
- Extended (v5) directories have better performance than v4 directories of the same size. For optimal performance after all systems at your site have been migrated to z/OS V2R1, all of the directories should be converted from v4 to v5, even though support will continue to be provided for v4 directories. To convert selected file systems or directories, you can use automatic methods (such as specifying the CONVERTTOV5 MOUNT parameter or using the offline conversion utility), or you can convert them explicitly with the zfsadm convert command.
- If your installation exports zFS file systems to NFS or SMB, it is recommended that the zfsadm convert command not be used to convert directories that are exported by these servers. In rare cases, remote applications can get unexpected errors if a directory being manually converted is simultaneously being accessed by NFS or SMB users. Use one of the other methods, such as offline conversion or the CONVERTTOV5 MOUNT parameter, for these file systems. These methods ensure that each individual directory is completely converted before it can be exported.
- If you are not planning to convert all file systems to v5, it is best to at least convert the most active file systems or the file systems with large directories. A directory gets a nontrivial benefit from conversion to v5 if it has 10000 entries or more (a length of approximately 800 K or more). You can determine the most active file systems by issuing MODIFY ZFS,QUERY,FILESETS or by using the wjsfsmon tool. The number of entries in a directory, or its size, can be determined by issuing the df -t, ls -ld, ls -l, or find commands, or the largedir utility. The approximate rate of conversion is between 3500 (on a z9® machine) and 10000 (on a zEC12 machine) directory entries per second, depending on your processor.
- After you decide that a file system (aggregate) is going to be converted to version 1.5, you need to decide what conversion method to use. If the file system can be unmounted, the IOEFSUTL converttov5 batch utility or MOUNT parameters can be used. If it cannot be unmounted and it is not exported by NFS or SMB servers, use the zfsadm convert command. If it is exported by NFS or SMB servers, add the converttov5 attribute to the mounted aggregate.
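The conversion rates quoted above give a rough way to estimate how long a given directory will take to convert. A minimal shell sketch (the entry count is an example value, not from this article's environment):

```shell
# Estimate conversion time from the quoted rates of roughly
# 3500-10000 directory entries per second (processor dependent).
ENTRIES=100000                      # example directory size
echo "fastest: about $((ENTRIES / 10000)) seconds"
echo "slowest: about $((ENTRIES / 3500)) seconds"
```

For a 100000-entry directory this works out to roughly 10 to 28 seconds.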
Care should be taken to make an informed decision before implementing a zFS version 1.5 aggregate in your environment. The main purpose of the version 1.5 aggregate is to support a new directory format (the extended (v5) directory) that scales better when the directory contains many objects. Because the format of a new directory is different in a version 1.5 aggregate, zFS provides toleration APAR OA39466, to be installed on supported releases earlier than z/OS V2R1. With it installed, a mount of a version 1.5 aggregate on a release earlier than z/OS V2R1 fails and resorts to z/OS UNIX function shipping. Releases earlier than z/OS V2R1 cannot locally access an extended (v5) directory or version 1.5 aggregate. Consider not converting your root file system in the sysplex until your entire environment is at z/OS V2R1 or later.
In z/OS V2R1, zFS has a new zfsadm command called fileinfo (zfsadm fileinfo). It displays information about a file or directory, and can be used to determine whether the directories in a file system are extended (v5) directories or v4 directories.
Also note that the default for the zFS IOEPRMxx option CONVERT_AUDITFID is being changed from OFF to ON. This means that any existing zFS aggregates that have the non-unique auditfid will be converted to have a unique auditfid on the next read-write mount.
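If you need to pin the auditfid behavior rather than take the new default, the option lives in the zFS IOEPRMxx parmlib member. A hedged sketch of the relevant line (verify the exact syntax for your release):

```
convert_auditfid=on
```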
The following was performed in our parallel sysplex z/OS V2R1 environment.
Determining Directory Candidates for Conversion
Long response times can be a first indication of a directory size problem. Examine the output of the MODIFY ZFS,QUERY,KN operator command or the z/OS UNIX zfsadm query -knpfs command. Look at the Avg Time field on the lines for operations that require zFS to search through the names of a directory (for example, zfs_lookup, zfs_create, or zfs_remove). Typically, the average times should be on the order of a few milliseconds. If they are relatively large (perhaps ten to a hundred times larger than that), it is possible that you have a directory that is too large and is causing performance problems.
Extended (v5) directories will scale better when there are many objects in the directory. Below are some ways we determined the directory size. These are only examples and may take some time to complete. You should determine the best approach to take in your environment.
We have a large version 1.4 aggregate mounted as RWSHARE in our environment.
Using the zfsadm aggrinfo and df commands we see that the file system is large.
$ zfsadm aggrinfo -aggregate OMVSSPN.VER15.ZFS -long
OMVSSPN.VER15.ZFS (R/W COMP): 26600551 K free out of total 59261760
auditfid E2E2F0F0 F0F6AB7C 0020
3325068 free 8k blocks; 7 free 1K fragments
32800 K log file; 80 K filesystem table
8192 K bitmap file
$ df -vkP /zfspetmounts/zfsver15A/
Filesystem 1024-blocks Used Available Capacity Mounted on
OMVSSPN.VER15.ZFS 59261760 32661209 26600551 56% /zfspetmounts/zfsver15A
ZFS, Read/Write, Device:7776, ACLS=Y
File System Owner : TPN Automove=Y Client=N
Filetag : T=off codeset=0
Aggregate Name : OMVSSPN.VER15.ZFS
Using the df -t command, we see that there are many allocated file slots in this file system.
$ df -t /zfspetmounts/zfsver15A
Mounted on Filesystem Avail/Total Files Status
/zfspetmounts/zfsver15A (OMVSSPN.VER15.ZFS) 53201102/118523520 4291102111/4294967295 Available
One way to see whether there are large directories in the file system that might be candidates for conversion to extended (v5) directories is to use the largedir.pl utility, which currently requires Perl. It helps determine which zFS directories are large (1 MB or greater, and 3 MB or greater). The largedir.pl utility is available on the z/OS UNIX System Services Tools and Toys Web page as zFS Large Directory Utility (http://www.ibm.com/systems/z/os/zos/features/unix/bpxa1ty2.html).
You can also issue a find command to search for file system directories that are greater than 1 MB.
An example would be:
$ find /zfspetmounts/zfsver15A -type d -xdev -size +1048575c
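Building on that find command, here is a sketch that also reports how many entries each flagged directory contains. To keep the example self-contained and runnable anywhere, it builds a tiny demo tree and uses a minimal size threshold; on z/OS you would point it at the real mount point and use +1048575c as in the command above:

```shell
# Build a small demo directory tree (stand-in for the real mount point).
MNT=$(mktemp -d)
mkdir -p "$MNT/JA0/D3"
touch "$MNT/JA0/D3/file1" "$MNT/JA0/D3/file2"

# List each directory over the size threshold with its entry count.
# THRESH=+0c matches everything here; use +1048575c for "over 1 MB".
THRESH=+0c
find "$MNT" -type d -xdev -size "$THRESH" | while read -r d; do
  printf '%s: %s entries\n' "$d" "$(ls -A "$d" | wc -l)"
done
```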
We issued the largedir utility (located in our environment at /local/largedir/largedir) against the mount point of the file system, with the following results. The Minor Exceptions are over 1 MB and the Major Exceptions are over 3 MB.
$ /local/largedir/largedir /zfspetmounts/zfsver15A
Minor Exception: Large Directory: /zfspetmounts/zfsver15A/JA0/D3
Minor Exception: Large Directory: /zfspetmounts/zfsver15A/JE0/D3
Minor Exception: Large Directory: /zfspetmounts/zfsver15A/JJ0/D2
Major Exception: Really Large Directory: /zfspetmounts/zfsver15A/JJ0/D3
Major Exception: Really Large Directory: /zfspetmounts/zfsver15A/JH0/D0
Major Exception: Really Large Directory: /zfspetmounts/zfsver15A/JH0/D1
Major Exception: Really Large Directory: /zfspetmounts/zfsver15A/JD0/D0
Major Exception: Really Large Directory: /zfspetmounts/zfsver15A/JD0/D1
Major Exception: Really Large Directory: /zfspetmounts/zfsver15A/JD0/D2
Major Exception: Really Large Directory: /zfspetmounts/zfsver15A/JD0/D3
Major Exception: Really Large Directory: /zfspetmounts/zfsver15A/JG0/D2
Major Exception: Really Large Directory: /zfspetmounts/zfsver15A/JG0/D3
Major Exception: Really Large Directory: /zfspetmounts/zfsver15A/JF0/D0
Major Exception: Really Large Directory: /zfspetmounts/zfsver15A/J80/D3
Major Exception: Really Large Directory: /zfspetmounts/zfsver15A/Z0/D0
Major Exception: Really Large Directory: /zfspetmounts/zfsver15A/Z0/D1
Major Exception: Really Large Directory: /zfspetmounts/zfsver15A/Z0/D2
Major Exception: Really Large Directory: /zfspetmounts/zfsver15A/Z0/D3
Major Exception: Really Large Directory: /zfspetmounts/zfsver15A/J90/D1
Major Exception: Really Large Directory: /zfspetmounts/zfsver15A/J90/D2
Major Exception: Really Large Directory: /zfspetmounts/zfsver15A/J90/D3
Major Exception: Really Large Directory: /zfspetmounts/zfsver15A/JL0/D0
Major Exception: Really Large Directory: /zfspetmounts/zfsver15A/JL0/D1
Major Exception: Really Large Directory: /zfspetmounts/zfsver15A/JL0/D2
Major Exception: Really Large Directory: /zfspetmounts/zfsver15A/JL0/D3
Major Exception: Really Large Directory: /zfspetmounts/zfsver15A/JB0/D1
Major Exception: Really Large Directory: /zfspetmounts/zfsver15A/JB0/D2
Major Exception: Really Large Directory: /zfspetmounts/zfsver15A/JB0/D3
Using the ls -ld command, we see that the directories are greater than 1 MB and contain many names (or at one time contained many names).
Here are some examples:
$ ls -ld /zfspetmounts/zfsver15A/JA0/D3
drwx------ 2 ALEASE1 sys1 1187840 Sep 25 23:35 /zfspetmounts/zfsver15A/JA0/D3
$ ls -ld /zfspetmounts/zfsver15A/JJ0/D3
drwx------ 2 ALEASE1 sys1 4898816 Sep 25 22:26 /zfspetmounts/zfsver15A/JJ0/D3
Using the ls -l and word count (wc) commands, we can see that there are currently many files in the directory, along with the approximate byte count.
Here are some examples (we filtered the total line out of the ls command output to get an accurate count, knowing that none of our file names contain 'otal'):
$ ls -l /zfspetmounts/zfsver15A/JA0/D3 | grep -v 'otal' | wc -lc
$ ls -l /zfspetmounts/zfsver15A/JJ0/D3 | grep -v 'otal' | wc -lc
See the following entries for some of our experiences and testing with this item:
zFS Version 1.5 Aggregate and Extended (v5) Directory Experience Part1
zFS Version 1.5 Aggregate and Extended (v5) Directory Experience Part2
zFS Version 1.5 Aggregate and Extended (v5) Directory Experience Part3
zFS Version 1.5 Aggregate and Extended (v5) Directory Experience Part4