Prefetch
Prefetch fetches the file metadata (inode information) and data from home before an application requests the contents.
Prefetch is a feature that allows fetching the contents of a file into the cache before actual reads.
Prefetching files before an application starts can reduce the network delay when an application requests a file. Prefetch can be used to proactively manage WAN traffic patterns by moving files over the WAN during a period of low WAN usage.
Prefetch can be used to:
- Populate metadata
- Populate data
- View prefetch statistics
mmafmctl Device prefetch -j FilesetName [-s LocalWorkDirectory]
[--retry-failed-file-list|--enable-failed-file-list]
[ {--directory LocalDirectoryPath | --dir-list-file DirListfile [--policy]} [--nosubdirs]]
[{--list-file ListFile | --home-list-file HomeListFile} [--policy]]
[--home-inode-file PolicyListFile]
[--home-fs-path HomeFilesystemPath]
[--metadata-only] [--gateway Node]
[--readdir-only] [--force] [--prefetch-threads nThreads]
For more information about the command, see mmafmctl command. If no options are given for prefetch, the statistics of the last prefetch command that is run on the fileset are displayed.
--metadata-only
- Prefetches only the metadata and not the actual data. This option is useful in migration scenarios. This option requires the list of files whose metadata is to be populated. It must be combined with a list file option.
--list-file ListFile
- The specified file contains a list of files that need to be pre-populated, one file per line. All files must have fully qualified path names. If any file name in the list contains special characters, a policy must be used to generate the list file. This list file must be edited manually to remove all other entries except the file names. The list of files can be:
  - Files with fully qualified names from cache
  - Files with fully qualified names from home
  - List of files from the home that are generated by using the policy. The file must not be edited.
--enable-failed-file-list
- Turns on generating a list of files that failed during prefetch operation at the gateway node. The list of files is saved as .afm/.prefetchedfailed.list under the fileset. Failures that occur during processing are not logged in .afm/.prefetchedfailed.list. If you observe any errors during processing (before queuing), you might need to correct the errors and rerun prefetch.
--policy
- Specifies that the list-file or home-list-file is generated by using a GPFS policy, by which sequences like '\' or '\n' are escaped as '\\' and '\\n'. If this option is specified, the input file list is treated as already escaped. The sequences are unescaped first before queuing for the prefetch operation.
Note: This option can be used only if you are specifying list-file or home-list-file.
--directory LocalDirectoryPath
- Specifies the path to the local directory from which you want to prefetch files. A list of all files in this directory and all its subdirectories is generated and queued for prefetch. You can specify either --directory or --dir-list-file with mmafmctl prefetch. The --policy option can be used only with --dir-list-file and not with --directory.
For example,
# mmafmctl fs1 prefetch -j fileset1 --dir-list-file /tmp/file1 --policy
The following examples show how to name a directory for the --directory option when the directory name contains special characters:
- When a directory name does not have terminal escape sequences, keep the absolute directory path within double quotation marks (" ").
# mmafmctl fs2 prefetch -j roTestPrefetch_GPFS --directory "/gpfs/fs2/roTestPrefetch1/Dir_a\!h@j#k%l^k&78*9'\''"
A sample output is as follows:
mmafmctl: Performing prefetching of fileset: roTestPrefetch_GPFS
mmafmctl(2020-04-13 02:35:39): Listing all files of directory "/gpfs/fs2/roTestPrefetch1/Dir_a\!h@j#k%l^k&78*9'\''"
Queued   Failed   TotalData (approx in Bytes)
25       0        131072000
prefetch successfully queued at the gateway.
(2020-04-13 02:35:41): Listed all files of directory "/gpfs/fs2/roTestPrefetch1/Dir_a\!h@j#k%l^k&78*9'\''"
- When the directory name has terminal escape sequences, do not keep the directory path within double quotation marks. The terminal auto-fills the escape sequences in the directory name when you press <Tab> two times.
# mmafmctl fs2 prefetch -j roTestPrefetch_GPFS --directory /gpfs/fs2/roTestPrefetch1/Dir_a\\\!h\@j#k%l\^k\&78\*9\'\\\'\'/
A sample output is as follows:
mmafmctl: Performing prefetching of fileset: roTestPrefetch_GPFS
mmafmctl(2020-04-13 02:39:58): Listing all files of directory "/gpfs/fs2/roTestPrefetch1/Dir_a\!h@j#k%l^k&78*9'\''/"
Queued   Failed   TotalData (approx in Bytes)
25       0        131072000
prefetch successfully queued at the gateway.
mmafmctl(2020-04-13 02:40:00): Listed all files of directory "/gpfs/fs2/roTestPrefetch1/Dir_a\!h@j#k%l^k&78*9'\''/"
- When you press <Tab> two times to include the escape sequences in a directory name and also keep the directory path within double quotation marks, the prefetch operation fails. It fails because the terminal-escaped characters in the directory name are not unescaped.
# mmafmctl fs2 prefetch -j roTestPrefetch_GPFS --directory "/gpfs/fs2/roTestPrefetch1/Dir_a\\\!h\@j#k%l\^k\&78\*9\'\\\'\'/"
A sample output is as follows:
mmafmctl: Performing prefetching of fileset: roTestPrefetch_GPFS
runPrepopSubcommand: Unexpected error from missing or incorrect prepop input path. Return code: 1
mmafmctl: Command failed. Examine previous error messages to determine cause.
--dir-list-file DirListFile
- This parameter enables prefetching individual directories under an AFM fileset. The input file specifies the unique path to a directory that you want to prefetch. AFM generates a list of files under the specified directory and subdirectories and queues it to the gateway node. The input file can also be a policy-generated file, for which you need to specify --policy.
--nosubdirs
- This option restricts the recursive behavior of --directory and --dir-list-file and prefetches only until the specified level of directory. If you specify this parameter, subdirectories under the directory are not prefetched. This parameter is optional and can be used only with --directory and --dir-list-file.
For example,
# mmafmctl fs1 prefetch -j fileset1 --directory /gpfs/fs1/fileset1/dir1 --nosubdirs
# mmafmctl fs1 prefetch -j fileset1 --dir-list-file /tmp/file1 --policy --nosubdirs
--retry-failed-file-list
- Allows retrying prefetch of files that failed in the last prefetch operation. The list of files to retry is obtained from .afm/.prefetchedfailed.list under the fileset.
Note: To use this option, you must enable generating a list of failed files. Add --enable-failed-file-list to the command first.
--home-list-file HomeListFile
- The specified file contains a list of files from home that need to be pre-populated, one file per line. All files must have fully qualified path names. If any file name in the list contains special characters, a policy must be used to generate the list file. A policy-generated file must be edited manually to remove all other entries except the file names. As of version 4.2.1, this option is deprecated. The --list-file option removes all other entries except the file names.
--home-inode-file PolicyListFile
- The specified file contains the list of files from home that need to be pre-populated in the cache. This file is generated by using a policy and must not be edited manually. This option is deprecated. The --list-file option removes all other entries except the file names.
--home-fs-path HomeFileSystemPath
- Specifies the full path to the fileset at the home cluster and can be used along with --list-file. You must use this option when, in the NSD protocol, the mount point on the gateway nodes of the afmTarget filesets does not match the mount point on the home cluster. For example, the home file system is mounted on the home cluster at /gpfs/homefs1. The home file system is mounted on the cache by using the NSD protocol at /gpfs/remotefs1.
For example,
# mmafmctl gpfs1 prefetch -j cache1 --list-file /tmp/list.allfiles --home-fs-path /gpfs/remotefs1
--readdir-only
- Enables the readdir operation on a dirty directory at the cache one last time and brings the latest directory entries.
This option helps prefetching modified directory entries from the home, although the directory at the cache fileset was modified by the applications and AFM marked the dirty flag on the cache directory. This option overrides the dirty flag that is set when the data is modified at the local LU cache. In the LU mode, the dirty flag does not allow the readdir operation at the home and refreshes the directory file entries from the home.
This option helps in the migration process where new files were created at the home after the application was moved to the cache. The application already modified the directory and refresh intervals were disabled. AFM queues readdir one last time on the cache directory and brings entries of the created files to the cache.
The afmReadDirOnce parameter must be set on an AFM fileset, and directory and files refresh intervals must be disabled.
For example,
- To set afmRefreshOnce on an AFM fileset, issue the following command:
# mmchfileset fs fileset -p afmRefreshOnce=yes
- To check whether the afmRefreshOnce parameter value is set on an AFM fileset, issue the following command:
# mmlsfileset fs fileset -L --afm
A sample output is as follows:
Filesets in file system '<fs>':
Attributes for fileset <fileset>:
==========================================
Status                            Linked
Path                              GPFS_PATH/fileset
Id                                37
Root inode                        3145731
Parent Id                         0
Created                           Wed Mar 4 12:23:47 2020
Comment
Inode space                       6
Maximum number of inodes          100352
Allocated inodes                  100352
Permission change flag            chmodAndSetacl
afm-associated                    Yes
Target                            nfs://home/fileset
Mode                              local-updates
File Lookup Refresh Interval      30 (default)
File Open Refresh Interval        30 (default)
Dir Lookup Refresh Interval       60 (default)
Dir Open Refresh Interval         60 (default)
Expiration Timeout                disable (default)
Last pSnapId                      0
Display Home Snapshots            yes (default)
Number of Gateway Flush Threads   4
Prefetch Threshold                0 (default)
Eviction Enabled                  yes (default)
IO Flags                          9216 (refreshOnce
- To run the prefetch operation for the readdir operation one last time, issue the following command:
# mmafmctl fs prefetch -j fileset --directory /fileset_path/directory --readdir-only
--force
- Enables forcefully fetching data from the home during the migration process. This option
overrides any set restrictions and helps to fetch the data forcefully to the cache. This option must
be used only to forcefully fetch the data that was created after the migration process
completion.
For example,
# mmafmctl fs prefetch -j fileset --list-file listfile_path --force
--gateway Node
- Selects a gateway node that is idle or less utilized to run the prefetch operation on a fileset. This option helps to distribute the prefetch work across different gateway nodes and overrides the default gateway node that is assigned to the fileset. It also helps to run different prefetch operations, which might belong to the same fileset or to different filesets, on different gateway nodes.
For example,
# mmafmctl fs prefetch -j fileset --list-file listfile_path --gateway Node2
--prefetch-threads nThreads
- Specifies the number of threads to be used for the prefetch operation. Valid values are 1 - 255.
Default value is 4.
For example,
# mmafmctl fs prefetch -j fileset --list-file listfile_path --prefetch-threads 6
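The option rules described above can be checked before the command is run. The following Python sketch is a minimal illustration, not part of the product: build_prefetch_argv is a hypothetical helper that only assembles an argument list from options named in this topic and rejects combinations that the descriptions above rule out. It covers only a subset of the options and does not run mmafmctl.

```python
from typing import List, Optional


def build_prefetch_argv(
    device: str,
    fileset: str,
    *,
    list_file: Optional[str] = None,
    directory: Optional[str] = None,
    dir_list_file: Optional[str] = None,
    policy: bool = False,
    nosubdirs: bool = False,
    metadata_only: bool = False,
    prefetch_threads: Optional[int] = None,
) -> List[str]:
    """Assemble an argument list for `mmafmctl Device prefetch` (hypothetical helper)."""
    # --directory and --dir-list-file are alternatives.
    if directory and dir_list_file:
        raise ValueError("specify either --directory or --dir-list-file, not both")
    # --policy applies to a list file or a directory list file, never to --directory.
    if policy and not (list_file or dir_list_file):
        raise ValueError("--policy requires --list-file or --dir-list-file")
    # --nosubdirs is meaningful only for directory-based prefetch.
    if nosubdirs and not (directory or dir_list_file):
        raise ValueError("--nosubdirs requires --directory or --dir-list-file")
    # --metadata-only must be combined with a list file option.
    if metadata_only and not list_file:
        raise ValueError("--metadata-only requires --list-file")
    # Valid thread counts are 1 - 255.
    if prefetch_threads is not None and not 1 <= prefetch_threads <= 255:
        raise ValueError("--prefetch-threads must be in the range 1 - 255")

    argv = ["mmafmctl", device, "prefetch", "-j", fileset]
    if directory:
        argv += ["--directory", directory]
    if dir_list_file:
        argv += ["--dir-list-file", dir_list_file]
    if list_file:
        argv += ["--list-file", list_file]
    if policy:
        argv.append("--policy")
    if nosubdirs:
        argv.append("--nosubdirs")
    if metadata_only:
        argv.append("--metadata-only")
    if prefetch_threads is not None:
        argv += ["--prefetch-threads", str(prefetch_threads)]
    return argv


if __name__ == "__main__":
    print(build_prefetch_argv("fs1", "fileset1",
                              dir_list_file="/tmp/file1", policy=True, nosubdirs=True))
```

If such a list is passed to a process launcher without a shell (for example, subprocess.run in Python), the calling side adds no shell quoting or escaping to a directory name that contains special characters. How mmafmctl itself then treats the name is not covered by this sketch.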
Prefetch is an asynchronous process and the fileset can be used while prefetch is in progress. Prefetch completion can be monitored by using the afmPrepopEnd callback event or by running the mmafmctl Device prefetch command with no options.
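When the statistics are checked from a script, the table that the command prints with no options can be parsed. The following Python sketch is a minimal illustration that assumes the six-column layout shown in the sample outputs of this topic (fileset name, pending, failed, already cached, total, data in bytes). The function names are hypothetical, and treating a pending count of 0 as "finished" is an assumption of this sketch, not a documented rule.

```python
from typing import Dict, Union

Stats = Dict[str, Union[str, int]]


def parse_prefetch_stats(output: str) -> Stats:
    """Parse the statistics row printed by `mmafmctl Device prefetch -j Fileset`."""
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    for i, ln in enumerate(lines):
        # The separator row consists only of dashes and spaces;
        # the statistics row is assumed to follow it directly.
        if set(ln) <= {"-", " "} and i + 1 < len(lines):
            fields = lines[i + 1].split()
            if len(fields) != 6:
                raise ValueError("unexpected statistics row: %r" % lines[i + 1])
            pending, failed, cached, total, data = (int(f) for f in fields[1:])
            return {
                "fileset": fields[0],
                "pending": pending,
                "failed": failed,
                "already_cached": cached,
                "total": total,
                "data_bytes": data,
            }
    raise ValueError("no statistics row found")


def prefetch_finished(stats: Stats) -> bool:
    """Assumption of this sketch: no pending reads means the job is done."""
    return stats["pending"] == 0


if __name__ == "__main__":
    sample = """\
Fileset Async Read Async Read Async Read         Async Read  Async Read
Name    (Pending)  (Failed)   (Already Cached)   (Total)     (Data in Bytes)
------- ---------- ---------- ------------------ ----------- ----------------
ro      0          0          0                  2           122880
"""
    print(parse_prefetch_stats(sample))
```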
Prefetch pulls the complete file contents from home (unless the --metadata-only flag is used), so the file is designated as cached when it is prefetched. Prefetch of partially cached files caches the complete file.
Prefetch can be run in parallel on multiple filesets, but only one prefetch job can run on a fileset at a time.
While a file is getting prefetched, it is not evicted.
If parallel data transfer is configured, all gateways participate in the prefetch process.
If the file system unmounts during prefetch on the gateway, issue the prefetch again.
Prefetch can be triggered on inactive filesets.
Directories are also prefetched to the cache if specified in the prefetch file. If you specify a directory in the prefetch file and if that directory is empty, the empty directory is prefetched to cache. If the directory contains files or subdirectories, you must specify the names of the files or subdirectories that you want to prefetch. If you do not specify names of individual files or subdirectories inside a directory, that directory is prefetched without its contents.
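Because a directory that is named in the prefetch file is prefetched without its contents, a list file for a whole directory tree must name every file and subdirectory. The following Python sketch is a minimal illustration of building such a list with fully qualified path names, one per line. The function names are hypothetical, and the sketch assumes that the tree to be listed is readable from the node where the script runs.

```python
import os
from typing import List


def list_tree_entries(root: str) -> List[str]:
    """Return fully qualified paths of all files and subdirectories under root."""
    entries: List[str] = []
    for dirpath, dirnames, filenames in os.walk(os.path.abspath(root)):
        dirnames.sort()  # makes the traversal order deterministic
        for name in sorted(filenames):
            entries.append(os.path.join(dirpath, name))
        # Subdirectories are listed explicitly so that empty ones are included.
        for name in dirnames:
            entries.append(os.path.join(dirpath, name))
    return entries


def write_list_file(entries: List[str], path: str) -> None:
    """Write one entry per line; refuse names that cannot be stored one per line."""
    with open(path, "w", encoding="utf-8") as out:
        for entry in entries:
            if "\n" in entry:
                # Names with special characters need a policy-generated list instead.
                raise ValueError("entry contains a newline: %r" % entry)
            out.write(entry + "\n")
```

The resulting file is the kind of input that --list-file expects. For trees that contain file names with special characters, the policy-based method described in this topic applies instead.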
If you run the prefetch command with data or metadata options, statistics such as queued files, total files, failed files, and total data (in bytes) are displayed.
# mmafmctl FileSystem prefetch -j fileset --enable-failed-file-list --list-file /tmp/file-list
A sample output is as follows:
mmafmctl: Performing prefetching of fileset: <fileset>
Queued (Total) Failed TotalData (approx in Bytes)
0 (56324) 0 0
5 (56324) 2 1353559
56322 (56324) 2 14119335
Wed Oct 1 13:59:22.780 2014: [I] AFM: Prefetch recovery started for the file system gpfs1 fileset iw1.
mmafmctl: Performing prefetching of fileset: iw1
Wed Oct 1 13:59:23 EDT 2014: mmafmctl: [I] Performing prefetching of fileset: iw1
Wed Oct 1 14:00:59.986 2014: [I] AFM: Starting 'queue' operation for fileset 'iw1' in filesystem '/dev/gpfs1'.
Wed Oct 1 14:00:59.987 2014: [I] Command: tspcache /dev/gpfs1 1 iw1 0 257 4294967295 0 0 1393371
Wed Oct 1 14:01:17.912 2014: [I] Command: successful tspcache /dev/gpfs1 1 iw1 0 257 4294967295 0 0 1393371
Wed Oct 1 14:01:17.946 2014: [I] AFM: Prefetch recovery completed for the filesystem gpfs1 fileset iw1. error 0
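The progress rows that the command prints while queuing (Queued, Total, Failed, TotalData) can be collected by a script. The following Python sketch is a minimal illustration that assumes the row layout of the sample output in this topic, for example `56322 (56324) 2 14119335`. The names Progress and parse_progress are hypothetical.

```python
import re
from typing import List, NamedTuple


class Progress(NamedTuple):
    queued: int
    total: int
    failed: int
    total_data: int  # approximate, in bytes


# Matches rows such as "5 (56324) 2 1353559"; all other lines are skipped.
_ROW = re.compile(r"^\s*(\d+)\s+\((\d+)\)\s+(\d+)\s+(\d+)\s*$")


def parse_progress(output: str) -> List[Progress]:
    """Return every progress row found in the command output, in order."""
    rows: List[Progress] = []
    for line in output.splitlines():
        match = _ROW.match(line)
        if match:
            rows.append(Progress(*(int(g) for g in match.groups())))
    return rows


if __name__ == "__main__":
    sample = """\
mmafmctl: Performing prefetching of fileset: <fileset>
Queued (Total) Failed TotalData (approx in Bytes)
0 (56324) 0 0
5 (56324) 2 1353559
56322 (56324) 2 14119335
"""
    for row in parse_progress(sample):
        print(row)
```

The last row reflects the state at the time the job was queued at the gateway. Files that failed are reported in .afm/.prefetchedfailed.list only if --enable-failed-file-list was specified.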
- Metadata population by using prefetch:
# mmafmctl fs1 getstate -j ro
A sample output is as follows:
Fileset Name Fileset Target                    Cache State Gateway Node Queue Length Queue numExec
------------ --------------                    ----------- ------------ ------------ -------------
ro           nfs://c26c3apv1/gpfs/homefs1/dir3 Active      c26c4apv1    0            7
List Policy:
RULE EXTERNAL LIST 'List'
RULE 'List' LIST 'List' WHERE PATH_NAME LIKE '%'
Run the policy at home:
# mmapplypolicy /gpfs/homefs1/dir3 -P px -f px.res -L 1 -N mount -I defer
Policy creates a file which should be manually edited to retain only the file names. Thereafter this file is used at the cache to populate metadata.
# mmafmctl fs1 prefetch -j ro --metadata-only --list-file=px.res.list.List
A sample output is as follows:
mmafmctl: Performing prefetching of fileset: ro
Queued (Total) Failed TotalData (approx in Bytes)
0      (2)     0      0
100    (116)   5      1368093971
116    (116)   5      1368093971
prefetch successfully queued at the gateway
Prefetch end can be monitored by using this event:
Thu May 21 06:49:34.748 2015: [I] Calling User Exit Script prepop: event afmPrepopEnd, Async command prepop.sh.
The statistics of the last prefetch command can be viewed by running the following command:
# mmafmctl fs1 prefetch -j ro
Fileset Async Read Async Read Async Read         Async Read  Async Read
Name    (Pending)  (Failed)   (Already Cached)   (Total)     (Data in Bytes)
------- ---------- ---------- ------------------ ----------- ----------------
ro      0          1          0                  7           0
- Prefetch of data by giving list of files from home:
# cat /listfile1
A sample output is as follows:
/gpfs/homefs1/dir3/file1
/gpfs/homefs1/dir3/dir1/file1
# mmafmctl fs1 prefetch -j ro --list-file=/listfile1
A sample output is as follows:
mmafmctl: Performing prefetching of fileset: ro
Queued (Total) Failed TotalData (approx in Bytes)
0      (2)     0      0
2      (2)     0      1368093971
# mmafmctl fs1 prefetch -j ro
A sample output is as follows:
Fileset Async Read Async Read Async Read         Async Read  Async Read
Name    (Pending)  (Failed)   (Already Cached)   (Total)     (Data in Bytes)
------- ---------- ---------- ------------------ ----------- ----------------
ro      0          0          0                  2           122880
- Prefetch of data by using a list file, which is generated by using policy at home:
Inode file is created by using the policy at home, and must be used without editing manually.
List Policy:
RULE EXTERNAL LIST 'List'
RULE 'List' LIST 'List' WHERE PATH_NAME LIKE '%'
For files with special characters, path names must be encoded with ESCAPE %.
RULE EXTERNAL LIST 'List' ESCAPE '%'
RULE 'List' LIST 'List' WHERE PATH_NAME LIKE '%'
Run the policy at home:
# mmapplypolicy /gpfs/homefs1/dir3 -P px -f px.res -L 1 -N mount -I defer
# cat /lfile2
A sample output is as follows:
113289 65538 0 -- /gpfs/homefs1/dir3/file2
113292 65538 0 -- /gpfs/homefs1/dir3/dir1/file2
# mmafmctl fs1 prefetch -j ro --list-file=/lfile2
A sample output is as follows:
mmafmctl: Performing prefetching of fileset: ro
Queued (Total) Failed TotalData (approx in Bytes)
0      (2)     0      0
2      (2)     0      1368093971
- Prefetch by using the --home-fs-path option for a target with the NSD protocol:
# mmafmctl fs1 getstate -j ro2
A sample output is as follows:
Fileset Name Fileset Target              Cache State Gateway Node Queue Length Queue numExec
------------ --------------              ----------- ------------ ------------ -------------
ro2          gpfs:///gpfs/remotefs1/dir3 Active      c26c4apv1    0            7
# cat /lfile2
A sample output is as follows:
113289 65538 0 -- /gpfs/homefs1/dir3/file2
113292 65538 0 -- /gpfs/homefs1/dir3/dir1/file2
# mmafmctl fs1 prefetch -j ro2 --list-file=/lfile2 --home-fs-path=/gpfs/homefs1/dir3
A sample output is as follows:
mmafmctl: Performing prefetching of fileset: ro2
Queued (Total) Failed TotalData (approx in Bytes)
0      (2)     0      0
2      (2)     0      113292
# mmafmctl fs1 prefetch -j ro2
A sample output is as follows:
Fileset Async Read Async Read Async Read         Async Read  Async Read
Name    (Pending)  (Failed)   (Already Cached)   (Total)     (Data in Bytes)
------- ---------- ---------- ------------------ ----------- ----------------
ro2     0          0          0                  2           122880
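Where this topic calls for manually editing a policy-generated file to retain only the file names (the metadata population example and the deprecated --home-list-file option), the edit can be scripted. The following Python sketch is a minimal illustration that assumes every record has the layout shown in the /lfile2 samples, that is, numeric fields followed by ` -- ` and the path name. The function name is hypothetical, and the sketch does not unescape path names that the policy encoded with ESCAPE. A policy-generated file that is passed to --list-file must not be edited, so this sketch does not apply to that case.

```python
from typing import Iterable, List


def extract_paths(policy_lines: Iterable[str]) -> List[str]:
    """Keep only the path name of each policy list record."""
    paths: List[str] = []
    for raw in policy_lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Split at the first " -- " separator; the path name follows it.
        _, sep, path = line.partition(" -- ")
        if not sep or not path:
            raise ValueError("unexpected policy record: %r" % line)
        paths.append(path)
    return paths


if __name__ == "__main__":
    records = [
        "113289 65538 0 -- /gpfs/homefs1/dir3/file2\n",
        "113292 65538 0 -- /gpfs/homefs1/dir3/dir1/file2\n",
    ]
    for path in extract_paths(records):
        print(path)
```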