Prefetch

Prefetch fetches the file metadata (inode information) and data from home before an application requests the contents.

Prefetch is a feature that allows fetching the contents of a file into the cache before actual reads.

Prefetching files before an application starts can reduce the network delay when an application requests a file. Prefetch can be used to proactively manage WAN traffic patterns by moving files over the WAN during a period of low WAN usage.

Prefetch can be used to do the following tasks:
  • Populate metadata
  • Populate data
  • View prefetch statistics
Use the following command to perform these activities:
mmafmctl Device prefetch -j FilesetName [-s LocalWorkDirectory]
             [--retry-failed-file-list|--enable-failed-file-list]
             [{--directory LocalDirectoryPath | --dir-list-file DirListFile [--policy]} [--nosubdirs]]
             [{--list-file ListFile | --home-list-file HomeListFile} [--policy]]
             [--home-inode-file PolicyListFile]
             [--home-fs-path HomeFileSystemPath]
             [--metadata-only] [--gateway Node]
             [--readdir-only] [--force] [--prefetch-threads nThreads]

For more information about the command, see mmafmctl command. If no options are given for prefetch, the statistics of the last prefetch command that is run on the fileset are displayed.
--metadata-only
Prefetches only the metadata and not the actual data. This option is useful in migration scenarios. This option requires the list of files whose metadata is to be populated. It must be combined with a list file option.
--list-file ListFile
The specified file contains a list of files that need to be pre-populated, one file per line. All files must have fully qualified path names. If the list of files to be prefetched includes file names with special characters, a policy must be used to generate the list file. This list file must be edited manually to remove all other entries except the file names. The list of files can be:
  1. Files with fully qualified names from cache
  2. Files with fully qualified names from home
  3. List of files from the home that are generated by using the policy. The file must not be edited.
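Where a policy-generated list does have to be reduced to bare path names, the step can be scripted instead of done by hand. The following Python sketch assumes that each record looks like the policy output shown in the examples in this topic (three numeric fields, then ` -- `, then the path). The function names are hypothetical, and the result is meant to be written to a separate file so that the policy output itself stays unedited.

```python
from typing import Iterable, List


def strip_policy_record(line: str) -> str:
    """Return only the path name from one policy list record.

    Assumed record layout: '<num> <num> <num> -- <path>'.
    Lines that do not match that layout are returned unchanged.
    """
    text = line.rstrip("\n")
    head, sep, path = text.partition(" -- ")
    fields = head.split()
    if sep and len(fields) == 3 and all(f.isdigit() for f in fields):
        return path
    return text


def paths_only(lines: Iterable[str]) -> List[str]:
    """Apply strip_policy_record to every non-empty line."""
    return [strip_policy_record(line) for line in lines if line.strip()]
```

For example, `paths_only(open("/lfile2"))` would yield one fully qualified path per record, which can then be written out one per line as a new list file.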
--enable-failed-file-list
Turns on generating a list of files that failed during prefetch operation at the gateway node. The list of files is saved as .afm/.prefetchedfailed.list under the fileset. Failures that occur during processing are not logged in .afm/.prefetchedfailed.list. If you observe any errors during processing (before queuing), you might need to correct the errors and rerun prefetch.
--policy
Specifies that the list-file or home-list-file is generated by using a GPFS policy, in which sequences such as '\' or '\n' are escaped as '\\' and '\\n'. If this option is specified, the input file list is treated as already escaped, and the sequences are unescaped before the files are queued for the prefetch operation.
Note: This option can be used only if you are specifying list-file or home-list-file.
--directory LocalDirectoryPath
Specifies the path to the local directory from which you want to prefetch files. A list of all files in this directory and all its subdirectories is generated and queued for prefetch. You can specify either --directory or --dir-list-file with mmafmctl prefetch. The --policy option can be used only with --dir-list-file and not with --directory.
For example,
# mmafmctl fs1 prefetch -j fileset1 --dir-list-file /tmp/file1 --policy
The following example includes methods to name a directory for the --directory option, when the directory name contains special characters:
  • When a directory name does not have terminal escape sequences, keep the absolute directory path within double quotation marks (" ").
    # mmafmctl fs2 prefetch -j roTestPrefetch_GPFS --directory "/gpfs/fs2/roTestPrefetch1/Dir_a\!h@j#k%l^k&78*9'\''"
    A sample output is as follows:
    mmafmctl: Performing prefetching of fileset: roTestPrefetch_GPFS 
    mmafmctl(2020-04-13 02:35:39): Listing all files of directory "/gpfs/fs2/roTestPrefetch1/Dir_a\!h@j#k%l^k&78*9'\''"
         Queued	    Failed	          TotalData
               	          	  (approx in Bytes)
             25	         0	          131072000
    prefetch successfully queued at the gateway.
    (2020-04-13 02:35:41): Listed all files of directory "/gpfs/fs2/roTestPrefetch1/Dir_a\!h@j#k%l^k&78*9'\''"
  • When the directory name has terminal escape sequences, do not keep the directory path within double quotation marks. The terminal auto-fills the escape sequences in the directory name when you press <Tab> two times.
    # mmafmctl fs2 prefetch -j roTestPrefetch_GPFS --directory /gpfs/fs2/roTestPrefetch1/Dir_a\\\!h\@j#k%l\^k\&78\*9\'\\\'\'/
    A sample output is as follows:
    mmafmctl: Performing prefetching of fileset: roTestPrefetch_GPFS 
    mmafmctl(2020-04-13 02:39:58): Listing all files of directory "/gpfs/fs2/roTestPrefetch1/Dir_a\!h@j#k%l^k&78*9'\''/"
         Queued	    Failed	          TotalData
               	          	  (approx in Bytes)
             25	         0	          131072000
    prefetch successfully queued at the gateway.
    mmafmctl(2020-04-13 02:40:00): Listed all files of directory "/gpfs/fs2/roTestPrefetch1/Dir_a\!h@j#k%l^k&78*9'\''/"
  • When you press <Tab> two times to include the escape sequences in a directory name and also keep the directory path within double quotation marks, the prefetch operation fails. The operation fails because the terminal-escaped characters in the directory name are not unescaped.
    # mmafmctl fs2 prefetch -j roTestPrefetch_GPFS --directory "/gpfs/fs2/roTestPrefetch1/Dir_a\\\!h\@j#k%l\^k\&78\*9\'\\\'\'/"
    A sample output is as follows:
    mmafmctl: Performing prefetching of fileset: roTestPrefetch_GPFS 
    runPrepopSubcommand: Unexpected error from missing or incorrect prepop input path.  Return code: 1
    mmafmctl: Command failed. Examine previous error messages to determine cause.
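The three cases above come down to what single argument the shell hands to the command. The following Python sketch uses the standard `shlex` module, which only approximates POSIX shell quoting and does not model interactive features such as history expansion, to compare the argument produced by each form. The constant and function names are hypothetical, and the trailing slash from the second and third examples is omitted so that the strings can be compared directly.

```python
import shlex

# Literal directory path as echoed in the sample output above.
LITERAL = r"/gpfs/fs2/roTestPrefetch1/Dir_a\!h@j#k%l^k&78*9'\''"
# Form produced by <Tab> completion in the second and third examples.
TAB_ESCAPED = r"/gpfs/fs2/roTestPrefetch1/Dir_a\\\!h\@j#k%l\^k\&78\*9\'\\\'\'"


def shell_word(text: str) -> str:
    """Return the single argument a POSIX-style shell would pass for text."""
    (word,) = shlex.split(text)
    return word


# Case 1: literal name inside double quotation marks.
double_quoted_literal = shell_word('"' + LITERAL + '"')
# Case 2: tab-escaped name without quotation marks.
unquoted_escaped = shell_word(TAB_ESCAPED)
# Case 3: tab-escaped name inside double quotation marks.
double_quoted_escaped = shell_word('"' + TAB_ESCAPED + '"')

print(double_quoted_literal == LITERAL)
print(unquoted_escaped == LITERAL)
print(double_quoted_escaped == LITERAL)
```

Under these assumptions, the first two forms resolve to the literal path and the third does not, because most backslashes survive inside double quotation marks. This is consistent with the failure shown in the last example.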
--dir-list-file DirListFile
This parameter enables prefetching individual directories under an AFM fileset. The input file specifies the unique path to a directory that you want to prefetch. AFM generates a list of files under the specified directory and its subdirectories and queues it to the gateway node. The input file can also be a policy-generated file, for which you need to specify --policy.
--nosubdirs
This option restricts the recursive behavior of --directory and --dir-list-file so that only the files directly under the specified directory are prefetched. If you specify this parameter, subdirectories under the directory are not prefetched. This parameter is optional and can be used only with --directory or --dir-list-file.
For example,
# mmafmctl fs1 prefetch -j fileset1 --directory /gpfs/fs1/fileset1/dir1 --nosubdirs
# mmafmctl fs1 prefetch -j fileset1 --dir-list-file /tmp/file1 --policy --nosubdirs
--retry-failed-file-list
Allows retrying prefetch of files that failed in the last prefetch operation. The list of files to retry is obtained from .afm/.prefetchedfailed.list under the fileset.
Note: To use this option, you must enable generating a list of failed files. Add --enable-failed-file-list to the command first.
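The two-step sequence described in the note can be sketched as follows. This Python fragment only builds the argument lists; `prefetch_command` is a hypothetical helper, and `fs1`, `fileset1`, and `/tmp/file-list` are placeholder values in the style of the examples in this topic. Only options that appear in the syntax above are used.

```python
from typing import List


def prefetch_command(device: str, fileset: str, *options: str) -> List[str]:
    """Build an argv list for 'mmafmctl Device prefetch -j FilesetName ...'."""
    return ["mmafmctl", device, "prefetch", "-j", fileset, *options]


# First run: prefetch from a list file and record the files that fail
# in .afm/.prefetchedfailed.list under the fileset.
first_run = prefetch_command(
    "fs1", "fileset1",
    "--list-file", "/tmp/file-list",
    "--enable-failed-file-list",
)

# Later run: retry only the files that were recorded as failed.
retry_run = prefetch_command("fs1", "fileset1", "--retry-failed-file-list")
```

On a cluster node, each list could be passed to `subprocess.run(..., check=True)`. Building the command as a list avoids any shell quoting of the arguments.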
--home-list-file HomeListFile
The specified file contains a list of files from home that need to be pre-populated, one file per line. All files must have fully qualified path names. If the list of files to be prefetched includes file names with special characters, a policy must be used to generate the list file. A policy-generated file must be edited manually to remove all other entries except the file names. As of version 4.2.1, this option is deprecated. The --list-file option removes all other entries except the file names.
--home-inode-file PolicyListFile
The specified file contains the list of files from home that need to be pre-populated in the cache and this file is generated by using policy. This file must not be edited manually. This option is deprecated. The --list-file option removes all other entries except the file names.
--home-fs-path HomeFileSystemPath
Specifies the full path to the fileset at the home cluster and can be used along with --list-file. You must use this option when, with the NSD protocol, the mount point of the afmTarget fileset on the gateway nodes does not match the mount point on the home cluster. For example, the home file system is mounted on the home cluster at /gpfs/homefs1. The home file system is mounted on the cache by using NSD protocol at /gpfs/remotefs1.
For example,
# mmafmctl gpfs1 prefetch -j cache1 --list-file /tmp/list.allfiles --home-fs-path /gpfs/remotefs1
--readdir-only
Enables readdir operation on a dirty directory at the cache one last time and brings latest directory entries.

This option helps prefetching modified directory entries from the home, even though the directory at the cache fileset was modified by the applications and AFM marked the dirty flag on the cache directory. This option overrides the dirty flag that is set when the data is modified at the LU (local-updates) cache. In the LU mode, the dirty flag normally prevents the readdir operation at the home; with this option, the directory entries are refreshed from the home.

This option helps in the migration process where new files were created at the home after the application was moved to the cache. The application already modified the directory and refresh intervals were disabled. AFM queues readdir one last time on the cache directory and brings entries of the created files to the cache.

The afmRefreshOnce parameter must be set on an AFM fileset, and directory and file refresh intervals must be disabled.

For example,
  1. To set afmRefreshOnce on an AFM fileset, issue the following command:
    # mmchfileset fs fileset -p afmRefreshOnce=yes
  2. To check whether the afmRefreshOnce parameter value is set on an AFM fileset, issue the following command:
    # mmlsfileset fs fileset -L --afm
    A sample output is as follows:
    Filesets in file system '<fs>':
    
    Attributes for fileset <fileset>:
    ==========================================
    Status                                   Linked
    Path                                     GPFS_PATH/fileset
    Id                                       37
    Root inode                               3145731
    Parent Id                                0
    Created                                  Wed Mar  4 12:23:47 2020
    Comment
    Inode space                              6
    Maximum number of inodes                 100352
    Allocated inodes                         100352
    Permission change flag                   chmodAndSetacl
    afm-associated                           Yes
    Target                                   nfs://home/fileset
    Mode                                     local-updates
    File Lookup Refresh Interval             30 (default)
    File Open Refresh Interval               30 (default)
    Dir Lookup Refresh Interval              60 (default)
    Dir Open Refresh Interval                60 (default)
    Expiration Timeout                       disable (default)
    Last pSnapId                             0
    Display Home Snapshots                   yes (default)
    Number of Gateway Flush Threads          4
    Prefetch Threshold                       0 (default)
    Eviction Enabled                         yes (default)
    IO Flags                                 9216 (refreshOnce
  3. To run the prefetch operation for the readdir operation one last time, issue the following command:
    # mmafmctl fs prefetch -j fileset --directory /fileset_path/directory --readdir-only
--force
Enables forcefully fetching data from the home during the migration process. This option overrides any set restrictions and helps to fetch the data forcefully to the cache. This option must be used only to forcefully fetch the data that was created after the migration process completion.
For example,
# mmafmctl fs prefetch -j fileset --list-file listfile_path --force
--gateway Node
Selects the gateway node on which to run the prefetch operation for a fileset, typically a node that is idle or less utilized. This option helps to distribute the prefetch work across different gateway nodes and overrides the default gateway node that is assigned to the fileset. It also helps to run different prefetch operations, which might belong to the same fileset or to different filesets, on different gateway nodes.
For example,
# mmafmctl fs prefetch -j fileset --list-file listfile_path --gateway Node2
--prefetch-threads nThreads
Specifies the number of threads to be used for the prefetch operation. Valid values are 1 - 255. Default value is 4.
For example,
# mmafmctl fs prefetch -j fileset --list-file listfile_path --prefetch-threads 6

Prefetch is an asynchronous process, and the fileset can be used while prefetch is in progress. Prefetch completion can be monitored by using the afmPrepopEnd callback event or by running the mmafmctl Device prefetch -j FilesetName command with no other options.
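
When the statistics are polled from a script, the table can be turned into counters. The following Python sketch assumes the column layout of the statistics output shown in the examples later in this topic (fileset name followed by five numeric columns). The function names are hypothetical, and treating a pending count of zero as "nothing left to read" is an assumption, not documented behavior.

```python
from typing import Dict

FIELDS = ("pending", "failed", "already_cached", "total", "data_bytes")


def parse_prefetch_stats(output: str) -> Dict[str, Dict[str, int]]:
    """Map each fileset name to its Async Read counters."""
    stats: Dict[str, Dict[str, int]] = {}
    for line in output.splitlines():
        parts = line.split()
        # A data row is a fileset name followed by exactly five integers;
        # header and separator lines do not match and are skipped.
        if len(parts) == 6 and all(p.isdigit() for p in parts[1:]):
            stats[parts[0]] = dict(zip(FIELDS, map(int, parts[1:])))
    return stats


def pending_reads(output: str, fileset: str) -> int:
    """Return the Async Read (Pending) counter for one fileset."""
    return parse_prefetch_stats(output)[fileset]["pending"]
```

A polling loop could capture the command output, call `pending_reads`, and stop when the value reaches zero. The afmPrepopEnd callback remains the event-driven alternative.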

Prefetch pulls the complete file contents from home (unless the --metadata-only flag is used), so the file is designated as cached when it is prefetched. Prefetch of partially cached files caches the complete file.

Prefetch can be run in parallel on multiple filesets, although only one prefetch job can run on a fileset.

While a file is getting prefetched, it is not evicted.

If parallel data transfer is configured, all gateways participate in the prefetch process.

If the file system unmounts during prefetch on the gateway, issue the prefetch again.

Prefetch can be triggered on inactive filesets.

Directories are also prefetched to the cache if specified in the prefetch file. If you specify a directory in the prefetch file and if that directory is empty, the empty directory is prefetched to cache. If the directory contains files or subdirectories, you must specify the names of the files or subdirectories that you want to prefetch. If you do not specify names of individual files or subdirectories inside a directory, that directory is prefetched without its contents.
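
Because a directory named in the prefetch file is fetched without its contents, a list that covers a whole tree has to name every file and subdirectory explicitly. The following Python sketch builds such a list by walking a directory where the tree is visible (for example, at home). The function name is hypothetical, and file names that contain special characters would still need the policy-based route described for --list-file.

```python
import os
from typing import List


def list_directory_tree(root: str) -> List[str]:
    """Return root, every subdirectory, and every file below it.

    Paths are returned sorted, one entry per path, so that they can be
    written to a list file with one fully qualified name per line.
    """
    entries: List[str] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # os.walk visits root and each subdirectory exactly once,
        # so empty directories are included as well.
        entries.append(dirpath)
        entries.extend(os.path.join(dirpath, name) for name in filenames)
    return sorted(entries)
```

Writing `"\n".join(list_directory_tree("/gpfs/homefs1/dir3"))` to a file gives a candidate list file. The path is taken from the examples in this topic and is only illustrative.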

If you run the prefetch command with data or metadata options, statistics such as queued files, total files, failed files, and total data (in bytes) are displayed.

For example,
# mmafmctl FileSystem prefetch -j fileset --enable-failed-file-list --list-file /tmp/file-list
A sample output is as follows:

mmafmctl: Performing prefetching of fileset: <fileset>
Queued (Total) Failed TotalData (approx in Bytes)
0      (56324) 0      0
5      (56324) 2      1353559
56322  (56324) 2      14119335
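
The progress rows in this output can be read programmatically, for example to decide whether a retry is worthwhile. The following Python sketch assumes each progress row has the shape shown above (queued count, total in parentheses, failed count, approximate bytes). The `Progress` type and `parse_progress` function are hypothetical names.

```python
import re
from typing import List, NamedTuple


class Progress(NamedTuple):
    queued: int
    total: int
    failed: int
    total_bytes: int


# Matches rows such as: "56322  (56324) 2      14119335"
_ROW = re.compile(r"^\s*(\d+)\s+\((\d+)\)\s+(\d+)\s+(\d+)\s*$")


def parse_progress(output: str) -> List[Progress]:
    """Return one Progress entry per progress row, in output order."""
    rows: List[Progress] = []
    for line in output.splitlines():
        match = _ROW.match(line)
        if match:
            rows.append(Progress(*map(int, match.groups())))
    return rows
```

The last entry reflects the final state that the command reported. A nonzero `failed` value there suggests checking .afm/.prefetchedfailed.list when --enable-failed-file-list was used.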
Prefetch Recovery:
Note: This feature is disabled from IBM Spectrum Scale 5.0.2. If your cluster is running on an earlier version, prefetch recovery is possible.
If the primary gateway of a cache is changed while prefetch is running, prefetch is stopped. The next access to the fileset automatically retriggers the interrupted prefetch on the new primary gateway. The list file used when prefetch was initiated must exist in a path that is accessible to all gateway nodes. Prefetch recovery on a single-writer fileset is triggered by a read on some file in the fileset. Prefetch recovery on a read-only, independent-writer, or local-update fileset is triggered by a lookup or readdir on the fileset. Prefetch recovery occurs on the new primary gateway and continues where it left off: it identifies the files that did not complete prefetch and rebuilds the prefetch queue. Examples of messages in the mmfs.log are as follows:
Wed Oct  1 13:59:22.780 2014: [I] AFM: Prefetch recovery started for the file system gpfs1 fileset iw1.
mmafmctl: Performing prefetching of fileset: iw1 
Wed Oct  1 13:59:23 EDT 2014: mmafmctl: [I] Performing prefetching of fileset: iw1
Wed Oct  1 14:00:59.986 2014: [I] AFM: Starting 'queue' operation for fileset 'iw1' in filesystem '/dev/gpfs1'.
Wed Oct  1 14:00:59.987 2014: [I] Command: tspcache /dev/gpfs1 1 iw1 0 257 4294967295 0 0 1393371
Wed Oct  1 14:01:17.912 2014: [I] Command: successful tspcache /dev/gpfs1 1 iw1 0 257 4294967295 0 0 1393371
Wed Oct  1 14:01:17.946 2014: [I] AFM: Prefetch recovery completed for the filesystem gpfs1 fileset iw1. error 0
  1. Metadata population by using prefetch:
    # mmafmctl fs1 getstate -j ro
    A sample output is as follows:
    Fileset Name Fileset Target                     Cache State  Gateway Node  Queue Length  Queue numExec
    ------------ --------------                     -----------  ------------  ------------  -------------
    ro          nfs://c26c3apv1/gpfs/homefs1/dir3   Active        c26c4apv1     0            7  
    List Policy:
    RULE EXTERNAL LIST 'List' RULE 'List' LIST 'List' WHERE PATH_NAME LIKE '%'
    Run the policy at home:
    # mmapplypolicy /gpfs/homefs1/dir3 -P px -f px.res -L 1 -N mount -I defer
    The policy creates a file that must be manually edited to retain only the file names. This file is then used at the cache to populate metadata.
    # mmafmctl fs1 prefetch -j ro --metadata-only --list-file=px.res.list.List
    A sample output is as follows:
    
    mmafmctl: Performing prefetching of fileset: ro
         Queued    (Total)      Failed               TotalData
                                                     (approx in Bytes) 
              0    (2)          0                    0
            100    (116)        5                    1368093971
            116    (116)        5                    1368093971
    
    prefetch successfully queued at the gateway
     
    
    Prefetch end can be monitored by using this event:
    Thu May 21 06:49:34.748 2015: [I] Calling User Exit Script prepop: event afmPrepopEnd,
    Async command prepop.sh.  
    
    The statistics of the last prefetch command can be viewed by running the following command:
    # mmafmctl fs1 prefetch -j ro
    A sample output is as follows:
    Fileset  Async Read   Async Read    Async Read            Async Read    Async Read 
    Name     (Pending)    (Failed)      (Already Cached)      (Total)       (Data in Bytes)
    -------  ----------   ----------    ------------------    -----------   ----------------
    ro       0            1             0                     7             0
  2. Prefetch of data by providing a list of files from home:
    # cat /listfile1
    A sample output is as follows:
    
    /gpfs/homefs1/dir3/file1
    /gpfs/homefs1/dir3/dir1/file1
    # mmafmctl fs1 prefetch -j ro --list-file=/listfile1
    A sample output is as follows:
    mmafmctl: Performing prefetching of fileset: ro
    
    Queued (Total) Failed TotalData
                          (approx in Bytes)
    0      (2)     0      0
    2      (2)     0      1368093971 
    
            
    # mmafmctl fs1 prefetch -j ro
    A sample output is as follows:
    Fileset  Async Read   Async Read    Async Read            Async Read    Async Read 
    Name     (Pending)    (Failed)      (Already Cached)      (Total)       (Data in Bytes)
    -------  ----------   ----------    ------------------    -----------   ----------------
    ro       0            0             0                     2             122880
  3. Prefetch of data by using a list file, which is generated by using policy at home:

    The inode file is created by using the policy at home and must be used without manual editing.

    List Policy:
    RULE EXTERNAL LIST 'List' RULE 'List' LIST 'List' WHERE PATH_NAME LIKE '%'
             
    

    For files with special characters, path names must be encoded with ESCAPE '%'.

    RULE EXTERNAL LIST 'List' ESCAPE '%' RULE 'List' LIST 'List' WHERE PATH_NAME LIKE '%'

    Run the policy at home:

    # mmapplypolicy /gpfs/homefs1/dir3 -P px -f px.res -L 1 -N mount -I defer
    # cat /lfile2
    A sample output is as follows:
    113289 65538 0 -- /gpfs/homefs1/dir3/file2
    113292 65538 0 -- /gpfs/homefs1/dir3/dir1/file2 
    
    # mmafmctl fs1 prefetch -j ro --list-file=/lfile2
    A sample output is as follows:
    mmafmctl: Performing prefetching of fileset: ro
    
    Queued (Total) Failed TotalData
                          (approx in Bytes)
    0      (2)     0      0
    2      (2)     0      1368093971 
  4. Prefetch by using --home-fs-path option for a target with the NSD protocol:
    # mmafmctl fs1 getstate -j ro2
    A sample output is as follows:
    Fileset Name Fileset Target               Cache State  Gateway Node  Queue Length  Queue numExec
    ------------ --------------               -----------  ------------  ------------  -------------
    ro2          gpfs:///gpfs/remotefs1/dir3  Active        c26c4apv1    0             7  
    
    # cat /lfile2
    A sample output is as follows:
    113289 65538 0 -- /gpfs/homefs1/dir3/file2
    113292 65538 0 -- /gpfs/homefs1/dir3/dir1/file2
    
    # mmafmctl fs1 prefetch -j ro2 --list-file=/lfile2 --home-fs-path=/gpfs/homefs1/dir3
    A sample output is as follows:
    mmafmctl: Performing prefetching of fileset: ro2
    
    Queued  (Total)  Failed  TotalData
                             (approx in Bytes)
    0       (2)      0       0
    2       (2)      0       113292 
    # mmafmctl fs1 prefetch -j ro2
    A sample output is as follows:
    Fileset  Async Read   Async Read    Async Read            Async Read    Async Read 
    Name     (Pending)    (Failed)      (Already Cached)      (Total)       (Data in Bytes)
    -------  ----------   ----------    ------------------    -----------   ----------------
    ro2      0            0             0                     2             122880