Prefetch

Prefetch fetches file metadata (inode information) and data from home into the cache before an application requests the contents.

Prefetching files before an application starts can reduce the network delay when the application requests a file. Prefetch can also be used to proactively manage WAN traffic patterns by moving files over the WAN during a period of low WAN usage.

Prefetch can be used in the following ways:
  • Populate metadata
  • Populate data
  • View prefetch statistics
Use the following command to perform these activities.
mmafmctl Device prefetch -j FilesetName [-s LocalWorkDirectory]
             [--retry-failed-file-list|--enable-failed-file-list]
             [--directory LocalDirectoryPath]|
             {--list-file ListFile | --home-list-file HomeListFile} [--policy] |--home-inode-file PolicyListFile ]
             [--home-fs-path HomeFileSystemPath][--metadata-only]
For more information on the command, see mmafmctl command. If no options are given for prefetch, the statistics of the last prefetch command run on the fileset are displayed.

--metadata-only - Prefetches only the metadata and not the actual data. This is useful in migration scenarios. This option requires the list of files whose metadata is to be populated. It has to be combined with a list file option.

--list-file ListFile - The specified file contains a list of files that need to be pre-populated, one file per line. All files must have fully qualified path names. If the files to be prefetched have file names with special characters, a policy should be used to generate the list file. The policy-generated file should be hand-edited to remove all other entries except the file names. The list of files can be:
  1. Files with fully qualified names from cache.
  2. Files with fully qualified names from home.
  3. List of files from home generated using policy. The file must not be edited.
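The hand-editing step that reduces policy output to bare file names can be scripted. The following is a minimal sketch, not a supported tool: the function name policy_list_to_paths is hypothetical, and the record layout (inode, generation, snapshot ID, then " -- " and the path) is an assumption taken from the sample list files shown in this section.

```shell
# Hypothetical helper: strip everything up to and including the first " -- "
# separator that precedes the path, leaving one fully qualified path per line.
# Assumed record layout: "<inode> <gen> <snapid> -- /full/path"
# Lines that are already bare paths are passed through unchanged.
policy_list_to_paths() {
    # [^/]* cannot cross a slash, so only the prefix before the path is removed,
    # even if the path itself contains "-- ".
    sed 's/^[^/]*-- //' "$1"
}
```

For example, policy_list_to_paths px.res.list.List > /tmp/list.allfiles writes a file that could then be passed with --list-file. File names that contain newline characters are not handled by this sketch.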

--enable-failed-file-list - Turns on generation of a list of files that failed during the prefetch operation at the gateway node. The list is saved as .afm/.prefetchedfailed.list under the fileset. Failures that occur during processing (before queuing) are not logged in .afm/.prefetchedfailed.list. If you observe any errors during processing, you might need to correct the errors and rerun prefetch.

--policy - Specifies that the list-file or home-list-file is generated using a GPFS™ policy, in which sequences like '\' or '\n' are escaped as '\\' and '\\n'. If this option is specified, the input file list is treated as already escaped. The sequences are unescaped before the files are queued for the prefetch operation.
Note: This option can be used only if you are specifying list-file or home-list-file.

--directory LocalDirectoryPath - Specifies the path to the local directory from which you want to prefetch files. A list of all files in this directory and all its sub-directories is generated and queued for prefetch.

--retry-failed-file-list - Retries prefetch of the files that failed in the last prefetch operation. The list of files to retry is obtained from .afm/.prefetchedfailed.list under the fileset.
Note: To use this option, you must first enable generation of the failed-file list by adding --enable-failed-file-list to the original prefetch command.
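Before retrying, it can help to check whether the failed-file list actually contains entries. The sketch below is illustrative only: the function names are hypothetical, the fileset path is whatever directory the fileset is linked at, and it assumes the list holds one entry per line and is readable by the user running the check. It prints the retry command rather than running it.

```shell
# Hypothetical helper: count entries in <fileset path>/.afm/.prefetchedfailed.list.
# $1: directory where the fileset is linked (for example /gpfs/fs1/ro, an assumed path)
failed_prefetch_count() {
    list="$1/.afm/.prefetchedfailed.list"
    if [ -s "$list" ]; then
        # Redirecting into wc prints only the number; tr drops any padding.
        wc -l < "$list" | tr -d ' '
    else
        # Missing or empty list: nothing to retry.
        echo 0
    fi
}

# Hypothetical helper: print (do not run) the retry command if entries exist.
# $1: device, $2: fileset name, $3: fileset path
suggest_retry() {
    n=$(failed_prefetch_count "$3")
    if [ "$n" -gt 0 ]; then
        echo "mmafmctl $1 prefetch -j $2 --retry-failed-file-list"
    fi
}
```

Printing the command instead of executing it keeps the sketch safe to try on a system where the previous prefetch was not started with --enable-failed-file-list.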

--home-list-file HomeListFile - The specified file contains a list of files from home that need to be pre-populated, one file per line. All files must have fully qualified path names. If the files to be prefetched have file names with special characters, a policy should be used to generate the list file. A policy-generated file should be hand-edited to remove all other entries except the file names. As of version 4.2.1, this option is deprecated. The --list-file option can handle this.

--home-inode-file PolicyListFile - The specified file contains the list of files from home that need to be pre-populated in the cache, and it is generated using policy. This file should not be hand-edited. This option is deprecated. The --list-file option can handle this.

--home-fs-path HomeFileSystemPath - Specifies the full path to the fileset at the home cluster and can be used in conjunction with --list-file. You must use this option when, with the NSD protocol, the mount point of the afmTarget fileset on the gateway nodes does not match the mount point on the home cluster. For example, the home file system is mounted on the home cluster at /gpfs/homefs1 and is mounted on the cache using the NSD protocol at /gpfs/remotefs1.

For example, mmafmctl gpfs1 prefetch -j cache1 --list-file /tmp/list.allfiles --home-fs-path /gpfs/remotefs1.

Prefetch is an asynchronous process, and the fileset can be used while prefetch is in progress. Prefetch completion can be monitored by using the afmPrepopEnd callback event or by running the mmafmctl Device prefetch command with no options.

Prefetch pulls the complete file contents from home (unless the --metadata-only flag is used), so the file is designated as cached when it is completely prefetched. Prefetch of partially cached files caches the complete file.

Prefetch can be run in parallel on multiple filesets, but only one prefetch job can run on a given fileset at a time.

If a file is in the process of getting prefetched, it is not evicted.

If parallel data transfer is configured, all gateways participate in the prefetch process.

If the filesystem unmounts during prefetch on the gateway, prefetch needs to be issued again.

Prefetch can be triggered on inactive filesets.

Directories are also prefetched to the cache if specified in the prefetch file. If you specify a directory in the prefetch file and if that directory is empty, the empty directory is prefetched to cache. If the directory contains files or sub-directories, you must specify the names of the files or sub-directories which you want to prefetch. If you do not specify names of individual files or sub-directories inside a directory, that directory is prefetched without its contents.
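Because a directory named in the list file is prefetched without its contents, a list that is meant to cover a whole tree has to name every sub-directory and file explicitly. One way to produce such a list is sketched below. The function name build_prefetch_list is hypothetical, the sketch assumes a find that supports -mindepth (GNU and BSD find do), and it is not suitable for names containing newline characters, for which the policy-based list is the documented route.

```shell
# Hypothetical helper: print every sub-directory and regular file below a
# directory, one path per line, in a stable order.
# $1: directory to walk; pass an absolute path so the output is fully qualified.
build_prefetch_list() {
    # -mindepth 1 leaves out the starting directory itself.
    # LC_ALL=C makes the sort order independent of the locale.
    find "$1" -mindepth 1 \( -type d -o -type f \) | LC_ALL=C sort
}
```

For example, running build_prefetch_list /gpfs/homefs1/dir3 > /tmp/list.allfiles at home would yield a file with fully qualified home paths, including empty directories.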

If you run the prefetch command with data or metadata options, statistics such as queued files, total files, failed files, and total data (in bytes) are displayed, as in the following example of a command and its system output:

# mmafmctl <FileSystem> prefetch -j <fileset> --enable-failed-file-list --list-file /tmp/file-list
mmafmctl: Performing prefetching of fileset: <fileset>
Queued (Total) Failed TotalData (approx in Bytes)
0      (56324) 0      0
5      (56324) 2      1353559
56322  (56324) 2      14119335
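When prefetch is driven from a script, the last progress row is usually the one of interest. The following sketch extracts it from captured command output. The function name prefetch_summary is hypothetical, and the column order (Queued, Total in parentheses, Failed) is assumed from the header line in the sample output, not from a documented output contract.

```shell
# Hypothetical helper: read prefetch progress output on stdin and print
# "queued total failed" taken from the last progress row.
prefetch_summary() {
    awk '
        # A progress row starts with a number followed by a number in parentheses.
        $1 ~ /^[0-9]+$/ && $2 ~ /^\([0-9]+\)$/ {
            total = $2
            gsub(/[()]/, "", total)      # "(56324)" becomes "56324"
            last = $1 " " total " " $3
        }
        END { if (last != "") print last }
    '
}
```

Applied to the sample output above, the sketch prints 56322 56324 2, which a wrapper script could use to decide whether a retry is worth issuing.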
Prefetch Recovery:
Note: This feature is disabled from IBM Spectrum Scale™ 5.0.2. If your cluster is running on an earlier version, prefetch recovery is possible.
If the primary gateway of a cache is changed while prefetch is running, prefetch is stopped. The next access to the fileset automatically re-triggers the interrupted prefetch on the new primary gateway. The list file that was used when prefetch was initiated must exist in a path that is accessible to all gateway nodes. Prefetch recovery on a single-writer fileset is triggered by a read of a file in the fileset. Prefetch recovery on a read-only, independent-writer, or local-update fileset is triggered by a lookup or readdir on the fileset. Prefetch recovery occurs on the new primary gateway and continues where it left off: it determines which files did not complete prefetch and rebuilds the prefetch queue. Examples of messages in mmfs.log are as follows:
Wed Oct  1 13:59:22.780 2014: [I] AFM: Prefetch recovery started for the file system gpfs1 fileset iw1.
mmafmctl: Performing prefetching of fileset: iw1 
Wed Oct  1 13:59:23 EDT 2014: mmafmctl: [I] Performing prefetching of fileset: iw1
Wed Oct  1 14:00:59.986 2014: [I] AFM: Starting 'queue' operation for fileset 'iw1' in filesystem '/dev/gpfs1'.
Wed Oct  1 14:00:59.987 2014: [I] Command: tspcache /dev/gpfs1 1 iw1 0 257 4294967295 0 0 1393371
Wed Oct  1 14:01:17.912 2014: [I] Command: successful tspcache /dev/gpfs1 1 iw1 0 257 4294967295 0 0 1393371
Wed Oct  1 14:01:17.946 2014: [I] AFM: Prefetch recovery completed for the filesystem gpfs1 fileset iw1. error 0
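Whether recovery finished for a given fileset can be checked by looking for the completion message in the log. The sketch below is an illustration based only on the message format shown in the sample above. The function name recovery_error_code is hypothetical, and the assumption that the line ends with the error code may not hold on other releases.

```shell
# Hypothetical helper: read mmfs.log text on stdin and print the error code from
# the most recent "Prefetch recovery completed" message for one fileset.
# $1: fileset name (for example iw1). Prints nothing if no such message exists.
recovery_error_code() {
    awk -v name="$1" '
        # Match the completion message, then check for "fileset <name>. " literally.
        /AFM: Prefetch recovery completed/ && index($0, "fileset " name ". ") {
            code = $NF               # last field is assumed to be the error code
        }
        END { if (code != "") print code }
    '
}
```

An empty result means that no completion message was found for the fileset, for example because recovery is still running or was never triggered.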
  1. Metadata population using prefetch:

    # mmafmctl fs1 getstate -j ro

    Fileset Name  Fileset Target                     Cache State  Gateway Node  Queue Length  Queue numExec
    ------------  --------------                     -----------  ------------  ------------  -------------
    ro            nfs://c26c3apv1/gpfs/homefs1/dir3  Active       c26c2apv2     0             7
    List Policy:
    RULE EXTERNAL LIST 'List' RULE 'List' LIST 'List' WHERE PATH_NAME LIKE '%'
    Run the policy at home:
    # mmapplypolicy /gpfs/homefs1/dir3 -P px -f px.res -L 1 -N mount -I defer
    The policy creates a file that must be manually edited to retain only the file names. This file is
    then used at the cache to populate metadata.
    # mmafmctl fs1 prefetch -j ro --metadata-only --list-file=px.res.list.List
    mmafmctl: Performing prefetching of fileset: ro
         Queued    (Total)      Failed               TotalData
                                                     (approx in Bytes) 
              0    (2)          0                    0
            100    (116)        5                    1368093971
            116    (116)        5                    1368093971
    
    prefetch successfully queued at the gateway
     
    
    Prefetch end can be monitored by using this event:
    Thu May 21 06:49:34.748 2015: [I] Calling User Exit Script prepop: event afmPrepopEnd,
    Async command prepop.sh.  
    
    The statistics of the last prefetch command can be viewed by running the following command: 
    mmafmctl fs1 prefetch -j ro
    Fileset Name  Async Read (Pending)  Async Read (Failed)  Async Read (Already Cached)  Async Read (Total)  Async Read (Data in Bytes)
    ------------  --------------------  -------------------  ---------------------------  ------------------  --------------------------
    ro            0                     1                    0                            7                   0
  2. Prefetch of data by giving a list of files from home:
    # cat /listfile1
    /gpfs/homefs1/dir3/file1
    /gpfs/homefs1/dir3/dir1/file1
    # mmafmctl fs1 prefetch -j ro --list-file=/listfile1
    mmafmctl: Performing prefetching of fileset: ro
    
    Queued (Total) Failed TotalData
                          (approx in Bytes)
    0      (2)     0      0
    2      (2)     0      1368093971 
    
            
    # mmafmctl fs1 prefetch -j ro
    Fileset Name  Async Read (Pending)  Async Read (Failed)  Async Read (Already Cached)  Async Read (Total)  Async Read (Data in Bytes)
    ------------  --------------------  -------------------  ---------------------------  ------------------  --------------------------
    ro            0                     0                    0                            2                   122880
  3. Prefetch of data using list file that is generated using policy at home:

    Inode file is created using the above policy at home, and must be used as such without hand-editing.

    List Policy:
    RULE EXTERNAL LIST 'List' RULE 'List' LIST 'List' WHERE PATH_NAME LIKE '%'
             

    For files with special characters, path names must be encoded with ESCAPE %.

    RULE EXTERNAL LIST 'List' ESCAPE '%' RULE 'List' LIST 'List' WHERE PATH_NAME LIKE '%'

    Run the policy at home:

    # mmapplypolicy /gpfs/homefs1/dir3 -P px -f px.res -L 1 -N mount -I defer

    # cat /lfile2

    113289 65538 0 -- /gpfs/homefs1/dir3/file2
    113292 65538 0 -- /gpfs/homefs1/dir3/dir1/file2 
    # mmafmctl fs1 prefetch -j ro --list-file=/lfile2
    mmafmctl: Performing prefetching of fileset: ro

    Queued (Total) Failed TotalData
                          (approx in Bytes)
    0      (2)     0      0
    2      (2)     0      1368093971
  4. Prefetch using --home-fs-path option for a target with NSD protocol:

    # mmafmctl fs1 getstate -j ro2

    Fileset Name  Fileset Target               Cache State  Gateway Node  Queue Length  Queue numExec
    ------------  --------------               -----------  ------------  ------------  -------------
    ro2           gpfs:///gpfs/remotefs1/dir3  Active       c26c4apv1     0             7
    
    # cat /lfile2
    113289 65538 0 -- /gpfs/homefs1/dir3/file2
    113292 65538 0 -- /gpfs/homefs1/dir3/dir1/file2
    # mmafmctl fs1 prefetch -j ro2 --list-file=/lfile2 --home-fs-path=/gpfs/homefs1/dir3
    mmafmctl: Performing prefetching of fileset: ro2
    
    Queued  (Total)  Failed  TotalData
                             (approx in Bytes)
    0       (2)      0       0
    2       (2)      0       113292 
    # mmafmctl fs1 prefetch -j ro2
    Fileset Name  Async Read (Pending)  Async Read (Failed)  Async Read (Already Cached)  Async Read (Total)  Async Read (Data in Bytes)
    ------------  --------------------  -------------------  ---------------------------  ------------------  --------------------------
    ro2           0                     0                    0                            2                   122880
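The statistics table printed by the command with no prefetch options can be turned into labeled values for monitoring. This is a minimal sketch under stated assumptions: the function name prefetch_stats_row is hypothetical, and it assumes that each data row has exactly six whitespace-separated fields in the order shown in the header (fileset name, pending, failed, already cached, total, data in bytes).

```shell
# Hypothetical helper: read the statistics table on stdin and print the row for
# one fileset as name=value pairs.
# $1: fileset name (for example ro2)
prefetch_stats_row() {
    awk -v name="$1" '
        # Only rows whose first field is the fileset name and that have six
        # fields are treated as data; header and separator rows are skipped.
        $1 == name && NF == 6 {
            printf "pending=%s failed=%s cached=%s total=%s bytes=%s\n", $2, $3, $4, $5, $6
        }
    '
}
```

For the ro2 row shown above, the sketch prints pending=0 failed=0 cached=0 total=2 bytes=122880.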