Partial file caching

With partial file caching, the cache fetches only the blocks that are read instead of the entire file, which uses network bandwidth and local disk space more efficiently. This caching is useful when an application does not need to read the whole file. Partial file caching works on IBM Storage Scale block boundaries.

Partial file caching is controlled by the afmPrefetchThreshold parameter, which can be updated by using the mmchfileset command. The default value of this parameter is 0, which means complete file caching: all blocks of a file are fetched after any three blocks are read by the cache, and the file is then marked as cached. This value is useful for sequentially accessed files that are read in their entirety, such as image files, home directories, and development environments.
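As an illustration of how the parameter might be set, the following minimal sketch builds an mmchfileset command line without running it. It assumes that AFM attributes are passed with the -p option of mmchfileset. The device name fs1, the fileset name cacheFileset1, and the helper function name are hypothetical. Check the mmchfileset command reference for your release for any prerequisites before you change the attribute.

```python
import shlex


def build_prefetch_threshold_cmd(device: str, fileset: str, threshold: int) -> list[str]:
    """Build (but do not run) an mmchfileset command that sets afmPrefetchThreshold.

    Hypothetical helper for illustration; device and fileset are supplied by the caller.
    """
    # 0 is the default (complete file caching); 1 - 100 are the other documented values.
    if not 0 <= threshold <= 100:
        raise ValueError("afmPrefetchThreshold must be between 0 and 100")
    return ["mmchfileset", device, fileset, "-p", f"afmPrefetchThreshold={threshold}"]


# Example: cache only the blocks that the application reads (threshold 100).
cmd = build_prefetch_threshold_cmd("fs1", "cacheFileset1", 100)
print(shlex.join(cmd))
```

The command is returned as an argument list so that it can be reviewed or logged before it is passed to a process runner on a node where the command is available.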

In addition to the default of 0, valid afmPrefetchThreshold values are in the range 1 – 100. A value in this range specifies the percentage of the file size that must be cached before the rest of the data blocks are automatically fetched into the cache. A large value is suitable for a file that is accessed partially.

An afmPrefetchThreshold value of 100 disables full file prefetching. This value caches only the data blocks that are read by the application. This value is useful for large random-access files that are either too large to fit in the cache or are never expected to be read in their entirety. When all data blocks are available in the cache, the file is marked as cached.
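The three cases of afmPrefetchThreshold (0, 1 – 99, and 100) can be summarized in a small decision function. This is a toy model of the behavior described in this topic, not the AFM implementation. The function name, its parameters, and the choice of "greater than or equal" at the threshold boundary are assumptions.

```python
def should_prefetch_whole_file(threshold: int, cached_bytes: int,
                               file_size: int, blocks_read: int) -> bool:
    """Toy model: would the rest of the file be fetched automatically?"""
    if threshold == 0:
        # Default: complete file caching starts after any three blocks are read.
        return blocks_read >= 3
    if threshold == 100:
        # Full file prefetching is disabled; only blocks that are read are cached.
        return False
    # 1 - 99: percentage of the file size that must already be cached.
    # Assumption: reaching the percentage exactly is enough to trigger the prefetch.
    return cached_bytes * 100 >= threshold * file_size


# A 1000-byte file with a threshold of 50: prefetch starts once half is cached.
print(should_prefetch_whole_file(50, 499, 1000, blocks_read=1))
print(should_prefetch_whole_file(50, 500, 1000, blocks_read=1))
```

Integer arithmetic is used for the percentage comparison so that the result does not depend on floating-point rounding.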

For sparse files, the percentage for prefetching is calculated as the ratio of the size of the data blocks that are allocated in the cache to the total size of the data blocks that are allocated at home. Holes in the home file are not considered in the calculation.
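The sparse-file calculation can be shown with a short worked example. The function name and the sample sizes are hypothetical, and the handling of a file with no allocated blocks at home is an assumption.

```python
MIB = 1024 * 1024


def sparse_cached_percentage(cache_allocated_bytes: int, home_allocated_bytes: int) -> float:
    """Percentage of the home file's allocated data that is present in the cache.

    Holes are excluded because only allocated data blocks are counted on both sides.
    """
    if home_allocated_bytes == 0:
        # Assumption: a file with no allocated blocks at home has nothing left to fetch.
        return 100.0
    return 100.0 * cache_allocated_bytes / home_allocated_bytes


# A sparse file with a 1024 MiB apparent size, of which only 256 MiB is allocated at home.
# With 64 MiB allocated in the cache, the percentage is 64 / 256, not 64 / 1024.
print(sparse_cached_percentage(64 * MIB, 256 * MIB))  # 25.0
```

Because holes are ignored, a sparse file reaches a given threshold sooner than its apparent size would suggest.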

Writes on partially cached files

If a write is queued on a file that is partially cached, the complete file is cached first, and only then is the write queued on the file. Appending to a partially cached file is an exception and does not cache the whole file. In the LU mode alone, both a write insert and an append on a partially cached file cache the whole file, even if the prefetch threshold is set on the fileset.
Note: Because partial file caching is not compatible with earlier versions, all nodes must be on GPFS 3.5.0.11 or later.
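The write rules for partially cached files can be condensed into a small truth function. This is a sketch of the rules as stated in this topic, not of AFM internals. The function name and the use of short mode strings such as "LU" and "SW" as arguments are illustrative assumptions.

```python
def write_caches_whole_file(mode: str, is_append: bool) -> bool:
    """Toy model: does a write to a partially cached file fetch the whole file first?"""
    if mode == "LU":
        # In the LU mode, both inserts and appends cache the whole file,
        # even if a prefetch threshold is set on the fileset.
        return True
    # In other modes, an in-place write caches the whole file; an append does not.
    return not is_append


for mode, is_append in [("SW", False), ("SW", True), ("LU", True)]:
    kind = "append" if is_append else "write"
    print(mode, kind, write_caches_whole_file(mode, is_append))
```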