Active file management architecture

Active file management (AFM) uses a home-and-cache model in which a single home provides the primary storage of data, and exported data is cached in a local GPFS™ file system.

Home
Home can be an NFS export from a remote cluster. The export point can be a local file system, a GPFS file system, or a GPFS fileset in the remote cluster. AFM uses a proprietary protocol over NFS.

In addition, AFM is supported when a remote file system is mounted on the cache cluster using the GPFS protocol. With the native GPFS protocol, the AFM target is a remote file system mount provided by a multicluster configuration. A multicluster setup must therefore exist between the home and cache clusters before AFM can use the remote mount of the home cluster’s file system for AFM operations.

Architecturally, AFM works with any file system at the home cluster; however, ACLs, extended attributes, and sparse files are supported only when the home file system is GPFS, regardless of whether the target uses the NFS or the GPFS protocol. The mmafmconfig command should be run on the home cluster to enable this support.

Cache
The container used to cache home data is a GPFS fileset. Each AFM-enabled fileset has a single home cluster associated with it (represented by the hostname of the home server).

Each cache fileset in a cluster is served by one of the nodes designated as a gateway in the cluster. The gateway node mapped to serve a fileset is called the metadata server (MDS) of the fileset. The MDS acts as the owner of the fileset. All other nodes in the cluster, including other gateways, become application nodes for the fileset. A fileset can have multiple application nodes (which service application data requests). The other gateway nodes can be configured to help the MDS transfer data to and from the home cluster. See Parallel I/O.

The split between application and gateway nodes is conceptual, and any node in the cache cluster can function as both a gateway node and an application node based on its configuration. The gateway nodes can be viewed as the edge of the cache cluster that can communicate with the home cluster, while the application nodes interface with the application. Gateway and application nodes communicate with each other via internal RPC requests.
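The per-fileset roles described above can be modeled with a small sketch. This is only an illustration: the function name roles_for_fileset, the role labels, and the node names are hypothetical, and the choice of MDS is passed in as an input because the text does not say how a gateway is mapped to a fileset.

```python
def roles_for_fileset(nodes, gateways, mds):
    """Return (roles, helpers) for one cache fileset.

    nodes    -- all node names in the cache cluster
    gateways -- nodes designated as gateway nodes
    mds      -- the gateway mapped to serve this fileset (given, not derived)
    """
    if not set(gateways) <= set(nodes):
        raise ValueError("every gateway must be a node of the cluster")
    if mds not in gateways:
        raise ValueError("the MDS must be one of the gateway nodes")

    # Exactly one node owns the fileset; every other node, including the
    # remaining gateways, is an application node for this fileset.
    roles = {node: ("mds" if node == mds else "application") for node in nodes}

    # The remaining gateways are candidates that can be configured to help
    # the MDS move data to and from home (hypothetical "helpers" label).
    helpers = [node for node in nodes if node in gateways and node != mds]
    return roles, helpers
```

Because the role is computed per fileset, the same gateway can be the MDS for one fileset and an application node for another.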

Any cluster can be a home cluster, a cache cluster, or both. In typical usage, a home would be in one GPFS cluster and a cache would be defined in another, but this is not required. A cluster can be a home for one fileset and a cache for another fileset, and multiple AFM-enabled filesets may be defined in one cache, each caching from a different home cluster. A single cluster can therefore combine home and cache roles as needed.

A cache can request data by reading a file or by pre-fetching the data. Any time a file is read, if the file data is not yet in the cache or is not up to date, the data is copied from the home into the cache.
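The read path described above behaves like a read-through cache. The following toy model sketches that idea only; it is not how AFM is implemented. The Home and Cache classes are hypothetical, and the version counter used to decide whether cached data is up to date is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Home:
    """Toy stand-in for the home: maps path -> (version, data)."""
    files: dict = field(default_factory=dict)

    def write(self, path, data):
        # Bump a per-file version on every write (assumed freshness marker).
        version = self.files.get(path, (0, b""))[0] + 1
        self.files[path] = (version, data)


@dataclass
class Cache:
    """Toy stand-in for a cache fileset backed by a single home."""
    home: Home
    files: dict = field(default_factory=dict)
    fetches: int = 0

    def read(self, path):
        home_version = self.home.files[path][0]
        cached = self.files.get(path)
        # Copy from home only if the data is missing or not up to date.
        if cached is None or cached[0] != home_version:
            self.files[path] = self.home.files[path]
            self.fetches += 1
        return self.files[path][1]

    def prefetch(self, paths):
        # Pre-fetching populates the cache before any application read.
        for path in paths:
            self.read(path)
```

In this model, a second read of an unchanged file is served from the cache without another fetch, while a read after the home copy changes triggers a new copy.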

Multiple cache filesets can read data from a single home. In single-writer mode, only one cache can write data to a single home. In independent-writer (IW) mode, multiple caches can write to a single home, as long as each cache writes to different files. If multiple caches write to the same file, the order in which the updates are applied is nondeterministic.
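The writer rules above can be expressed as a small consistency check over a planned set of writes to one home. This is a sketch under stated assumptions: check_writes is a hypothetical helper, not an AFM interface, and the mode labels "SW" and "IW" are shorthand chosen for the example.

```python
from collections import defaultdict


def check_writes(writes):
    """Check planned writes against the single-writer and IW rules.

    writes -- iterable of (cache_name, mode, path), mode is "SW" or "IW"
    Returns (sw_violation, contested_paths).
    """
    writers = set()
    sw_writers = set()
    iw_caches_by_path = defaultdict(set)

    for cache, mode, path in writes:
        if mode not in ("SW", "IW"):
            raise ValueError(f"unknown mode: {mode!r}")
        writers.add(cache)
        if mode == "SW":
            sw_writers.add(cache)
        else:
            iw_caches_by_path[path].add(cache)

    # Single-writer: no other cache may write to the same home.
    sw_violation = bool(sw_writers) and len(writers) > 1

    # Independent-writer: files written by more than one cache end up with
    # a nondeterministic sequence of updates.
    contested = sorted(p for p, c in iw_caches_by_path.items() if len(c) > 1)
    return sw_violation, contested
```

A result of (False, []) means the planned writes respect both rules; any path listed in the second element is one where the final content cannot be predicted.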

AFM filesets can cache extended attributes and ACLs. To enable this functionality, run the mmafmconfig command on the home cluster.

Notes:
  1. AFM uses certain internal directories, such as the following, which must not be altered or removed:
  2. User IDs and group IDs must be managed the same way across cache and home.
  3. For AFM relationships that use the native GPFS protocol, if user IDs differ between the home and the cache, the IDs can be remapped using GPFS UID remapping.