Changing home of AFM cache
AFM filesets continue to function in the event of home failures.
- Replace the home with a completely new empty home where the new target is created using NFS/NSD mapping:
The administrator must run the mmafmctl failover command without the --target-only option to point to the new home. The new home is expected to be empty. To ensure that extended attributes are synchronized, run the mmafmconfig command on the new home before running the failover command. If the new target is a mapping, failover does not split data transfers; it queues them as normal requests without parallel data transfer.
- Replace any of the following on an existing home:
- Replace the communication protocol (NSD, NFS): The administrator must run the mmafmctl failover command without the --target-only option, using the new target protocol. Note: Only the protocol changes. The home path does not change.
- Enable or disable parallel data transfer by shifting between NFS and a mapping: The administrator must run the mmafmctl failover command without the --target-only option, using the new target protocol or mapping. Note: Only the protocol changes. The home path does not change.
- Replace either the IP address or the NFS server using the same communication protocol and home path: The administrator must run the mmafmctl failover command with the --target-only option. The new IP address or NFS server must be on the same home cluster and must be of the same architecture as the old NFS server. Note: Only the IP address or NFS server changes.
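The cases above differ mainly in whether --target-only is passed. The following is a minimal sketch of that decision as a shell helper. The function name failover_cmd and the change-type labels (new-home, protocol, mapping, server) are hypothetical and not part of the product; the helper only prints the command so that it can be reviewed before it is run.

```shell
#!/bin/sh
# Hypothetical helper: print the mmafmctl failover command that matches
# the kind of home change being made.
# Usage: failover_cmd DEVICE FILESET NEW_TARGET CHANGE_TYPE
failover_cmd() {
    device=$1 fileset=$2 new_target=$3 change=$4
    case "$change" in
        new-home|protocol|mapping)
            # New empty home, protocol change, or switch to or from a mapping:
            # run failover without --target-only.
            echo "mmafmctl $device failover -j $fileset --new-target $new_target"
            ;;
        server)
            # Only the IP address or NFS server changes: use --target-only.
            echo "mmafmctl $device failover -j $fileset --new-target $new_target --target-only"
            ;;
        *)
            echo "unknown change type: $change" >&2
            return 1
            ;;
    esac
}

# Example: node5 is an assumed replacement NFS server on the same home cluster.
failover_cmd fs1 fileset_SW nfs://node5/gpfs/fshome/fset001 server
```

For a completely new home, remember that the mmafmconfig command is expected to be run on the new home before the printed failover command is executed.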
When a new home is created, all cached data and metadata available in the cache are queued in the priority queue to the new home during the failover process. The failover process is not synchronous and completes in the background. The afmManualResyncComplete callback event is triggered when failover is complete. Resync does not split data transfers even if parallel data transfer is configured and the target is a mapping.
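Because failover completes in the background, a script that depends on it has to wait for completion. The following is a minimal polling sketch. It assumes that the cache state is the third whitespace-separated field of the fileset row in mmafmctl getstate output, as in the example output in this topic. The function names, the default interval, and the poll limit are hypothetical.

```shell
#!/bin/sh
# Read "mmafmctl ... getstate" output on stdin and print the cache state
# (third whitespace-separated field) of the named fileset.
cache_state() {
    awk -v name="$1" '$1 == name { print $3 }'
}

# Poll until the fileset reports the Active state.
# Returns 0 when Active is seen, 1 if the poll limit is reached first.
# Usage: wait_for_active DEVICE FILESET [INTERVAL_SECONDS] [MAX_POLLS]
wait_for_active() {
    device=$1 fileset=$2 interval=${3:-30} max_polls=${4:-120}
    polls=0
    while [ "$polls" -lt "$max_polls" ]; do
        state=$(mmafmctl "$device" getstate -j "$fileset" | cache_state "$fileset")
        echo "cache state: $state"
        [ "$state" = "Active" ] && return 0
        polls=$((polls + 1))
        sleep "$interval"
    done
    return 1
}

# Example: wait_for_active fs1 fileset_SW 60
```

Polling is the simpler of the two options. Registering a callback for the afmManualResyncComplete event (for example, with the mmaddcallback command) avoids repeated getstate calls.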
If the failover is interrupted due to a gateway node failure or quorum loss, failover is restarted automatically when the cache fileset attempts to go to Active state.
When there are multiple IW caches, the administrator must choose a primary IW cache and fail it over to a new empty home. All of the other IW cache filesets that point to the old home must be deleted and recreated. If a second IW cache is failed over to the same home after the first one, all the data in that cache overwrites existing objects at home. For this reason, failover to a non-empty home is discouraged.
When the failover function is likely to be used because the old home has failed, the administrator must be cautious and disable automatic eviction. This ensures that eviction does not free the cached data on the cache. Evicted data cannot be recovered if the old home is lost.
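As a sketch of this precaution, the following wrapper prints (or, with DRY_RUN=0, runs) an mmchfileset command that turns off automatic eviction for one fileset. It assumes that afmEnableAutoEviction is the fileset parameter that controls automatic eviction in your release; verify this against the mmchfileset command description before use. The run and disable_auto_eviction names are hypothetical.

```shell
#!/bin/sh
# Dry-run by default: print commands instead of running them.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# Turn off automatic eviction for one AFM cache fileset.
# Assumption: afmEnableAutoEviction is the controlling fileset parameter.
# Usage: disable_auto_eviction DEVICE FILESET
disable_auto_eviction() {
    run mmchfileset "$1" "$2" -p afmEnableAutoEviction=no
}

disable_auto_eviction fs1 fileset_SW
```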
Each AFM fileset is independently managed and has a one-to-one relationship with a target, which allows different protocol backends to coexist on separate filesets in the same file system. However, AFM does not validate the target for correctness when a fileset is created. The user must specify a valid target. Do not use a target that belongs to the same file system as the AFM fileset. For more information, see the mmafmctl command.
The following example shows how to change the target for an SW fileset. Consider an SW fileset fileset_SW of file system fs1 that uses home target nfs://node4/gpfs/fshome/fset001. A failover is performed to a new target nfs://node4/gpfs/fshome/fset002new.
# mmlsfileset fs1 --afm
The system displays output similar to the following:
Filesets in file system 'fs1':
Name        Status  Path                    afmTarget
fileset_SW  Linked  /gpfs/cache/fileset_SW  nfs://node4/gpfs/fshome/fset001
# mmafmctl fs1 getstate -j fileset_SW
The system displays output similar to the following:
Fileset Name  Fileset Target                      Cache State  Gateway Node  Queue Length  Queue numExec
------------  --------------                      -----------  ------------  ------------  -------------
fileset_SW    nfs://node4/gpfs/fshome/fset001     Active       GatewayNode1  0             6
# mmafmctl fs1 failover -j fileset_SW --new-target nfs://node4/gpfs/fshome/fset002new
The system displays output similar to the following:
mmafmctl:Performing failover to nfs://node4/gpfs/fshome/fset002new
Fileset fileset_SW changed.
mmafmctl: Failover in progress. This may take while...
Check fileset state or register for callback to know the completion status.
# mmafmctl fs1 getstate -j fileset_SW
The system displays output similar to the following:
Fileset Name  Fileset Target                      Cache State  Gateway Node  Queue Length  Queue numExec
------------  --------------                      -----------  ------------  ------------  -------------
fileset_SW    nfs://node4/gpfs/fshome/fset002new  NeedsResync  GatewayNode1  6             0
# mmafmctl fs1 getstate -j fileset_SW
The system displays output similar to the following:
Fileset Name  Fileset Target                      Cache State  Gateway Node  Queue Length  Queue numExec
------------  --------------                      -----------  ------------  ------------  -------------
fileset_SW    nfs://node4/gpfs/fshome/fset002new  Recovery     GatewayNode1  6             0
# mmafmctl fs1 getstate -j fileset_SW
The system displays output similar to the following:
Fileset Name  Fileset Target                      Cache State  Gateway Node  Queue Length  Queue numExec
------------  --------------                      -----------  ------------  ------------  -------------
fileset_SW    nfs://node4/gpfs/fshome/fset002new  Active       GatewayNode1  0             6 0
# mmlsfileset fs1 --afm
The system displays output similar to the following:
Filesets in file system 'fs1':
Name        Status  Path                    afmTarget
root        Linked  /gpfs/cache             --
fileset_SW  Linked  /gpfs/cache/fileset_SW  nfs://node4/gpfs/fshome/fset002new
Out-band failover: You can choose to copy all cached data offline from the AFM cache to the new home with any tool that preserves modification time (mtime) with nanosecond granularity. An example of such a tool is rsync version 3.1.0 or later with protocol version 31. After the data is copied, you can run mmafmctl failover to compare mtime and file size at home and avoid queuing unnecessary data to home.
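Before relying on rsync for the offline copy, it can help to confirm that the installed version meets the 3.1.0 minimum. The following sketch parses the first line of rsync --version output. The function name rsync_version_ok is hypothetical, and the commented sequence uses the paths from the example in this topic with an assumed SSH-reachable host node4; adapt both to your environment.

```shell
#!/bin/sh
# Return 0 if the first line of "rsync --version" (read from stdin)
# reports version 3.1.0 or later, 1 otherwise.
rsync_version_ok() {
    # Some rsync builds print the version with a leading "v".
    ver=$(sed -n '1s/^rsync  *version  *v\{0,1\}\([0-9][0-9.]*\).*/\1/p')
    case "$ver" in
        *.*) ;;            # need at least major.minor
        *) return 1 ;;
    esac
    major=${ver%%.*}
    rest=${ver#*.}
    minor=${rest%%.*}
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 1 ]; }
}

# Sketch of the out-band sequence (host and paths are assumptions):
#   rsync --version | rsync_version_ok || exit 1
#   rsync -a /gpfs/cache/fileset_SW/ node4:/gpfs/fshome/fset002new/
#   mmafmctl fs1 failover -j fileset_SW --new-target nfs://node4/gpfs/fshome/fset002new
```

The -a option is used here because it includes modification-time preservation. This check covers only the rsync version, not the negotiated protocol version.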