Monitoring fileset states for AFM DR

An AFM DR fileset can be in different states depending on the fileset mode and the queue state.

Run the mmafmctl getstate command to view the current cache state.
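
For example, the following commands are a minimal sketch of checking the fileset state; the file system name fs1 and the fileset name drPrimary are placeholders for your own names:

   # Show the state of all AFM and AFM DR filesets in file system fs1
   mmafmctl fs1 getstate

   # Show the state of a single primary fileset
   mmafmctl fs1 getstate -j drPrimary

The fileset state that is shown in the command output corresponds to the states that are described in Table 1.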

See the following table:
Table 1. AFM DR states and their descriptions
Each entry lists the fileset state, the condition under which it occurs, a description, whether the state is healthy or unhealthy, and the administrator's action.

State: Inactive
Condition: AFM primary is created.
Description: Operations have not been initiated on the primary since the last daemon restart.
Health: Healthy
Administrator's action: None.

State: FlushOnly
Condition: Operations are queued.
Description: Operations have not started to flush. This is a temporary state that moves to Active when a write is initiated.
Health: Healthy
Administrator's action: None.

State: Active
Condition: AFM primary is active.
Description: The primary is ready for operations.
Health: Healthy
Administrator's action: None.

State: Dirty
Condition: AFM primary is active.
Description: There are pending changes on the primary that are not yet played at the secondary. This does not hamper normal activity.
Health: Healthy
Administrator's action: None.

State: Recovery
Condition: The primary is accessed after an MDS failure.
Description: Can occur when a new gateway node takes over a fileset as MDS after the old MDS failed.
Health: Healthy
Administrator's action: None.

State: QueueOnly
Condition: The primary is running an operation.
Description: Can occur when operations such as recovery are running and new operations are being queued but are not yet flushed. This is a temporary state.
Health: Healthy
Administrator's action: None.

State: Disconnected
Condition: The MDS cannot connect to the NFS server at the secondary.
Description: Occurs only in a cache cluster that is created over an NFS export. When parallel I/O is configured, this state shows the connectivity between the MDS and the mapped home server, irrespective of other gateway nodes.
Health: Unhealthy
Administrator's action: Correct the errant NFS servers on the secondary cluster.

State: Unmounted
Condition: A primary that uses NFS detects a change in the secondary, either during creation or in the middle of an operation if the secondary exports are interfered with.
Description: This can occur if:
  • The secondary NFS is not accessible.
  • The secondary exports are not exported properly.
  • The secondary export does not exist.
Health: Unhealthy
Administrator's action:
  1. Rectify the NFS export issue as described in the secondary setup section and retry access.
  2. Relink the primary if it does not recover.
After the mountRetryInterval of the MDS elapses, the primary retries connecting to the secondary.

State: Unmounted
Condition: A primary that uses the GPFS™ protocol detects a change in the secondary cluster, either during creation or in the middle of an operation.
Description: Occurs when there are problems accessing the local mount of the remote file system.
Health: Unhealthy
Administrator's action: Check the remote file system mount on the primary cluster and remount it if necessary.

State: Dropped
Condition: Recovery failed.
Description: Occurs when the local file system is full, space is not available on the primary, or a policy fails during recovery.
Health: Unhealthy
Administrator's action: Fix the issue and access the fileset to retry recovery.

State: Dropped
Condition: A primary with active queue operations is forcibly unlinked.
Description: All queued operations are de-queued; the fileset remains in the Dropped state and moves to the Inactive state when the unlinking is complete. This is a temporary state.
Health: Healthy
Administrator's action: None.

State: Dropped
Condition: The old gateway node starts functioning properly after a failure.
Description: AFM internally transfers queues from one gateway node to another to handle gateway node failures.
Health: Healthy
Administrator's action: None. The system resolves this state on the next access.

State: Dropped
Condition: Primary creation, or the middle of an operation, when the home exports changed.
Description: Export problems at the secondary, such as:
  • The home path is not exported on all NFS server nodes that interact with the cache clusters. Even if the home cluster is exported after operations have started on the fileset, problems might persist.
  • The fsid on the home cluster was changed after fileset operations began.
  • Not all home NFS servers have the same fsid for the same export path.
Health: Unhealthy
Administrator's action:
  1. Fix the NFS export issue as described in the secondary setup section and retry access.
  2. Relink the primary if the cache cluster does not recover.
After the mountRetryInterval elapses, the MDS retries connecting to the secondary.

State: Dropped
Condition: During recovery or normal operation.
Description: If the gateway queue memory is exceeded, the queue can be dropped. The memory must be increased to accommodate all requests and bring the queue back to the Active state.
Health: Unhealthy
Administrator's action: Increase afmHardMemThreshold; see the example after this table.

State: NeedsResync
Condition: Recovery on the primary.
Description: This is a rare state that is possible only under error conditions during recovery.
Health: Unhealthy
Administrator's action: None. The problem is fixed automatically in a subsequent recovery.

State: NeedsResync
Condition: Failback on the primary, or conversion of a GPFS or SW fileset to a primary.
Description: This is a rare state that is possible only under error conditions during failback or conversion.
Health: Unhealthy
Administrator's action: Rerun the failback or conversion.

State: PrimInitProg
Condition: The primary and secondary relationship is being set up during:
  • creation of a primary fileset.
  • conversion of a gpfs, sw, or iw fileset to a primary fileset.
  • changing the secondary of a primary fileset.
Description: This state is used while the primary and secondary are establishing their relationship and the psnap0 snapshot is in progress. All operations are disallowed until psnap0 is taken locally. The state moves to Active when psnap0 is queued and played on the secondary side.
Health: Healthy
Administrator's action: Review errors from the psnap0 failure if the fileset state does not become Active.

State: PrimInitFail
Condition: Setting up the primary and secondary relationship failed during:
  • creation of a primary fileset.
  • conversion of a gpfs, sw, or iw fileset to a primary fileset.
  • changing the secondary of a primary fileset.
Description: This is a rare failure state in which psnap0 was not created at the primary. In this state, no data is moved from the primary to the secondary. Check that the gateway nodes are up and that the file system is mounted on them on the primary cluster. The secondary fileset must also be set up correctly and available for use.
Health: Unhealthy
Administrator's action:
  • Review errors after the psnap0 failure.
  • Rerun the mmafmctl convertToPrimary command without any additional parameters to end this state; see the example after this table.

State: FailbackInProgress
Condition: Primary failback started.
Description: This is the state when failback is initiated on the primary.
Health: Healthy
Administrator's action: None.
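
Two of the administrator actions in Table 1 refer to specific commands. The following commands are a minimal sketch of those actions; the file system name fs1, the fileset name drPrimary, and the threshold value 8G are placeholders that you must adapt to your environment:

   # Increase the gateway queue memory limit to address a Dropped state caused by memory pressure
   mmchconfig afmHardMemThreshold=8G

   # Rerun the conversion without additional options to clear the PrimInitFail state
   mmafmctl fs1 convertToPrimary -j drPrimary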