Monitoring fileset states for AFM DR
An AFM DR fileset can be in different states, depending on the mode and the queue state.
Run the mmafmctl getstate command to view the current cache state.
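For example, a minimal check of a primary fileset's state (the file system name fs1 and the fileset name drPrimary are placeholders; the output shown is illustrative):

```
# mmafmctl fs1 getstate -j drPrimary
Fileset Name  Fileset Target                   Cache State  Gateway Node  Queue Length  Queue numExec
------------  --------------                   -----------  ------------  ------------  -------------
drPrimary     nfs://secondary1/gpfs/fs1/drSec  Active       gw1           0             12
```

Omitting -j reports the state of every AFM and AFM DR fileset in the file system.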
See the following table:
AFM fileset state | Condition | Description | Healthy or Unhealthy | Administrator's action |
---|---|---|---|---|
Inactive | AFM primary is created | No operations have been initiated on the primary since the last daemon restart. | Healthy | None |
FlushOnly | Operations are queued | Operations have not yet started to flush. This is a temporary state; the fileset moves to Active when a write is initiated. | Healthy | None |
Active | AFM primary is active | The primary is ready for operations. | Healthy | None |
Dirty | AFM primary is active | There are pending changes on the primary that are not yet played at the secondary. This does not hamper normal activity. | Healthy | None |
Recovery | The primary is accessed after an MDS failure | Can occur when a new gateway node takes over a fileset as MDS after the old MDS failed. | Healthy | None |
QueueOnly | The primary is running some operation | Can occur when operations such as recovery are being executed and operations are queued but not yet flushed. | Healthy | This is a temporary state. |
Disconnected | The MDS cannot connect to the NFS server at the secondary | Occurs only in a cache cluster that is created over an NFS export. When parallel I/O is configured, this state shows the connectivity between the MDS and the mapped secondary server, irrespective of other gateway nodes. | Unhealthy | Correct the errant NFS servers on the secondary cluster. |
Unmounted | The primary that is using NFS detects a change in the secondary, sometimes during creation or in the middle of an operation if the secondary exports are interfered with | This can occur if the secondary exports are removed, changed, or re-exported while the primary is using them. | Unhealthy | Fix the export problems on the secondary; the fileset recovers on the next access. |
Unmounted | The primary that is using the GPFS™ protocol detects a change in the secondary cluster, sometimes during creation or in the middle of an operation | Occurs when there are problems accessing the local mount of the remote file system. | Unhealthy | Check the remote file system mount on the primary cluster and remount if necessary (see the sketch after this table). |
Dropped | Recovery failed | Occurs when the local file system is full, space is not available on the primary, or a policy fails during recovery. | Unhealthy | Fix the underlying issue and access the fileset to retry recovery. |
Dropped | A primary with active queue operations is forcibly unlinked | All queued operations are dequeued; the fileset remains in the Dropped state and moves to the Inactive state when the unlinking is complete. | Healthy | This is a temporary state. |
Dropped | The old gateway node resumes functioning after a failure | AFM internally transfers queues from one gateway node to another to handle gateway node failures. | Healthy | The system resolves this state on the next access. |
Dropped | Primary creation, or the middle of an operation, if the secondary exports changed | Export problems at the secondary, such as exports being removed or re-exported with different options. | Unhealthy | Fix the export problems on the secondary and access the fileset again. |
Dropped | During recovery or normal operation | If the gateway queue memory is exceeded, the queue can be dropped. The memory must be increased to accommodate all requests and bring the queue back to the Active state. | Unhealthy | Increase afmHardMemThreshold (see the sketch after this table). |
NeedsResync | Recovery on the primary | This is a rare state and is possible only under error conditions during recovery. | Unhealthy | The problem is fixed automatically during the subsequent recovery. |
NeedsResync | Failback on the primary, or conversion from GPFS/SW to primary | This is a rare state and is possible only under error conditions during failback or conversion. | Unhealthy | Rerun the failback or conversion. |
PrimInitProg | Setting up the primary and secondary relationship during creation of a primary fileset, conversion of an existing fileset to primary, or a change of the secondary | This state is used while the primary and secondary are establishing the relationship and the psnap0 snapshot is in progress. All operations are disallowed until psnap0 is taken locally. The fileset moves to Active when psnap0 is queued and played on the secondary side. | Healthy | If the fileset state does not become Active, review the errors from the psnap0 failure. |
PrimInitFail | Failed to set up the primary and secondary relationship during creation of a primary fileset, conversion of an existing fileset to primary, or a change of the secondary | This is a rare failure state in which psnap0 has not been created at the primary. In this state, no data is moved from the primary to the secondary. | Unhealthy | Check that the gateway nodes are up and that the file system is mounted on them on the primary. Ensure that the secondary fileset is set up correctly and available for use. |
FailbackInProgress | Primary failback started | Failback has been initiated on the primary and is in progress. | Healthy | None |
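For the Unmounted state over the GPFS protocol, a minimal sketch of checking and remounting the remote file system on the primary cluster (the remote file system name drSecondaryFs is a placeholder):

```
# List the remote file systems that are declared on this cluster
mmremotefs show all

# Remount the remote file system on all nodes if it is not mounted
mmmount drSecondaryFs -a
```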
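For the Dropped state that is caused by exceeding the gateway queue memory, a sketch of inspecting and raising afmHardMemThreshold (the 10G value is only an example; size the limit to hold your expected queue):

```
# Display the current limit on gateway memory for queued operations
mmlsconfig afmHardMemThreshold

# Raise the limit cluster-wide
mmchconfig afmHardMemThreshold=10G
```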