Monitoring fileset states for AFM
An AFM fileset can be in different states, depending on its mode and the state of its queue.
To view the current cache state, run the
`mmafmctl filesystem getstate`
command, or the
`mmafmctl filesystem getstate -j cache_fileset`
command for a single fileset. See the following table for an explanation of each cache state:
AFM fileset state | Condition | Description | Healthy or Unhealthy | Administrator's action |
---|---|---|---|---|
Inactive | The AFM cache is created | Operations were not initiated on the cache cluster after the last daemon restart. | Healthy | None |
FlushOnly | Operations are queued | Operations have not started to flush. | Healthy | This is a temporary state and should move to Active when a write is initiated. |
Active | The AFM cache is active | The cache cluster is ready for an operation. | Healthy | None |
Dirty | The AFM cache is active | Pending changes in the cache cluster are not yet played at the home cluster. This state does not hamper normal activity. | Healthy | None |
Recovery | The cache is accessed after primary gateway failure | A new gateway is taking over a fileset as primary gateway after the old primary gateway failed. | Healthy | None |
QueueOnly | The cache is running an operation. | Operations such as recovery, resync, and failover are being executed; new operations are queued but not flushed. | Healthy | This is a temporary state. |
Disconnected | Primary gateway cannot connect to the NFS server at the home cluster. | Occurs only in a cache cluster that is created over an NFS export. When parallel data transfer is configured, this state shows the connectivity between the primary gateway and the mapped home server, irrespective of other gateway nodes. | Unhealthy | Correct the errant NFS servers on the home cluster. |
Unmounted | The cache that is using NFS detects a change in the home cluster, sometimes during creation or in the middle of an operation, if the home exports are changed. | There are problems accessing the NFS exports at the home cluster. | Unhealthy | Fix the export problems at the home cluster. The fileset recovers on the next access. |
Unmounted | The cache that is using the GPFS protocol detects a change in the home cluster, sometimes during creation or in the middle of an operation. | There are problems accessing the local mount of the remote file system. | Unhealthy | Check the remote file system mount on the cache cluster and remount it if necessary. |
Dropped | Recovery failed. | The local file system is full, space is not available on the cache or the primary cluster, or there is a policy failure during recovery. | Unhealthy | Fix the issue and access the fileset to retry recovery. |
Dropped | IW Failback failed. | The local file system is full, space is not available on the cache or the primary cluster, or there is a policy failure during recovery. | Unhealthy | Fix the issue and access the fileset to retry failback. |
Dropped | A cache with active queue operations is forcibly unlinked. | All queued operations are being de-queued, and the fileset remains in the Dropped state and moves to the Inactive state when the unlinking is complete. | Healthy | This is a temporary state. |
Dropped | The old gateway node starts functioning properly after a failure. | AFM internally transfers queues from one gateway to another to handle gateway node failures. | Healthy | The system resolves this state on the next access. |
Dropped | A change in the home exports is detected during cache creation or in the middle of an operation. | There are export problems at the home cluster, such as changed or unavailable exports. | Unhealthy | Fix the export problems at the home cluster and access the fileset again. |
Dropped | During recovery or normal operation. | If the gateway queue memory is exceeded, the queue can be dropped. The memory must be increased to accommodate all requests and bring the queue back to the Active state. | Unhealthy | Increase `afmHardMemThreshold`. |
Expired | The RO cache is configured to expire. | An event that occurs automatically after a prolonged disconnection, when the cached contents are no longer accessible. | Unhealthy | Fix the errant NFS servers on the home cluster. |
NeedsFailback | The IW cache that needs to complete failback. | A failback initiated on an IW cache cluster is interrupted and is incomplete. | Unhealthy | Failback is automatically triggered on the fileset, or the administrator can run failback again. |
FailbackInProgress | Failback is initiated on the IW cache. | Failback is in progress and automatically moves to the FailbackCompleted state. | Healthy | None |
FailbackCompleted | The IW cache after failback. | Failback completes successfully on the IW cache cluster. | Healthy | Run `mmafmctl failback --stop` on the cache cluster. |
NeedsResync | The SW cache cluster during home corruption. | Occurs when the home cluster is accidentally corrupted. | Unhealthy | Run `mmafmctl resync` on the cache. |
NeedsResync | Recovery on the SW cache. | A rare state that is possible only under error conditions during recovery. | Unhealthy | No administrator action is required. The system fixes this state in a subsequent recovery. |
Stopped | Replication is stopped on the fileset. | The fileset stops sending changes to the gateway node. This state is used mainly during planned downtime. | Unhealthy | After the planned downtime, run `mmafmctl <fs> start -j <fileset>` to resume sending changes to the gateway node and continue replication. |
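In practice it is useful to scan the `mmafmctl getstate` output for filesets that are not in one of the healthy states from the table above. The following is a minimal sketch: it assumes that `mmafmctl <fs> getstate` prints one fileset per line with the cache state in the third column after two header lines, which can vary by release, so adjust the field numbers to match your output. Note that Dropped is reported as unhealthy here even though two of its causes in the table are transient.

```shell
# Filter `mmafmctl <fs> getstate` output down to filesets in an unhealthy
# cache state. The healthy-state list is taken from the table above.
# Assumed layout: two header lines, then one fileset per line with the
# fileset name in column 1 and the cache state in column 3.
afm_unhealthy() {
    awk 'NR > 2 && $3 !~ /^(Inactive|FlushOnly|Active|Dirty|Recovery|QueueOnly|FailbackInProgress|FailbackCompleted)$/ \
         { print $1, $3 }'
}

# Intended use on a live cluster (fs1 is a placeholder file system name):
#   mmafmctl fs1 getstate | afm_unhealthy
```

Running this from cron and alerting on non-empty output gives a simple health check across all AFM filesets in a file system.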
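Two of the administrator actions in the table map directly onto cluster configuration commands. The fragment below sketches them; `fs1`, `cache_fileset`, and the threshold value are placeholders only, and the exact value of `afmHardMemThreshold` should be sized to your queue workload.

```shell
# Dropped because the gateway queue memory was exceeded:
# raise afmHardMemThreshold, then access the fileset to retry recovery.
# (8G is an example value, not a recommendation.)
mmchconfig afmHardMemThreshold=8G

# Stopped after planned downtime: resume replication on the fileset.
mmafmctl fs1 start -j cache_fileset
```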