Operation with disconnected home

A cache and its home cluster are usually separated by a WAN (wide area network), so the connection between them can suffer intermittent outages and possibly long-term disruptions.

If the primary gateway determines that the home cannot be accessed, it waits for the interval specified in the afmDisconnectTimeout parameter to pass, and then changes the cache state to disconnected. This feature is available on all AFM filesets.
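
The timing rule above can be sketched as a toy shell function. This is only an illustration: the function name expected_cache_state and the label "waiting" are hypothetical and are not AFM state names, and the commented mmlsconfig and mmchconfig lines assume that afmDisconnectTimeout is managed as a cluster configuration attribute in your release.

```shell
# Assumed commands for inspecting or changing the timeout
# (left as comments; they only make sense on a cluster node):
#   mmlsconfig afmDisconnectTimeout
#   mmchconfig afmDisconnectTimeout=120   # 120 is an arbitrary example value

# Toy model: the gateway reports Disconnected only after home has been
# unreachable for at least the timeout. Both arguments are in seconds.
expected_cache_state() {
    # $1 = seconds home has been unreachable, $2 = afmDisconnectTimeout
    if [ "$1" -ge "$2" ]; then
        echo "Disconnected"
    else
        # Hypothetical label: requests that need home may appear stuck here.
        echo "waiting"
    fi
}
```

For example, expected_cache_state 30 60 reports the waiting window, and expected_cache_state 60 60 reports Disconnected.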

In a disconnected state, cached files are served to applications from the cache. Application requests for uncached data return an I/O error. All update operations in the cache complete and return successfully to the application. The resulting requests remain queued at the gateway until they can be flushed to home.

In a disconnected state, the home cannot be accessed for revalidation. Therefore, the latest updates from home are not available in the cache. Writes that require revalidation with home might appear temporarily stuck if they are issued after the home fails and before the cache moves to the disconnected state. While revalidation is stuck waiting on the unavailable home, the request times out according to the value set in the afmDisconnectTimeout parameter.

For cache filesets based on the NSD protocol, when the home file system is not mounted on the gateway, the cache cluster puts the cache filesets into the unmounted state. These cache filesets never enter the disconnected state.

If the remote cluster is unresponsive at the home cluster due to a deadlock, operations that require remote mount access, such as revalidation or reading uncached contents, stop responding until the remote mount becomes available again. This is true for AFM filesets that use the NSD protocol to connect to the home cluster. You can continue accessing cached contents without disruption by temporarily disabling all of the revalidation intervals until the remote mount is accessible again.

If a cache fileset is disconnected for an extended period, the number of file system updates might exceed the buffering capacity of the gateway nodes. In this situation, operations continue in the cache. When the connection to home is restored, AFM runs recovery and synchronizes its local updates to the home cluster.

AFM automatically detects when home is available and moves the cache into the Active state. The callback events afmHomeDisconnected and afmHomeConnected can be used to monitor when a cache changes state.
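
One way to act on these two events is a small handler script registered with mmaddcallback. The sketch below is illustrative only: the function name format_afm_event, the callback identifier afmHomeWatch, the script path, and the log file location are all hypothetical, and the registration command is shown as a comment rather than executed. Check the mmaddcallback documentation for the parameter variables that your release supports.

```shell
# Assumed registration (not run here; identifier and script path are made up):
#   mmaddcallback afmHomeWatch --command /usr/local/bin/afm_home_watch.sh \
#       --event afmHomeDisconnected,afmHomeConnected \
#       --parms "%eventName %fsName %filesetName"

# Format one log line for an event.
# Arguments mirror the --parms order above: event, file system, fileset.
format_afm_event() {
    case "$1" in
        afmHomeDisconnected)
            echo "WARN $2/$3: home unreachable, updates queue at the gateway" ;;
        afmHomeConnected)
            echo "INFO $2/$3: home reachable again, queued updates can flush" ;;
        *)
            echo "DEBUG $2/$3: unhandled event $1" ;;
    esac
}

# When invoked as a script with three arguments, append the line to a log
# file. The default log path is an assumption for this sketch.
if [ "$#" -ge 3 ]; then
    format_afm_event "$1" "$2" "$3" >> "${AFM_EVENT_LOG:-/tmp/afm_events.log}"
fi
```

Keeping the formatting in a function separate from the logging step makes the handler easy to exercise by hand before it is registered.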

Filesets that use a mapping target go into the disconnected state if the NFS server of the primary gateway is unreachable, even when the NFS servers of all participating gateways are reachable.

Prefetch tasks that fail because home is disconnected resume when home is available again.

The following example shows how the queue length and the number of executed operations (Queue numExec) change when files are written while the home is disconnected, and after the home becomes available again.

node1:/gpfs/cache/fileset_IW # mmafmctl fs1 getstate
Fileset Name  Fileset Target                      Cache State   Gateway Node  Queue Length  Queue numExec
------------  --------------                      ------------- ------------  ------------  -------------
fileset_IW    nfs://node4/gpfs/fshome/fset002new  Disconnected  node2         0             102

node1:/gpfs/cache/fileset_IW # ls -la
total 128
drwx------ 65535 root root 32768 Oct 13 16:50 .afm
-rw-r--r--     1 root root     8 Oct 22 17:02 newfile2
drwx------ 65535 root root 32768 Oct 22 05:23 .pconflicts
drwx------ 65535 root root 32768 Oct 22 17:02 .ptrash
dr-xr-xr-x     2 root root 32768 Oct 13 16:20 .snapshots

node1:/gpfs/cache/fileset_IW # for i in 1 2 3 4 5 ; do date > file$i ; done
node1:/gpfs/cache/fileset_IW # mmafmctl fs1 getstate
Fileset Name  Fileset Target                      Cache State   Gateway Node  Queue Length  Queue numExec
------------  --------------                      ------------- ------------  ------------  -------------
fileset_IW    nfs://node4/gpfs/fshome/fset002new  Disconnected  node2         27            102

# turn on NFS at home

node1:/gpfs/cache/fileset_IW # mmafmctl fs1 getstate
Fileset Name  Fileset Target                      Cache State   Gateway Node  Queue Length  Queue numExec
------------  --------------                      ------------- ------------  ------------  -------------
fileset_IW    nfs://node4/gpfs/fshome/fset002new  Dirty         node2         27            102

node1:/gpfs/cache/fileset_IW # mmafmctl fs1 getstate
Fileset Name  Fileset Target                      Cache State   Gateway Node  Queue Length  Queue numExec
------------  --------------                      ------------- ------------  ------------  -------------
fileset_IW    nfs://node4/gpfs/fshome/fset002new  Dirty         node2         27            102

node1:/gpfs/cache/fileset_IW # date > file10
node1:/gpfs/cache/fileset_IW # mmafmctl fs1 getstate
Fileset Name  Fileset Target                      Cache State   Gateway Node  Queue Length  Queue numExec
------------  --------------                      ------------- ------------  ------------  -------------
fileset_IW    nfs://node4/gpfs/fshome/fset002new  Active        node2         0             134
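
The manual checks in the example can be scripted. The following is a minimal sketch, assuming the six-column getstate layout shown above and single-word cache state names; the helper names afm_state_rows and afm_all_flushed are hypothetical and not part of any product command.

```shell
# Read getstate output on stdin and print "<fileset> <state> <queue length>"
# for each data row. The header has more than six fields and the separator
# row starts with dashes, so both are skipped.
afm_state_rows() {
    awk 'NF == 6 && $1 !~ /^-+$/ { print $1, $3, $5 }'
}

# Succeed only if at least one fileset is listed and every listed fileset
# is Active with an empty queue.
afm_all_flushed() {
    afm_state_rows | awk '
        { rows++ }
        $2 != "Active" || $3 != 0 { pending = 1 }
        END { exit (pending || rows == 0) }'
}

# Assumed usage on a cluster node (not run here):
#   mmafmctl fs1 getstate | afm_all_flushed && echo "cache is in sync with home"
```

Parsing by field count keeps the sketch independent of the exact column widths, at the cost of breaking if a release prints a different number of columns.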