Active file management

Active File Management (AFM) enables data sharing across clusters, even if the network is unreliable or has high latency. AFM requires that you create associations between IBM Spectrum Scale clusters, or between an IBM Spectrum Scale cluster and an NFS data source.

The following basic configurations are available for the AFM features:

AFM for data caching and replication. This configuration is referred to as AFM.
AFM for disaster recovery (DR). This configuration is referred to as AFM DR.

For more information about the AFM and AFM DR features, see the Active File Management and AFM-based asynchronous disaster recovery (AFM DR) topics in the Product overview section of the IBM Spectrum Scale Knowledge Center.

The Active File Management page in the GUI supports monitoring of the health, performance, and configuration details of AFM, AFM DR, and gateway nodes. You need to use the CLI to configure these features.

Monitoring options available in the Active File Management page

The Active File Management page provides an easy way to monitor the performance, health status, and configuration aspects of the AFM and AFM DR relationships in the IBM Spectrum Scale cluster. It also provides details of the gateway nodes that are part of the AFM or AFM DR relationships.

The following options are available to monitor AFM and AFM DR relationships and gateway nodes:

A quick view that shows the top relationships between cache and home sites in an AFM or AFM DR relationship. It also shows the performance of gateway nodes in terms of used memory and number of queued messages. The graphs that are displayed in the quick view are refreshed regularly. The refresh interval depends on the selected time frame, as follows:
- Every minute for the 5-minute time frame
- Every 15 minutes for the 1-hour time frame
- Every 6 hours for the 24-hour time frame
- Every two days for the 7-day time frame
- Every seven days for the 30-day time frame
- Every four months for the 365-day time frame
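The refresh intervals listed above can be captured in a small lookup table, for example in a script that decides how often to poll exported chart data. This is a minimal sketch; the names `REFRESH_INTERVALS` and `refresh_interval` are illustrative assumptions and are not part of the GUI or any IBM Spectrum Scale API.

```python
# Refresh interval per time frame, copied from the list above.
# Each value is (count, unit). Names are hypothetical, for illustration only.
REFRESH_INTERVALS = {
    "5 minutes": (1, "minute"),
    "1 hour": (15, "minutes"),
    "24 hours": (6, "hours"),
    "7 days": (2, "days"),
    "30 days": (7, "days"),
    "365 days": (4, "months"),
}


def refresh_interval(time_frame):
    """Return the (count, unit) refresh interval for a quick-view time frame."""
    try:
        return REFRESH_INTERVALS[time_frame]
    except KeyError:
        raise ValueError(f"unknown time frame: {time_frame!r}") from None
```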
Performance metrics and configuration details in tabular format. The following tables are available:
Cache

Provides information about configuration, health, and performance of the AFM feature that is configured for data caching and replication.

Disaster Recovery

Provides information about configuration, health, and performance of AFM DR configuration in the cluster.

Gateway Nodes

Provides details of the nodes that are designated as gateway nodes in the AFM or AFM DR configuration.

To find an AFM or AFM DR relationship or a gateway node with extreme values, sort the table by the relevant attribute: click a performance metric in the table header to sort the data by that metric. Use the time range selector in the upper-right corner to set the time range over which the table values are averaged and which the charts in the overview cover. The metrics in the table do not update automatically. To refresh the table with more recent data, click the refresh button above the table.

In an AFM DR relationship, if a secondary fileset is converted into an acting primary, the GUI continues to show the relationship and adds the label Acting Primary to the secondary fileset name.
A detailed view of the performance and health aspects of an individual AFM or AFM DR relationship or gateway node. To open the detailed view, either double-click the row that lists the relationship or gateway node, or select the item in the table and click View Details. The following details are available for each item:
Cache
- Overview: Provides the number of available cache inodes and displays charts that show the amount of data that is transferred, the data backlog, and the memory that is used for the queue.
- Events: Provides details of the system health events reported for the AFM component.
- Snapshots: Provides details of the snapshots that are available for the AFM fileset. The snapshots are taken for backup purposes. A snapshot that is taken in an AFM cache relationship is called a peer snapshot, and it functions in the same way as a GPFS snapshot. When a snapshot is taken on the cache site, the request to take a snapshot is also propagated to the home site.
- Gateway Nodes: Provides details of the nodes that are configured as gateway nodes in the AFM configuration.
Disaster Recovery
- Overview: Provides the number of available primary inodes and displays charts that show the amount of data that is transferred, the data backlog, and the memory that is used for the queue.
- Events: Provides details of the system health events reported for the AFM component.
- Snapshots: Provides details of the snapshots that are available for the AFM fileset. The snapshots that are taken in AFM DR are called Recovery Point Objective (RPO) snapshots. These are peer snapshots that are taken at the same time on both the primary and the secondary sites.
- Gateway Nodes: Provides details of the nodes that are configured as gateway nodes in the AFM configuration.
Gateway Nodes
The details of gateway nodes are available under the following tabs. The same details are also available on the Monitoring > Nodes page.
- The Overview tab provides performance charts for the following:
  - Client IOPS
  - Client data rate
  - Server data rate
  - Server IOPS
  - Network
  - CPU
  - Load
  - Memory
- The Events tab helps you monitor the events that are reported on the node. As on the Events page, you can perform operations such as marking events as read and running fix procedures from this events view. By default, current issues are listed in the events view. You can filter the events by using the other available filter options. The Monitoring > Events page displays the entire set of events that are reported in the system.
- The File Systems tab provides performance details of the file systems that are mounted on the node. The read and write throughput, average read and write transaction size, and read and write latency of each file system are also available.
  Use the Mount File System or Unmount File System options to mount or unmount individual file systems or multiple file systems on the selected node. The nodes on which the file system needs to be mounted or unmounted can be selected individually from the list of nodes or based on node classes.
- The NSDs tab shows the status of the disks that are attached to the node. The NSDs tab appears only if the node is configured as an NSD server.
- The SMB and NFS tabs provide the performance details of the SMB and NFS services that are hosted on the node. These tabs appear only if the node is configured as a protocol node.
- The AFM tab provides details of the configuration and health status of the AFM and AFM DR relationships for which the node is configured as the gateway node.
  It also displays the number of AFM filesets and the corresponding export server maps. Each export map establishes a mapping between the gateway node and the NFS host name to allow parallel data transfers from cache to home. One gateway node can be mapped only to a single NFS server and one NFS server can be mapped to multiple gateway nodes.
- The Network tab displays the network performance details.
- The Properties tab displays the basic attributes of the node. Use the Prevent file system mounts option to specify whether file systems are prevented from being mounted on the node.
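The export map rule described under the AFM tab (a gateway node maps to only one NFS server, while an NFS server can serve several gateway nodes) can be checked mechanically. The following is a minimal sketch that assumes the mapping is available as a list of (gateway node, NFS server) pairs; the function name and the host names are hypothetical and do not come from the product.

```python
from collections import defaultdict


def find_invalid_gateways(pairs):
    """Return gateway nodes that are mapped to more than one NFS server.

    `pairs` is an iterable of (gateway_node, nfs_server) tuples.
    The result maps each offending gateway to its sorted list of servers.
    """
    servers_by_gateway = defaultdict(set)
    for gateway, nfs_server in pairs:
        servers_by_gateway[gateway].add(nfs_server)
    return {
        gateway: sorted(servers)
        for gateway, servers in servers_by_gateway.items()
        if len(servers) > 1
    }


# Hypothetical example: nfs1 serves two gateways, which is allowed.
valid_map = [("gw1", "nfs1"), ("gw2", "nfs1"), ("gw3", "nfs2")]
# Adding a second server for gw1 violates the one-server-per-gateway rule.
invalid_map = valid_map + [("gw1", "nfs2")]
```

With these inputs, `find_invalid_gateways(valid_map)` returns an empty dictionary, and `find_invalid_gateways(invalid_map)` reports `gw1` with both servers.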

Monitoring AFM and AFM DR configuration and performance in the remote cluster

The IBM Spectrum Scale GUI can monitor only a single cluster. If you want to monitor the AFM and AFM DR configuration, health, and performance across clusters, the GUI node of the local cluster must establish a connection with the GUI node of the remote cluster. After a connection is established between the GUI nodes, both GUIs display all information for the AFM relationships between the two clusters. Without such a connection, the GUI of the home or cache cluster cannot display any details of the AFM relationships. To enable remote monitoring among clusters, the release level of the GUI nodes that communicate with each other must be 5.0.0 or later.
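The release-level requirement can be verified before you request access. The following sketch compares dotted release strings against the 5.0.0 minimum stated above; the function names are illustrative assumptions, and the sketch assumes that release levels are purely numeric dotted strings.

```python
MIN_REMOTE_MONITORING_RELEASE = (5, 0, 0)


def parse_release(level):
    """Convert a dotted release string such as '5.0.4.1' to a tuple of ints.

    Short strings are padded with zeros so that '5.0' compares equal to 5.0.0.
    """
    parts = [int(part) for part in level.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_remote_monitoring(local_level, remote_level):
    """Return True if both GUI nodes are at release 5.0.0 or later."""
    return all(
        parse_release(level) >= MIN_REMOTE_MONITORING_RELEASE
        for level in (local_level, remote_level)
    )
```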

To establish a connection with the remote cluster, perform the following steps:

Perform the following steps on the local cluster to raise the access request:
1. Go to Access > Remote Connections.
2. Select the Request Access option that is available under the Outgoing Requests tab to raise the request for access.
  Note: You can also access the Request Access option from the Files > Active File Management page.
3. In the Request Remote Cluster Access dialog, enter an alias for the remote cluster name and specify the GUI nodes to which the local GUI node must establish the connection.
4. If you know the credentials of the security administrator of the remote cluster, you can also enter the user name and password of the remote cluster administrator and skip the steps for granting access on the remote cluster.
5. Click Send to submit the request.
Perform the following steps on the remote cluster to grant access:
1. When the request for connection is received, the GUI displays the details of the request on the Access > Remote Connections > Incoming Requests page.
2. Select Grant Access to grant the permission and establish the connection.

Now, the requesting cluster GUI can monitor the remote cluster. To enable both clusters to monitor each other, repeat the procedure with reversed roles through the respective GUIs.

Note: Only a GUI user with the Security Administrator role can grant access to remote connection requests.
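The request and grant procedure can be pictured as a small state model: a request starts as pending on the remote cluster and becomes granted only when a user with the Security Administrator role approves it. The following is a conceptual sketch only. The class, function, and cluster names are hypothetical, and the code does not call the GUI or any IBM Spectrum Scale interface.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    """Conceptual model of an outgoing remote cluster access request."""
    requesting_cluster: str
    remote_alias: str
    status: str = "pending"


def grant_access(request, user_roles):
    """Grant a pending request if the user has the Security Administrator role."""
    if "Security Administrator" not in user_roles:
        raise PermissionError("only a Security Administrator can grant access")
    request.status = "granted"
    return request


# Hypothetical example: the cache cluster asks to monitor the home cluster.
request = AccessRequest(requesting_cluster="cacheCluster", remote_alias="homeCluster")
```

To let both clusters monitor each other, a second request with the roles reversed would go through the same two states.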

When the remote cluster monitoring capabilities are enabled, you can view the following remote cluster details in the local AFM GUI:

On home and secondary, you can see the AFM relationships configuration, health status, and performance values of the Cache and Disaster Recovery grids.
On the Overview tab of the detailed view, the number of available home and secondary inodes is displayed.
On the Overview tab of the detailed view, NFS throughput, IOPS, and latency details are available if the protocol is NFS.

The performance and status information of gateway nodes is not transferred to home.

Creating and deleting peer and RPO snapshots

When a peer snapshot is taken, it creates a snapshot of the cache fileset and then queues a snapshot creation at the home site. This ensures application consistency at both cache and home sites. The recovery point objective (RPO) snapshot is a type of peer snapshot that is used in the AFM DR setup. It is used to maintain consistency between the primary and secondary sites in an AFM DR configuration.
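The ordering described above (snapshot the cache fileset first, then queue the snapshot creation for the home site) can be illustrated with a toy model. This is a sketch of the sequence only, under the assumption that queued operations are replayed at home in order; the class and method names are hypothetical, and the model does not reflect how AFM implements its queue.

```python
from collections import deque


class PeerSnapshotModel:
    """Toy model of peer snapshot ordering between a cache and a home site."""

    def __init__(self):
        self.cache_snapshots = []   # snapshots that exist at the cache site
        self.home_queue = deque()   # snapshot requests waiting to reach home
        self.home_snapshots = []    # snapshots that exist at the home site

    def create_peer_snapshot(self, name):
        # Step 1: take the snapshot of the cache fileset immediately.
        self.cache_snapshots.append(name)
        # Step 2: queue the matching snapshot creation for the home site.
        self.home_queue.append(name)

    def flush_queue(self):
        # Replay queued requests at home in the order they were queued.
        while self.home_queue:
            self.home_snapshots.append(self.home_queue.popleft())
```

Until the queue is flushed, the snapshot exists only at the cache site. After the flush, both sites hold a snapshot with the same name.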

Use the Create Peer Snapshot option in the Files > Snapshots page to create peer snapshots. You can view and delete these peer snapshots from the Snapshots page and also from the detailed view of the Files > Active File Management page.