Setting up IBM Storage Scale synchronous replication

See an example of how to set up synchronous IBM Storage Scale replication to recover from a site failure.

Synchronous replication is described in the topic Synchronous mirroring with GPFS replication and in Figure 1 of that topic.

This example is based on the following configuration:

Site A
  • Nodes: nodeA001, nodeA002, nodeA003, nodeA004
  • Disk devices: diskA1 and diskA2. These devices are SAN-attached and accessible from all the nodes at site A and site B.
Site B
  • Nodes: nodeB001, nodeB002, nodeB003, nodeB004
  • Disk devices: diskB1 and diskB2. These devices are SAN-attached and accessible from all the nodes at site A and site B.
Site C
Note that site C contains only one node, which is defined as both a quorum node and a client node.
  • Nodes: nodeC
  • Disks: diskC1. This disk is an NSD that is defined on an internal disk of nodeC and is directly accessible only from site C.
  1. Create an IBM Storage Scale cluster with the mmcrcluster command and a node file.
    1. Create a node file that is named clusterNodes with the following contents:
      nodeA001:quorum-manager
      nodeA002:quorum-manager
      nodeA003:quorum-manager
      nodeA004:client
      nodeB001:quorum-manager
      nodeB002:quorum-manager
      nodeB003:quorum-manager
      nodeB004:client
      nodeC:quorum-client
      
    2. Issue the following command to create the cluster:
      mmcrcluster -N clusterNodes
      Note: The cluster is created with the Cluster Configuration Repository (CCR) enabled. This option is the default on IBM Storage Scale 4.1 or later.
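      Optionally, you can confirm that the cluster was created with the expected node roles by issuing the following command; the exact output layout depends on your release:
      mmlscluster
      Depending on your release, you might also need to designate node licenses with the mmchlicense command before you start IBM Storage Scale in step 4.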
  2. Issue the following command to enable the unmountOnDiskFail attribute on nodeC:
    mmchconfig unmountOnDiskFail=yes -N nodeC
    Enabling this attribute prevents false disk errors in the SAN configuration from being reported to the file system manager.
    Important: In a synchronous replication environment, the following rules are good practices:
    • The following rules apply to nodeC, which is the only node on site C and is also a client node and a quorum node:
      • Do not designate nodeC as a manager node.
      • Do not mount the file system on nodeC.
        To avoid unexpected mounts, create the following empty file on nodeC:
        /var/mmfs/etc/ignoreAnyMount.<file_system_name>
        For example, if the file system is fs0, create the following empty file:
        /var/mmfs/etc/ignoreAnyMount.fs0
        Note: If you create an ignoreAnyMount.<file_system_name> file, you cannot manually mount the file system on nodeC.
      If you do not follow these practices, the combination of the nodeC configuration and the unmountOnDiskFail option can cause an unexpected file system unmount during a site failure.
    • At the sites that do not consist of only a single quorum client node (sites A and B in this example), designate at least one quorum node at each site as a manager node. During a site outage, that quorum node can take over as a manager node.
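    As an optional check for this step, you can display the current value of the unmountOnDiskFail attribute and create the empty ignoreAnyMount file on nodeC. The following commands are a minimal sketch that assumes the file system name fs0 from this example; on recent releases the attribute name can be passed to mmlsconfig as an operand, and the touch command must be run locally on nodeC:
    mmlsconfig unmountOnDiskFail
    touch /var/mmfs/etc/ignoreAnyMount.fs0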
  3. Create a set of network shared disks (NSDs) for the cluster.
    1. Create the stanza file clusterDisks with the following NSD stanzas:
      %nsd: device=/dev/diskA1
      servers=nodeA002,nodeA003
      usage=dataAndMetadata
      failureGroup=1
      %nsd: device=/dev/diskA2
      servers=nodeA003,nodeA002
      usage=dataAndMetadata
      failureGroup=1
      %nsd: device=/dev/diskB1
      servers=nodeB002,nodeB003
      usage=dataAndMetadata
      failureGroup=2
      %nsd: device=/dev/diskB2
      servers=nodeB003,nodeB002
      usage=dataAndMetadata
      failureGroup=2
      %nsd: device=/dev/diskC1
      servers=nodeC
      usage=descOnly
      failureGroup=3
      
      Important: The stanzas make the following failure group assignments:
      • The disks at site A are assigned to failure group 1.
      • The disks at site B are assigned to failure group 2.
      • The disk that is local to nodeC is assigned to failure group 3. Because its usage is descOnly, it holds only a copy of the file system descriptor and no file system data or metadata, so it acts as a tiebreaker during a site failure.
    2. Issue the following command to create the NSDs:
      mmcrnsd -F clusterDisks
    3. Issue the following command to verify that the network shared disks are created:
      mmlsnsd -m
      The command should display output like the following:
      
      Disk name    NSD volume ID      Device         Node name         Remarks
      ---------------------------------------------------------------------------------
       gpfs1nsd     0972445B416BE502   /dev/diskA1    nodeA002          server node
       gpfs1nsd     0972445B416BE502   /dev/diskA1    nodeA003          server node
       gpfs2nsd     0972445B416BE509   /dev/diskA2    nodeA002          server node
       gpfs2nsd     0972445B416BE509   /dev/diskA2    nodeA003          server node
       gpfs3nsd     0972445F416BE4F8   /dev/diskB1    nodeB002          server node
       gpfs3nsd     0972445F416BE4F8   /dev/diskB1    nodeB003          server node
       gpfs4nsd     0972445F416BE4FE   /dev/diskB2    nodeB002          server node
       gpfs4nsd     0972445F416BE4FE   /dev/diskB2    nodeB003          server node
       gpfs5nsd     0972445D416BE504   /dev/diskC1    nodeC             server node
      
  4. Issue the following command to start IBM Storage Scale on all the nodes of the cluster:
    mmstartup -a
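    Optionally, verify that the daemon is active on every node before you continue; the mmgetstate command reports the daemon state, and its output format can vary by release:
    mmgetstate -a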
  5. Create a file system fs0 with default replication for metadata (-m 2) and data (-r 2).
    Issue the following command:
    mmcrfs /gpfs/fs0 fs0 -F clusterDisks -m 2 -r 2
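    As an optional check after the file system is created, you can confirm the replication defaults; the -m and -r flags of mmlsfs display the default number of metadata and data replicas:
    mmlsfs fs0 -m -r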
  6. Mount the file system fs0 on all the cluster nodes at site A and site B.
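    For example, one way to do this is to issue mmmount with an explicit node list; the following sketch uses the site A and site B node names from this example:
    mmmount fs0 -N nodeA001,nodeA002,nodeA003,nodeA004,nodeB001,nodeB002,nodeB003,nodeB004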
  7. This step is optional and for ease of use.
    Issue the following three commands to create node classes for sites A, B, and C:
    mmcrnodeclass gpfs.siteA -N nodeA001,nodeA002,nodeA003,nodeA004
    mmcrnodeclass gpfs.siteB -N nodeB001,nodeB002,nodeB003,nodeB004
    mmcrnodeclass gpfs.siteC -N nodeC
    You can now use the node class names with IBM Storage Scale ("mm") commands to recover a site more easily after a cluster failover and failback. For example, the following command brings down all the nodes at site B with a single parameter, instead of requiring you to list every node name at site B:
    mmshutdown -N gpfs.siteB

    For information on the recovery procedure, see Failback with temporary loss using the Clustered Configuration Repository (CCR) configuration mechanism.
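    You can list the node classes with the mmlsnodeclass command and use them with other IBM Storage Scale commands as well. For example, the following sketch, which assumes the node classes that were created in this step, restarts IBM Storage Scale and remounts the file system on site B after the site is recovered:
    mmlsnodeclass
    mmstartup -N gpfs.siteB
    mmmount fs0 -N gpfs.siteB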

The cluster is configured with synchronous replication to recover from a site failure.