See an example of how to set up synchronous IBM Storage Scale replication to recover from a site failure.
Synchronous replication is described in the topic Synchronous mirroring with GPFS replication and in Figure 1 of that topic.
This example is based on the following configuration:
- Site A
  - Nodes: nodeA001, nodeA002, nodeA003, nodeA004
  - Disk devices: diskA1 and diskA2. These devices are SAN-attached and accessible from all the nodes at site A and site B.
- Site B
  - Nodes: nodeB001, nodeB002, nodeB003, nodeB004
  - Disk devices: diskB1 and diskB2. These devices are SAN-attached and accessible from all the nodes at site A and site B.
- Site C
  Note that site C contains only one node, which is defined as both a quorum node and a client node.
  - Nodes: nodeC
  - Disk devices: diskC1. This disk is an NSD defined on an internal disk of nodeC and is directly accessible only from site C.
- Create an IBM Storage Scale cluster with the mmcrcluster command and a node file.
- Create a node file that is named clusterNodes with the following contents:
nodeA001:quorum-manager
nodeA002:quorum-manager
nodeA003:quorum-manager
nodeA004:client
nodeB001:quorum-manager
nodeB002:quorum-manager
nodeB003:quorum-manager
nodeB004:client
nodeC:quorum-client
- Issue the following command to create the cluster:
mmcrcluster -N clusterNodes
Note: The cluster is created with the Cluster Configuration Repository (CCR) enabled. This option is the default on IBM Storage Scale 4.1 or later.
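To confirm the node designations and the quorum configuration after the cluster is created, you can optionally list the cluster information:
mmlscluster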
- Issue the following command to enable the unmountOnDiskFail attribute on nodeC:
mmchconfig unmountOnDiskFail=yes -N nodeC
Enabling this attribute prevents false disk errors in the SAN configuration from being reported to the file system manager.
Important: In a synchronous replication environment, the following rules are good practices:
- The following rules apply to nodeC, which is the only node at site C and is both a client node and a quorum node:
  - Do not designate nodeC as a manager node.
  - Do not mount the file system on nodeC. To avoid unexpected mounts, create the following empty file on nodeC (see the example command after this list):
    /var/mmfs/etc/ignoreAnyMount.<file_system_name>
    For example, if the file system is fs0, create the following empty file:
    /var/mmfs/etc/ignoreAnyMount.fs0
    Note: If you create an ignoreAnyMount.<file_system_name> file, you cannot manually mount the file system on nodeC.
  If you do not follow these practices, an unexpected file system unmount can occur during site failures because of the configuration of nodeC and the unmountOnDiskFail option.
- In the sites that do not contain a quorum client node (sites A and B in this example), designate at least one quorum node from each site as a manager node. During a site outage, a quorum node can then take over as a manager node.
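As referenced in the list above, you can create the empty ignoreAnyMount file for the fs0 file system in this example by running the following command on nodeC:
touch /var/mmfs/etc/ignoreAnyMount.fs0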
- Create a set of network shared disks (NSDs) for the cluster.
- Create the stanza file clusterDisks with the following NSD stanzas:
%nsd: device=/dev/diskA1
servers=nodeA002,nodeA003
usage=dataAndMetadata
failureGroup=1
%nsd: device=/dev/diskA2
servers=nodeA003,nodeA002
usage=dataAndMetadata
failureGroup=1
%nsd: device=/dev/diskB1
servers=nodeB002,nodeB003
usage=dataAndMetadata
failureGroup=2
%nsd: device=/dev/diskB2
servers=nodeB003,nodeB002
usage=dataAndMetadata
failureGroup=2
%nsd: device=/dev/diskC1
servers=nodeC
usage=descOnly
failureGroup=3
Important: Note that the stanzas make the following failure group assignments:
- The disks at site A are assigned to failure group 1.
- The disks at site B are assigned to failure group 2.
- The disk that is local to nodeC is assigned to failure group 3.
- Issue the following command to create the NSDs:
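mmcrnsd -F clusterDisks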
- Issue the following command to verify that the network shared disks are created:
mmlsnsd -m
The command should display output like the following:
Disk name NSD volume ID Device Node name Remarks
---------------------------------------------------------------------------------
gpfs1nsd 0972445B416BE502 /dev/diskA1 nodeA002 server node
gpfs1nsd 0972445B416BE502 /dev/diskA1 nodeA003 server node
gpfs2nsd 0972445B416BE509 /dev/diskA2 nodeA002 server node
gpfs2nsd 0972445B416BE509 /dev/diskA2 nodeA003 server node
gpfs3nsd 0972445F416BE4F8 /dev/diskB1 nodeB002 server node
gpfs3nsd 0972445F416BE4F8 /dev/diskB1 nodeB003 server node
gpfs4nsd 0972445F416BE4FE /dev/diskB2 nodeB002 server node
gpfs4nsd 0972445F416BE4FE /dev/diskB2 nodeB003 server node
gpfs5nsd 0972445D416BE504 /dev/diskC1 nodeC server node
- Issue the following command to start IBM Storage Scale on all the nodes of the cluster:
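mmstartup -a
You can check that the daemons reach the active state on all nodes with the mmgetstate -a command.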
- Create a file system fs0 with default replication for metadata (-m 2) and data (-r 2). Issue the following command:
mmcrfs /gpfs/fs0 fs0 -F clusterDisks -m 2 -r 2
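To confirm the replication settings after the file system is created, you can optionally display the default number of metadata and data replicas:
mmlsfs fs0 -m -r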
- Mount the file system fs0 on all the cluster nodes at site A and site B.
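For example, you can issue a command like the following, which lists the site A and site B nodes from this configuration explicitly so that the file system is not mounted on nodeC:
mmmount fs0 -N nodeA001,nodeA002,nodeA003,nodeA004,nodeB001,nodeB002,nodeB003,nodeB004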
- This step is optional and for ease of use. Issue the following three commands to create node classes for sites A, B, and C:
mmcrnodeclass gpfs.siteA -N nodeA001,nodeA002,nodeA003,nodeA004
mmcrnodeclass gpfs.siteB -N nodeB001,nodeB002,nodeB003,nodeB004
mmcrnodeclass gpfs.siteC -N nodeC
You can now use node class names with IBM Storage Scale commands ("mm" commands) to recover sites easily after a cluster failover and failback. For example, with the following command you can bring down all the nodes at site B with one parameter, rather than having to pass all the node names for site B into the command:
mmshutdown -N gpfs.siteB
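Similarly, after site B is recovered, you might bring its nodes back up and remount the file system with commands like the following sketch, which assumes the node classes that are created in this example:
mmstartup -N gpfs.siteB
mmmount fs0 -N gpfs.siteB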
For information on the recovery procedure, see Failback with temporary loss using the Clustered Configuration Repository (CCR) configuration mechanism.
The cluster is configured with synchronous replication to recover from a site failure.