
Configuring IBM Spectrum Scale Replication

IBM Spectrum Scale replication provides high availability at the storage level by maintaining two consistent replicas of the file system, each available for recovery when the other fails. The two replicas are kept in sync by using logical replication-based mirroring that does not require specific support from the underlying disk subsystem.
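As a rough sketch, the two-replica layout can be expressed with standard IBM Spectrum Scale administration commands by placing disks from each storage controller in separate failure groups and creating the file system with two data and two metadata replicas. The device names, server names, and file system name below are hypothetical examples, not values from this documentation.

```shell
# Hypothetical NSD stanza file: one disk from each storage controller,
# assigned to different failure groups so that Spectrum Scale places
# one replica of every block in each redundancy group.
cat > replica.stanza <<'EOF'
%nsd: device=/dev/sdb nsd=rg1_disk1 servers=host1 usage=dataAndMetadata failureGroup=1
%nsd: device=/dev/sdc nsd=rg2_disk1 servers=host2 usage=dataAndMetadata failureGroup=2
EOF

# Create the NSDs from the stanza file.
mmcrnsd -F replica.stanza

# Create the file system with two metadata replicas (-m default, -M maximum)
# and two data replicas (-r default, -R maximum).
mmcrfs db2fs1 -F replica.stanza -m 2 -M 2 -r 2 -R 2

# Verify the replication settings.
mmlsfs db2fs1 -m -M -r -R
```

In a Db2 pureScale deployment these steps are normally driven through the Db2 installation and cluster tooling rather than run by hand; the commands are shown only to make the failure-group and replica-count relationship concrete.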

IBM Spectrum Scale replication can be deployed in the traditional single-site pureScale® or Geographically Dispersed pureScale Cluster (GDPC) environment to provide seamless failover at the storage level. For a cluster that demands the highest availability and disaster resiliency, IBM Spectrum Scale replication configured with three geographically separated sites is recommended. This three-site requirement matches the recommended GDPC topology. Together, they provide a highly available, active-active pureScale cluster with seamless failover capability across two geographically separated data centers, plus a third, separate tiebreaker site.

Figure 1. Single-site IBM Spectrum Scale replication topology
Figure 2. Multi-site IBM Spectrum Scale replication topology
The following are the characteristics of IBM Spectrum Scale replication:
  • Two separate storage controllers are provided for the first and second replicas of the file system. These storage controllers are referred to as redundancy groups 1 and 2, respectively. IBM Spectrum Scale stores both the data and the file system metadata in the redundancy groups.
  • Majority node quorum is used instead of tiebreaker disk quorum for both RSCT and IBM Spectrum Scale clusters. With the presence of two storage controllers in a replicated environment, placing the cluster quorum tiebreaker disks in any one of the storage controllers effectively makes that storage controller a single point of failure.
  • With majority node quorum, the total number of hosts in the cluster is an odd number. With an odd number of hosts, two scenarios that result in ambiguity in the cluster state are avoided:
    • An outage on half of the nodes in the cluster. With an odd number of hosts, an outage leaves either more than half or fewer than half of the cluster nodes available, never exactly half. If more than half of the nodes remain available, there is no impact because majority quorum is maintained. If fewer than half remain, the result is a total cluster outage because the surviving nodes do not have quorum.
    • A network split where the cluster is split into two halves with no communication between the two subclusters. In this scenario, one of the subclusters has at least one more node than the other. The subcluster with more nodes survives because it has the majority quorum.
    Majority node quorum has the following characteristics:
    • For GDPC with three geographically separated sites, the primary and secondary sites have an equal number of members and one CF in each site. A single tiebreaker host exists in the third site. The tiebreaker host owns the file system tiebreaker redundancy group, which contains all of the file system tiebreaker disks. These disks contain only file system descriptor information (for example, file system configuration metadata).
    • For a single-site pureScale cluster, a dedicated tiebreaker host is not mandatory when the cluster size is odd. One of the members is chosen to own the file system tiebreaker redundancy group TB.
  • Each shared file system requires a separate file system tiebreaker disk for file system quorum and recovery purposes. A minimum of 50 MB is required for each disk. The disk can be a local physical disk or a logical volume (LV).
  • A dedicated tiebreaker host, if used, requires only TCP/IP access to the other hosts in the same cluster. It does not require access to the data in redundancy groups 1 and 2.
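The majority node quorum arithmetic described above can be illustrated with a short sketch. This is not IBM Spectrum Scale or RSCT code; has_majority_quorum is a hypothetical helper that only shows why an odd host count avoids the ambiguous half-and-half scenarios.

```python
# Illustrative sketch of majority node quorum: a (sub)cluster keeps
# quorum only if it holds strictly more than half of the total nodes.

def has_majority_quorum(available: int, total: int) -> bool:
    """Return True if `available` nodes form a majority of `total` nodes."""
    return available > total // 2

# Odd cluster of 5: any outage or network split is unambiguous.
print(has_majority_quorum(3, 5))  # True  - larger side keeps quorum
print(has_majority_quorum(2, 5))  # False - smaller side stops

# Even cluster of 4: a 2/2 split leaves neither side with a majority,
# which is exactly the ambiguity an odd node count avoids.
print(has_majority_quorum(2, 4))  # False
```

The same rule explains the network-split case: because the cluster size is odd, one subcluster always has at least one more node than the other, so exactly one side ever passes the majority test.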