Requirements for HyperSwap systems with RDMA-capable Ethernet ports
If you are configuring a HyperSwap® system that uses RDMA-capable Ethernet ports, ensure that all SAN and RDMA-specific requirements are met.
- Directly connect each node to two or more RDMA-capable Ethernet fabrics at the primary and secondary sites (2 - 4 fabrics are supported). Sites are defined as independent failure domains. A failure domain is a part of the system within a boundary. Any failure within that boundary (such as a power failure, fire, or flood) is contained within the boundary, and the failure does not affect any part that is outside of that boundary. Failure domains can be in the same room or across rooms in the data center, buildings on the same campus, or buildings in different towns. Different kinds of failure domains protect against different types of faults.
- RDMA-capable Ethernet ports can be used for both node-to-node communications and host attachment; however, do not use the same RDMA-capable Ethernet ports for host attachment and node-to-node communications. RDMA-capable ports are not supported for connections to external storage. A variety of other protocols are also supported for host attachment and virtualization of external storage.
- Avoid using interswitch links (ISLs) in paths between nodes and external storage systems. If this configuration is unavoidable, do not oversubscribe the ISLs, because they carry substantial RDMA traffic. For most configurations, trunking is required. Because ISL problems are difficult to diagnose, collect switch-port error statistics and monitor them regularly to detect failures. In particular, monitor switch-port counters that relate to pause frames or fabric congestion.
- Using a single switch at the third site can lead to the creation of a single fabric rather than two independent and redundant fabrics. A single fabric is an unsupported configuration.
- Ethernet port 1 on every node must be connected to the same subnet or subnets. Ethernet port 2 (if used) of every node must be connected to the same subnet (which might be a different subnet from port 1). The same principle applies to other Ethernet ports.
- Use consistency groups to manage the volumes that belong to an application. This structure ensures that when a rolling disaster occurs, the out-of-date image is consistent and therefore usable for that application. Use the following guidelines for creating consistency groups (an example follows these guidelines):
- Use consistency groups to maintain data that is usable for disaster recovery for each application. Add relationships for each volume for an application to an appropriate consistency group.
- You can add relationships to a consistency group only when the relationships are in certain states, which include both sites being accessible.
- If you need to add a volume to an application to provide it with more capacity at a time when only one site is accessible, note that you cannot create the HyperSwap relationship and add it to the consistency group at that time. Create the relationship and add it to the group as soon as possible after the failed site is recovered.
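The following CLI lines are a minimal sketch of this structure, assuming placeholder names (app1_cg, vol_site1, vol_site2, app1_rel1) and a local system name of <system_name>; steps such as creating change volumes are omitted, and the exact syntax can vary by code level. The first command creates the consistency group, the second creates an active-active relationship between the copies at the two sites, and the third adds the relationship to the group:
   mkrcconsistgrp -name app1_cg
   mkrcrelationship -master vol_site1 -aux vol_site2 -cluster <system_name> -activeactive -name app1_rel1
   chrcrelationship -consistgrp app1_cg app1_rel1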
- Use a third, dedicated site to house a quorum disk or an IP quorum application. Quorum disks or IP quorum applications provide redundancy if communication is lost between the primary and secondary site. In addition, both contain configuration metadata that is used to recover the system, if necessary. IP quorum applications are used when the HyperSwap system connects to iSCSI-attached storage systems; iSCSI storage systems cannot be configured at a third site. A sketch of deploying the IP quorum application follows this item.
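As a hedged sketch (the target host and directory are placeholders), the IP quorum application is generated on the system, copied to a server at the third site, and started with Java; the server at the third site needs IP connectivity to the node service IP addresses:
   mkquorumapp
   scp superuser@<system_ip>:/dumps/ip_quorum.jar /opt/ipquorum/
   java -jar /opt/ipquorum/ip_quorum.jar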
- If a storage system is used at the third site, it must support extended quorum disks. For more information, see the interoperability matrices that are available at the following website:
- Place independent storage systems at the primary and secondary sites and use active-active relationships to mirror the host data between the two sites (see the sketch that follows this list).
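On code levels that support the mkvolume command, a HyperSwap volume, together with its active-active relationship and change volumes, can be created in a single step by specifying one storage pool at each site. This is a sketch with assumed names and size (app1_vol1, site1pool, site2pool, 100 GB):
   mkvolume -name app1_vol1 -size 100 -unit gb -pool site1pool:site2pool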
A HyperSwap system locates the active quorum disk at a third site. If communication is lost between the primary and secondary sites, the site with access to the active quorum disk continues to process transactions. If communication is lost to the active quorum disk, an alternative quorum disk at another site can become the active quorum disk.
A system of nodes can be configured to use up to three quorum disks. However, only one quorum disk can be elected to resolve a situation where the system is partitioned into two sets of nodes of equal size. The purpose of the other quorum disks is to provide redundancy if a quorum disk fails before the system is partitioned.
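For example, lsquorum lists the quorum devices and shows which one is active, and chquorum selects a different device; the quorum index value here (2) is a placeholder:
   lsquorum
   chquorum -active 2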
An example HyperSwap system that uses RDMA-capable Ethernet ports to connect nodes is illustrated in Figure 1.

In this configuration, if either the primary site or the secondary site fails, you must ensure that the remaining site retains direct access to the storage system that hosts the quorum disks.
- Do not connect an external storage system in one site directly to an RDMA-capable switch fabric in the other site.
- An alternative configuration can use an extra RDMA-capable switch at the third site with connections from that switch to the primary site and to the secondary site.
- A HyperSwap system configuration is supported only when the storage system that hosts the quorum disks supports extended quorum. Although the system can use other types of storage systems to provide quorum disks, access to these quorum disks is always through a single path.
For quorum disk configuration requirements, see the technote Guidance for Identifying and Changing Managed Disks Assigned as Quorum Disk Candidates at http://www.ibm.com/support.
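As a brief sketch related to that technote (the MDisk name site3_mdisk0 and the quorum index 1 are placeholders), a quorum disk candidate can be reassigned to an MDisk on the extended-quorum-capable storage system at the third site:
   chquorum -mdisk site3_mdisk0 1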