Stretched system configuration details

You can create an enhanced stretched system configuration where each node in an I/O group is physically at a different site. When used with mirroring technologies, such as volume mirroring or Copy Services, these configurations can maintain access to data on the system during power failures or site-wide outages.

This section describes the enhanced stretched system configuration, in which the topology attribute of the system is set to stretched. Older methods of configuring a stretched system, which are still supported, are described in previous versions of IBM Knowledge Center. You can move nondisruptively from an older configuration to the enhanced stretched system configuration, for better availability and disaster recovery, by following the final configuration steps that are presented here. You can also move nondisruptively from a stretched system configuration to a HyperSwap® system configuration for even better availability, performance, and disaster recovery. Contact the IBM® Remote Technical Support center for guidance on changing the topology of an existing system.

Note: If the objective of your solution design is high availability, it is better to use an IBM HyperSwap topology instead of an enhanced stretched system configuration. However, if the objectives also include disaster recovery, complex Copy Services, or highest scalability, consider the restrictions of the current version of HyperSwap. For more information, see Planning for high availability.

In a stretched system configuration, each site is defined as an independent failure domain. If one site experiences a failure, the other site can continue to operate without disruption. You must also configure a third site to host a quorum device that provides an automatic tie-break if the link between the two main sites fails. The two main sites can be in the same room or in different rooms of a data center, in different buildings on the same campus, or in buildings in different cities. Different kinds of sites protect against different types of failures.

Sites are within a single location: If each site is a different power phase within a single location or data center, the system can survive the failure of any single power domain. For example, one node can be placed in one rack and its partner node in another rack, where each rack is considered a separate site with its own power phase. In this case, if power is lost to one of the racks, the partner node in the other rack can be configured to process requests and provide access to data while the other node is offline because of the power disruption.
Sites are at separate locations: If each site is a different physical location, the system can survive the failure of any single location. The sites can be relatively close together, for example two sites in the same city, or farther apart geographically, such as two sites in separate cities. If one site experiences a site-wide disaster, the remaining site can continue to process requests.
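The tie-break role of the third-site quorum device can be illustrated with a deliberately simplified model. This is a sketch for intuition only, not the system's actual algorithm: the Site class, its fields, and the continuing_sites function are hypothetical. In particular, when both sites can reach the quorum device after a link failure, the real system lets whichever site claims the device first continue; the model simply takes the first candidate.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Site:
    """Hypothetical, simplified view of one production site."""
    name: str
    powered: bool         # False models a site-wide outage
    reaches_quorum: bool  # True if the site can still reach the third-site quorum device


def continuing_sites(sites: List[Site], intersite_link_up: bool) -> Set[str]:
    """Return the names of the sites that keep serving I/O in this model."""
    up = [s for s in sites if s.powered]
    # Normal operation: both sites are up and can see each other.
    if intersite_link_up and len(up) == 2:
        return {s.name for s in up}
    # Otherwise the surviving site(s) need the quorum device as a tie-break.
    candidates = [s for s in up if s.reaches_quorum]
    if not candidates:
        # Two failure domains lost (for example, a site and the quorum site):
        # nothing continues automatically in this model.
        return set()
    # Only one site may win the tie-break. The real system picks whichever
    # site claims the quorum device first; this model takes the first candidate.
    return {candidates[0].name}
```

For example, with both sites powered and able to reach the quorum device, continuing_sites([site1, site2], intersite_link_up=False) returns a single site, so the two sites never process I/O independently of each other. The empty-set case corresponds to the loss of two failure domains, which the stretched system cannot be guaranteed to survive.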

If configured properly, the system continues to operate after the loss of one site. The key prerequisite is that each site contains only one node from each pair of nodes. However, simply placing a pair of nodes from the same system at different sites does not by itself provide high availability. You must also configure the appropriate mirroring technology and ensure that all configuration requirements for that technology are met.
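The node-placement prerequisite can be checked mechanically before any configuration change. The following is a minimal sketch that assumes a hypothetical inventory mapping each node name to its I/O group (node pair) and its intended site. The names node1, io_grp0, and site1 are illustrative only, and the function does not query a real system.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def placement_errors(nodes: Dict[str, Tuple[str, str]]) -> List[str]:
    """Check that each node pair has exactly one node at each of two sites.

    `nodes` maps a node name to (I/O group, site); both values are
    hypothetical labels supplied by the caller.
    """
    by_group: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for name, (io_group, site) in nodes.items():
        by_group[io_group].append((name, site))

    errors: List[str] = []
    for io_group, members in sorted(by_group.items()):
        sites = [site for _, site in members]
        if len(members) != 2:
            errors.append(f"{io_group}: expected 2 nodes, found {len(members)}")
        elif sites[0] == sites[1]:
            # Both nodes of the pair would be lost together with that site.
            errors.append(f"{io_group}: both nodes are at {sites[0]}")
    return errors
```

An empty result means that every pair is split across two sites; it says nothing about whether the mirroring technology is configured, which still has to be done separately.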

The system supports Fibre Channel and iSCSI connections to hosts in stretched system environments. However, NVMe-based connections to hosts are not supported in stretched system configurations.

Stretched system and Metro Mirror or Global Mirror

A stretched system is designed to continue operation after the loss of one failure domain.

The stretched system cannot guarantee that it can operate after the failure of two failure domains. If the enhanced stretched system function is configured, you can enable a manual override for this situation. You can also use Metro Mirror or Global Mirror on a second system for extended disaster recovery with either an enhanced stretched system or a conventional stretched system. You configure and manage Metro Mirror or Global Mirror partnerships that include a stretched system in the same way as other remote copy relationships. The system supports SAN routing technology, which includes FCIP links, for intersystem connections that use Metro Mirror or Global Mirror.

The two partner systems cannot be at the same production site. However, they can be collocated with the storage system that provides the active quorum disk for the stretched system.

Configuration steps

These additional configuration steps can be done by using the command-line interface (CLI) or the management GUI.

Each node in the system must be assigned to a site. Use the chnode CLI command. If additional nodes are cabled to the system, you can specify these nodes as hot-spare nodes. Hot-spare nodes can nondisruptively take over host I/O operations if any node on the site becomes unavailable. For more information, see the topic about adding hot-spare nodes.
Each back-end storage system must be assigned to a site. Use the chcontroller CLI command.
Each host must be assigned to a site. Use the chhost CLI command.
After all nodes, hosts, and storage systems are assigned to a site, enable the enhanced mode by changing the system topology to stretched. Use the chsystem CLI command.
For best results, configure an enhanced stretched system to include at least two I/O groups (four nodes). A system with only one I/O group cannot guarantee that data mirroring or uninterrupted host access is maintained during node failures or system updates.
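The configuration steps can be scripted in the order given, with the topology change emitted last. The following is a minimal sketch that only builds command strings and does not run anything. It assumes the command forms chnode -site, chcontroller -site, chhost -site, and chsystem -topology stretched; verify them against the CLI reference for your code level. The site names (site1, site2, site3 for the quorum site) and all object names are hypothetical placeholders.

```python
import warnings
from typing import Dict, List, Optional


def stretched_config_commands(
    nodes: Dict[str, Optional[str]],
    controllers: Dict[str, Optional[str]],
    hosts: Dict[str, Optional[str]],
    io_group_count: int,
) -> List[str]:
    """Build CLI command strings for the configuration steps, in order.

    Each dict maps an object name to a site name. The topology change comes
    last because every node, storage system, and host must have a site first.
    """
    for kind, mapping in (("node", nodes), ("storage system", controllers), ("host", hosts)):
        missing = [name for name, site in mapping.items() if not site]
        if missing:
            raise ValueError(f"{kind}(s) without a site: {', '.join(missing)}")
    if io_group_count < 2:
        # Allowed, but one I/O group cannot guarantee mirroring or host access
        # through node failures or system updates.
        warnings.warn("fewer than two I/O groups; at least two are recommended")

    commands = [f"chnode -site {site} {name}" for name, site in nodes.items()]
    commands += [f"chcontroller -site {site} {name}" for name, site in controllers.items()]
    commands += [f"chhost -site {site} {name}" for name, site in hosts.items()]
    commands.append("chsystem -topology stretched")
    return commands


if __name__ == "__main__":
    for cmd in stretched_config_commands(
        nodes={"node1": "site1", "node2": "site2", "node3": "site1", "node4": "site2"},
        controllers={"controller0": "site1", "controller1": "site2", "controller2": "site3"},
        hosts={"host0": "site1"},
        io_group_count=2,
    ):
        print(cmd)
```

Generating the commands instead of running them lets an administrator review the full site assignment before the topology is changed.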
