You can create an enhanced stretched system configuration where
each node on the system is physically on a different site. When used with mirroring technologies,
such as volume mirroring or Copy Services, these configurations can be used to maintain access to
data on the system in the event of power failures or site-wide outages.
This topic describes the enhanced stretched system configuration, in which the topology attribute
of the system is set to stretched. Older ways of configuring a stretched system are described in
previous versions of the IBM Knowledge Center that are still supported. You can non-disruptively
move to the current enhanced stretched system configuration by following the final configuration
steps that are presented here, which gives you better availability and disaster recovery. You can
also non-disruptively move from a stretched system configuration to a HyperSwap® system
configuration for even better availability, performance, and disaster recovery. Contact the IBM®
Remote Technical Support Center for guidance on changing the topology of an existing system.
Note: If the objective of your solution design is high availability, then it is better to use an
IBM HyperSwap topology instead of an enhanced stretched system configuration. However, if the
objectives include topics like disaster recovery, complex Copy Services, or highest scalability,
then consider the restrictions of the current version of HyperSwap. For more information, see
Planning for high availability.
In a stretched system configuration, each site is defined as an independent
failure domain. If one site experiences a failure, the other site can continue to operate without
disruption. You must also configure a third site to host a quorum device that provides an automatic
tie-break in the event of a potential link failure between the two main sites. The two main sites
can be in the same room or across rooms in the data center, buildings on the same campus, or buildings in
different cities. Different kinds of sites protect against different types of failures.
- Sites are within a single location: If each site is on a different power phase within a single
location or data center, the system can survive the failure of any single power domain. For
example, one node can be placed in one rack installation and the other node can be in another
rack. Each rack is considered a separate site with its own power phase. In this case, if power is
lost to one of the racks, the partner node in the other rack can be configured to process requests
and effectively provide availability to data even when the other node is offline due to a power
disruption.
- Each site is at a separate location: If each site is at a different physical location, the
system can survive the failure of any single location. These sites can span shorter distances, for
example two sites in the same city, or they can be spread farther geographically, such as two
sites in separate cities. If one site experiences a site-wide disaster, the remaining site can
remain available to process requests.
If configured properly, the system continues to operate after the loss of one site. The key
prerequisite is that each site contains only one node from each pair of nodes. Simply placing a pair
of nodes from the same system in different sites for a stretched system configuration does not
provide high availability. You must also configure the appropriate mirroring technology and ensure
that all configuration requirements for that technology are met.
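The placement prerequisite above (each site contains only one node from each pair of nodes) can be checked mechanically before the topology is changed. The following is a minimal sketch in Python, not part of the product: the node records, the field names (name, io_group, site), and the function check_io_group_placement are hypothetical and stand in for whatever inventory of nodes you maintain.

```python
from collections import defaultdict


def check_io_group_placement(nodes):
    """Return a list of problems found in the node placement.

    An empty list means that every I/O group (node pair) has exactly two
    nodes and that the two nodes are assigned to different sites.
    The record layout is an assumption made for this sketch.
    """
    sites_by_group = defaultdict(list)
    for node in nodes:
        sites_by_group[node["io_group"]].append(node["site"])

    problems = []
    for group, sites in sorted(sites_by_group.items()):
        if len(sites) != 2:
            problems.append(f"{group}: expected 2 nodes, found {len(sites)}")
        elif sites[0] == sites[1]:
            problems.append(f"{group}: both nodes are in {sites[0]}")
    return problems


# Hypothetical inventory: io_grp1 has both of its nodes in the same site.
nodes = [
    {"name": "node1", "io_group": "io_grp0", "site": "site1"},
    {"name": "node2", "io_group": "io_grp0", "site": "site2"},
    {"name": "node3", "io_group": "io_grp1", "site": "site1"},
    {"name": "node4", "io_group": "io_grp1", "site": "site1"},
]

for problem in check_io_group_placement(nodes):
    print(problem)
```

A clean result from a check like this covers only node placement. It does not verify that mirroring is configured, which the system also requires for high availability.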
The system supports Fibre Channel and iSCSI connections to hosts in stretched system environments.
However, NVMe-based connections to hosts are not supported in stretched system configurations.
Stretched system and Metro Mirror or Global Mirror
A stretched system is designed to continue operation after the loss of one failure domain.
The stretched system cannot guarantee that it can operate after the failure of two failure
domains. If the enhanced stretched system function is configured, you can enable a manual override
for this situation. You can also use Metro Mirror or Global Mirror on a second system for extended
disaster recovery with either an enhanced stretched system or a conventional stretched system. You
configure and manage Metro Mirror or Global Mirror partnerships that include a stretched system in
the same way as other remote copy relationships.
The system supports SAN routing technology, which includes FCIP links, for intersystem connections
that use Metro Mirror or Global Mirror.
The two partner systems cannot be in the same production site. However, they can be collocated with
the storage system that provides the active quorum disk for the stretched system.
Configuration steps
These additional configuration steps can be done by using the command-line interface (CLI) or
the management GUI.
- Each node in the system
must be assigned to a site. Use the chnode CLI command. If additional nodes are cabled to the system, you can specify these nodes as
hot-spare nodes. Hot-spare nodes can nondisruptively take over host I/O operations if any node on
the site becomes unavailable. For more information, see the topic about adding hot-spare
nodes.
- Each back-end storage system must be assigned to a site. Use the chcontroller CLI command.
- Each host must be assigned to a site. Use the chhost CLI command.
- After all nodes, hosts, and storage systems are assigned to a site, the enhanced mode must be
enabled by changing the system topology to stretched.
- For best results, configure an enhanced stretched system to include at least two I/O groups
(four nodes). A system with just one I/O group cannot guarantee that mirroring of data or
uninterrupted host access is maintained in the presence of node failures or system updates.
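The ordering in the steps above (assign every node, storage system, and host to a site, and only then change the topology) can be sketched as a small Python helper that builds the list of CLI commands to run. This is an illustration under stated assumptions, not the documented procedure. The command names chnode, chcontroller, and chhost come from the steps above. The -site parameter, the chsystem -topology stretched command, the use of site 3 for the quorum storage system, and all object names are assumptions that you should verify against the CLI reference for your code level. The helper only produces strings and does not connect to a system.

```python
def build_stretched_plan(node_sites, controller_sites, host_sites):
    """Build an ordered list of CLI command strings for the steps above.

    Each argument maps an object name to a site ID, or to None if the
    object has no site yet. The topology change is emitted last, and only
    if every object has a site. Parameter spellings are assumptions.
    """
    groups = (
        ("chnode", node_sites),
        ("chcontroller", controller_sites),
        ("chhost", host_sites),
    )

    # Refuse to build a plan while any object is still unassigned.
    for command, mapping in groups:
        unassigned = [name for name, site in mapping.items() if site is None]
        if unassigned:
            raise ValueError(f"{command}: no site for {', '.join(unassigned)}")

    plan = []
    for command, mapping in groups:
        for name, site in mapping.items():
            plan.append(f"{command} -site {site} {name}")

    # Assumed command for enabling the enhanced mode.
    plan.append("chsystem -topology stretched")
    return plan


# Hypothetical object names and site IDs.
plan = build_stretched_plan(
    node_sites={"node1": 1, "node2": 2},
    controller_sites={"controller0": 1, "controller1": 2, "controller2": 3},
    host_sites={"host0": 1, "host1": 2},
)
for line in plan:
    print(line)
```

Generating the commands first and reviewing them before you run anything keeps the topology change as the final step, which matches the order that the steps require.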