Hot-spare node

When you add nodes, you can specify up to four of them as hot-spare nodes. A hot-spare node can come online (that is, start handling I/O operations) automatically if needed.

For example, if a node fails, an available hot-spare node that matches the failed node is activated automatically and moves to the Online Spare state. The hot-spare node handles I/O operations for the failed node until the failed node comes back online. After the node returns to the system, the hot-spare node returns to the Spare state, which indicates that it can be automatically swapped in for other failed nodes on the system.

The loss of a node, whether unplanned (such as a hardware failure) or planned (such as an upgrade), can result in loss of redundancy or degraded system performance. To reduce this possibility, a hot-spare node is kept powered on and visible on the system. A hot-spare node has active system ports but no host I/O ports, and it is not part of any I/O group. If a node fails or is upgraded, the spare node joins the system and takes the place of that node, restoring redundancy.

Hot-spare nodes support host connections only on Fibre Channel ports that support N_Port ID Virtualization (NPIV). The hot-spare node uses the same NPIV worldwide port names (WWPNs) for its Fibre Channel ports as the failed node, so host operations are not disrupted. The hot-spare node retains the node identifier that it had when it was a spare. During an upgrade, the spare node is added to the system when a node is removed, and it replaces each node in turn as that node shuts down for the upgrade.
Note: IBM® SAN Volume Controller systems support multiple I/O groups on long-term support releases only.

You can assign node pairs to specific I/O groups and then assign the extra nodes as hot-spare nodes. When a hot-spare node is added to the system, it is in the Spare state, which indicates that it is not part of an I/O group. If a node in an I/O group fails, a hot-spare node automatically replaces that node and becomes part of the I/O group. While the hot-spare node is in the I/O group, it is in the Online Spare state; it returns to the Spare state when the original node rejoins the I/O group. A system can contain up to four spares at any time, including any hot-spare nodes that are currently online as spare nodes. Verify that all cabling is correct so that the system detects the nodes. If a node is not detected, review the installation information that was included with the system.
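
The state changes described above can be sketched as a small model. This is only an illustration, not system code: the class and method names (SparePool, node_failed, node_returned) are hypothetical, and the sketch picks the first idle spare without applying any of the matching criteria.

```python
from enum import Enum

MAX_SPARES = 4  # limit stated in this topic; counts online spares too


class SpareState(Enum):
    SPARE = "Spare"
    ONLINE_SPARE = "Online Spare"


class SparePool:
    """Hypothetical model of hot-spare state changes (illustration only)."""

    def __init__(self):
        self._states = {}    # spare name -> SpareState
        self._covering = {}  # failed node name -> spare name covering it

    def add_spare(self, name):
        if name in self._states:
            raise ValueError("spare already added: " + name)
        if len(self._states) >= MAX_SPARES:
            raise ValueError("a system can contain at most four spares")
        self._states[name] = SpareState.SPARE

    def node_failed(self, node):
        # Assumption: take the first idle spare; the real system also
        # checks that the spare matches the failed node.
        for name, state in self._states.items():
            if state is SpareState.SPARE:
                self._states[name] = SpareState.ONLINE_SPARE
                self._covering[node] = name
                return name
        return None  # no idle spare is available

    def node_returned(self, node):
        # The spare goes back to the Spare state when the original rejoins.
        name = self._covering.pop(node, None)
        if name is not None:
            self._states[name] = SpareState.SPARE

    def state(self, name):
        return self._states[name]
```

In this model a spare that is covering for a failed node still counts toward the limit of four, which mirrors the statement that the limit includes hot-spare nodes that are online as spare nodes.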

The following restrictions apply to using hot-spare nodes on the system:
  • Hot-spare nodes cannot be used in systems that use RDMA-capable Ethernet ports for node-to-node communications.
  • Hot-spare nodes can be used with Fibre Channel-attached external storage only.
  • Hot-spare nodes cannot be used on enclosure-based systems.
  • Hot-spare nodes cannot be used with SAS-attached storage.
  • Hot-spare nodes cannot be used with iSCSI-attached storage.
  • Hot-spare nodes cannot be used with storage that is directly attached to the system.

When a hot-spare node is used to replace an existing node, the system attempts to find a spare node that matches the configuration of the replaced node perfectly. If a perfect match does not exist, the system continues the configuration check with progressively less strict criteria until it finds a spare that matches. The system uses the following criteria to determine suitable hot-spare nodes:

Criteria that require an exact match
  • Memory capacity
  • Fibre Channel port ID
  • Compression support
  • Site
Criteria that are recommended to match, but can be different
  • Hardware type
  • CPU count
  • Number of Fibre Channel ports

If the spare node and the replaced node do not match on every criterion, the system relaxes the recommended criteria until it reaches the minimal configuration. For example, if the number of Fibre Channel ports does not match exactly but all the required criteria match, the hot-spare node can still be used. The minimal configuration that the system can use as a hot-spare node includes identical memory, site, Fibre Channel port ID, and, if applicable, compression settings.
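
The selection rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's algorithm: the NodeProfile fields and the pick_spare function are hypothetical names, and ranking candidates by the number of matching recommended criteria is an assumed stand-in for the order in which the system actually relaxes them, which this topic does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NodeProfile:
    """Hypothetical summary of the attributes named in the criteria lists."""
    name: str
    memory_gb: int
    fc_port_id: str
    compression: bool
    site: Optional[str]
    hardware_type: str
    cpu_count: int
    fc_port_count: int


# Criteria that require an exact match.
REQUIRED = ("memory_gb", "fc_port_id", "compression", "site")
# Criteria that are recommended to match, but can be different.
RECOMMENDED = ("hardware_type", "cpu_count", "fc_port_count")


def pick_spare(failed: NodeProfile,
               spares: List[NodeProfile]) -> Optional[NodeProfile]:
    """Return the best-matching spare, or None if no spare qualifies."""
    best = None
    best_score = -1
    for spare in spares:
        # A spare that differs on any required criterion is never used.
        if any(getattr(spare, f) != getattr(failed, f) for f in REQUIRED):
            continue
        # Assumption: prefer the spare that matches the most recommended
        # criteria; a score of 3 is a perfect match.
        score = sum(getattr(spare, f) == getattr(failed, f)
                    for f in RECOMMENDED)
        if score > best_score:
            best, best_score = spare, score
    return best
```

A spare with a different number of Fibre Channel ports is still eligible in this sketch, but it is chosen only when no spare matches more of the recommended criteria.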

Note: IBM SAN Volume Controller systems continue to support multiple I/O groups on long-term support releases. FlashSystem grid can be used to scale out a storage system.

If the nodes on the system support and are licensed to use encryption, the hot-spare node must also support and be licensed to use encryption. For enhanced stretched configurations, hot-spare nodes must be assigned to a specific site. If a node fails on a particular site, the hot-spare node that is assigned to that site is used if it is a suitable replacement. If you are using a standard configuration for a stretched system, you must update to an enhanced stretched system to use hot-spare nodes. In a standard stretched configuration, a hot-spare node can be selected from the wrong site, which overloads inter-system links and causes performance issues.

If the PCI slot location of an adapter on the spare node does not match the slot location on the active node, the spare node cannot replace the active node through the swapnode command. Error CMMVC9261E indicates that the command failed because the specified node does not have a status of candidate. Install adapters in the same slots on the spare nodes and the active nodes so that the swapnode replace command works.

When an online spare node is put into the Service state, it is immediately returned to the Spare state, and 5 minutes later it rejoins the cluster as an online spare. Instead of putting the online spare into the Service state, either wait for the original node to come back or remove the online spare, and then perform the maintenance.

Adding hot-spare nodes by using the command-line interface

To add a spare node to the system, enter the following command, where panel_name is the name of the node that is displayed in the service assistant or in the output of the lsnodecandidate command:
addnode -panelname panel_name -spare
For enhanced stretched configurations, use the following command to specify the name and site location of the spare node:
addnode -panelname panel_name -spare -site site_id
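
If you script these commands, a small helper can assemble the command line from the two forms shown above. This is a sketch only: the function name build_addnode_command is hypothetical, it uses only the parameters that appear in this topic, and it returns the command text without running it or contacting a system.

```python
import shlex
from typing import Optional


def build_addnode_command(panel_name: str,
                          site_id: Optional[str] = None) -> str:
    """Build the addnode command line for a spare node (hypothetical helper)."""
    if not panel_name:
        raise ValueError("panel_name is required")
    parts = ["addnode", "-panelname", panel_name, "-spare"]
    # The -site parameter applies to enhanced stretched configurations.
    if site_id is not None:
        parts += ["-site", site_id]
    # Quote each part so that unusual characters survive a shell.
    return " ".join(shlex.quote(p) for p in parts)
```

Calling the helper with only a panel name produces the first form of the command; passing a site identifier as well produces the second form.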