Hot-spare node
When you add nodes, you can specify up to four of them as hot-spare nodes. A hot-spare node can become online (handling I/O operations) automatically if needed.
For example, if a node fails, an available hot-spare node that matches the failed node is activated automatically and moves to the Online Spare state. The hot-spare node handles I/O operations for the failed node until the failed node comes back online. After the failed node returns to the system, the hot-spare node returns to the Spare state, which indicates that it can be automatically swapped for other failed nodes on the system.
You can assign node pairs to specific I/O groups and then assign the extra nodes as hot-spare nodes. When a hot-spare node is added to the system, it is in the Spare state, which indicates that it is not part of an I/O group. If a node in an I/O group fails, a hot-spare node automatically replaces that node and becomes a part of the I/O group. While the hot-spare node is in the I/O group, it is in the Online Spare state. It returns to the Spare state when the original node rejoins the I/O group. A system can contain up to four spares at any time, which includes any hot-spare nodes that are online as spare nodes. Verify that all cabling is correct so that the system detects the nodes. If a node is not detected, review the installation information that was included with the system.
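The state changes described above can be sketched as a small model. This is an illustration only, not the system's implementation: the SparePool class and its method names are hypothetical, and the sketch ignores the configuration matching that the system performs before it activates a spare.

```python
from enum import Enum

MAX_SPARES = 4  # limit stated in the text: up to four spares at any time


class SpareState(Enum):
    SPARE = "Spare"                # not part of any I/O group
    ONLINE_SPARE = "Online Spare"  # standing in for a failed node


class SparePool:
    """Hypothetical model of hot-spare state transitions."""

    def __init__(self):
        self.spares = {}    # spare name -> SpareState
        self.covering = {}  # failed node name -> spare name covering it

    def add_spare(self, name):
        if len(self.spares) >= MAX_SPARES:
            raise ValueError("a system can contain up to four spares")
        self.spares[name] = SpareState.SPARE

    def node_failed(self, node):
        """Activate the first idle spare; return its name, or None."""
        for name, state in self.spares.items():
            if state is SpareState.SPARE:
                self.spares[name] = SpareState.ONLINE_SPARE
                self.covering[node] = name
                return name
        return None

    def node_returned(self, node):
        """Return the covering spare to the Spare state when the node rejoins."""
        name = self.covering.pop(node, None)
        if name is not None:
            self.spares[name] = SpareState.SPARE
        return name
```

In this model, a spare that is already in the Online Spare state still counts toward the limit of four and is not available to cover a second failure until the original node returns.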
When a hot-spare node is used to replace an existing node, the system attempts to find a spare node that matches the configuration of the replaced node perfectly. However, if a perfect match does not exist, the system continues the configuration check until a matching set of criteria is found. The system uses the following criteria to determine suitable hot-spare nodes:
If the criteria are not the same for both nodes, the system uses lower criteria until the minimal configuration is found. For example, if the Fibre Channel ports do not match exactly but all the other required criteria match, then the hot-spare node can still be used. The minimal configuration that the system can use as a hot-spare node includes identical memory, site, Fibre Channel port ID, and, if applicable, compression settings.
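The fallback from a perfect match to the minimal configuration can be sketched as follows. This is a minimal sketch under stated assumptions, not the system's actual algorithm: the NodeConfig fields and the pick_spare function are hypothetical names, the required fields are taken from the minimal configuration named above, and fc_port_layout is an invented stand-in for an exact Fibre Channel port match that the system is allowed to relax.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class NodeConfig:
    name: str
    memory_gb: int
    site: Optional[str]
    fc_port_ids: Tuple[int, ...]
    compression: bool
    fc_port_layout: str  # hypothetical: stands in for an exact FC port match


# Minimal configuration named in the text: these must always be identical.
REQUIRED = ("memory_gb", "site", "fc_port_ids", "compression")
# Assumption: criteria that may be relaxed, most important first.
OPTIONAL = ("fc_port_layout",)


def matches(failed: NodeConfig, spare: NodeConfig, fields: Sequence[str]) -> bool:
    return all(getattr(failed, f) == getattr(spare, f) for f in fields)


def pick_spare(failed: NodeConfig, spares: Sequence[NodeConfig]) -> Optional[NodeConfig]:
    """Try the strictest criteria first, then drop optional criteria one by one."""
    for keep in range(len(OPTIONAL), -1, -1):
        fields = REQUIRED + OPTIONAL[:keep]
        for spare in spares:
            if matches(failed, spare, fields):
                return spare
    return None  # no spare satisfies even the minimal configuration
```

With this ordering, a spare that matches on every field is preferred, and a spare that differs only in the relaxable field is used when no perfect match exists.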
IBM SAN Volume Controller systems continue to support multiple I/O groups on long-term support releases. FlashSystem grid can be used to scale out a storage system.
If the nodes on the system support and are licensed to use encryption, the hot-spare node must also support and be licensed to use encryption. For enhanced stretched configurations, hot-spare nodes must be assigned to a specific site. If a node fails on a particular site, the hot-spare node that is assigned to that site is used if it is a suitable replacement. If you are using a standard configuration for a stretched system, you must update to an enhanced stretched system to use hot-spare nodes. In a standard stretched configuration, hot-spare nodes can be selected from the wrong site, which overloads inter-system links and causes performance issues.
If the PCI slot location of an adapter on the spare node does not match the active nodes, an active node cannot be replaced by the spare node by using the swapnode command. If you encounter error CMMVC9261E, the command failed because the specified node does not have a status of candidate. For the swapnode replace command to work, install the adapters in the same slots on the spare nodes and the active nodes.
When an online spare node is put into the Service state, it is immediately returned to the Spare state and, 5 minutes later, rejoins the cluster as an online spare. Instead of putting the online spare into the Service state, either wait for the original node to come back or remove the online spare, and then perform the maintenance.
Adding hot-spare nodes by using the command-line interface
addnode -panelname panel_name -spare
addnode -panelname panel_name -spare -site site_id
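If you script the addition of spares, the two command forms in this section can be generated by a small helper. The following sketch only builds the command string from the flags shown in this section (-panelname, -spare, and -site). The function name build_addnode_spare is hypothetical, and how the command is sent to the system is left out.

```python
from typing import Optional


def build_addnode_spare(panel_name: str, site_id: Optional[str] = None) -> str:
    """Build the addnode command line for a hot-spare node.

    Pass site_id only when the spare must be assigned to a site,
    as in an enhanced stretched configuration.
    """
    parts = ["addnode", "-panelname", panel_name, "-spare"]
    if site_id is not None:
        parts += ["-site", site_id]
    return " ".join(parts)
```

For example, build_addnode_spare("panel_name", "site_id") produces the second command form shown in this section, and omitting the second argument produces the first.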