Configuring nodes for management
How to configure the newly installed nodes so that they are monitored and added to the OpenShift cluster. For nodes that contribute storage or participate in Scale-related activities, also add them to the storage cluster.
Before you begin
Meet the following prerequisites for all node types in the specified sequence:
- Important: Before you begin the node upsize, make sure that all switches are up and running, and run the vgen script before you start the installation. If either TOR switch is down, the installation fails, because only one switch supports PXE boot (this avoids network looping); lacp-bypass is not enabled on both switches.
- The purchased quantity of nodes is delivered to your location. Verify the following for each node type:
- For compute-storage nodes, the new node must have the same number of NVMe drives as the nodes that are already in the cluster.
- For compute-only nodes, the new node must have the same hardware configuration as the nodes that are already in the cluster.
- All new nodes must have a link-local IP address and MAC address from IBM. Note: Update your Dynamic Host Configuration Protocol (DHCP) and Domain Name System (DNS) configuration to include reservations for all nodes in the IBM Storage Fusion System, including control, compute, AFM, and GPU nodes. These configurations are based on the MAC addresses that IBM shares with you. For more information, see Setting up the DNS and DHCP for IBM Storage Fusion HCI System.
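For illustration, a static reservation for one node in an ISC DHCP server configuration might look like the following sketch. The host name, MAC address, and IP address are placeholders; use the values that IBM shares with you, and adapt the syntax if your DHCP server uses a different format.

```
host compute-1-ru5 {
  hardware ethernet 08:00:27:aa:bb:cc;            # MAC address supplied by IBM (placeholder)
  fixed-address 192.0.2.15;                       # reserved IP for this node (placeholder)
  option host-name "compute-1-ru5.example.com";   # matches the DNS record for the node
}
```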
- Ensure that the IBM Support Representative inserted the nodes in the appropriate rack slots and connected the network and power cables.
- The IBM Storage Fusion appliance is up and running.
- Red Hat® OpenShift® is up and running and the existing nodes are healthy. Note: This prerequisite is not applicable for node replacement.
- When you add a node to Red Hat OpenShift, the operation might get stuck for a long time in the node network configuration policy creation stage. To avoid this issue, do the steps that are mentioned in the troubleshooting section before you add the node. For the workaround steps, see Node issues.
- Naming convention of nodes that are added:
- Nodes added by installation: compute-<rackid>-<rack unit number>.<domainname>
- Nodes added as upsize nodes on Day 2: compute-<rackid>-<rack unit number>.<domainname>
For a single rack, the rackid is always 1.
For an example of the base system, see host name conventions.
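The naming convention above can be sketched as a small shell helper. The rack ID, rack unit, and domain values are placeholders for illustration; the `ru` prefix on the rack unit matches the example host names used later in this topic (for example, compute-1-ru23.yourdomain.com).

```shell
# Build the host name for a newly added node:
# compute-<rackid>-ru<rack unit number>.<domainname>
node_hostname() {
  local rackid="$1" unit="$2" domain="$3"
  printf 'compute-%s-ru%s.%s\n' "$rackid" "$unit" "$domain"
}

node_hostname 1 23 yourdomain.com   # -> compute-1-ru23.yourdomain.com (rackid is always 1 for a single rack)
```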
- Upsizing when backup or restore is in progress
- When a backup or restore job is in progress, do not upsize the nodes, even though running backup or restore jobs might not cause a scale-out or scale-up operation to fail.
- When you want to do a scale-out or scale-up operation, turn off the IBM Storage Fusion backup or restore jobs, if any. Do not back up or restore when an upsize action is in progress.
- Disable the IBM Storage Fusion backup or restore job from the IBM Storage Protect Plus user interface before you proceed with a scale-out or scale-up operation.
- Before you start a scale-out or scale-up operation, back up the components, storage, and workloads for IBM Storage Fusion. For more information about backup, see Data protection.
- When the scale-out or the scale-up operation is complete, back up the components, storage, and workloads again for IBM Storage Fusion.
- Upsize nodes to brand-new racks
- Power on an upsize node only on a running IBM Storage Fusion HCI System. If the upsize node is installed on a new rack that is not yet configured with OpenShift Container Platform, keep the node powered off (remove the power cables from the node).
- Complete the installation.
- Plug in the power cables and switch on the node.
About this task
- The following node types are supported in this release:
- 9155-F01 for AFM
The AFM node is a gateway node.
- 9155-C00, 9155-C01, 9155-C04, and 9155-C05 for compute-storage or compute-only nodes (based on the number of NVMe disks available for the IBM Storage Scale Erasure Code Edition (ECE) or Data Foundation storage cluster). A compute-only node has zero NVMe disks, whereas a compute-storage node has two or more disks. You can convert a compute-only node to a compute-storage node by adding NVMe disks to it.
Storage nodes have disks to contribute to the storage.
- 9155-G01 for GPU
The compute and GPU nodes are client nodes that can access storage.
Important: You can add only one type of node to the storage cluster at any point in time. For the hardware configuration of compute-storage or compute-only nodes, see Hardware overview of a single rack.
- The allowed number of drives for nodes that are part of the IBM Storage Scale Erasure Code Edition (ECE) cluster is 2, 4, 6, 8, or 10.
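A quick way to check whether a node's NVMe drive count is valid for an ECE storage node is sketched below. The count is passed in as an argument rather than detected from the hardware, so the function is only an illustration of the rule above.

```shell
# Return 0 if the NVMe drive count is one of the values allowed for an
# ECE storage node (2, 4, 6, 8, or 10), 1 otherwise.
valid_ece_drive_count() {
  case "$1" in
    2|4|6|8|10) return 0 ;;
    *)          return 1 ;;
  esac
}

valid_ece_drive_count 6 && echo "6 drives: OK"
valid_ece_drive_count 3 || echo "3 drives: not allowed"
```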
- In an IBM Storage Fusion HCI System rack, add new nodes into the lowest available rack position to maintain mechanical stability. The only exceptions are the AFM and GPU nodes, which have reserved locations in the rack where they must be placed. Also note that the lowest position, rack position 1, must remain empty.
- Allowed node configuration combinations for high-availability multi-rack:
- Select the same number of nodes from all three racks. A minimum of three nodes is required for the configuration.
- For scale-out, add an equal number of storage nodes from each rack.
- For scale-up, add the same number of disks across each rack.
Note: Do not deviate from these specifications because doing so can result in an imbalanced Data Foundation cluster configuration.
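The balance rules above can be expressed as a small check. The per-rack node counts are illustrative arguments, not values read from the system.

```shell
# Check that the same number of nodes is being added to each of the three
# racks and that the total meets the three-node minimum.
balanced_multirack_add() {
  local r1="$1" r2="$2" r3="$3"
  [ "$r1" -eq "$r2" ] && [ "$r2" -eq "$r3" ] || return 1   # equal count per rack
  [ $((r1 + r2 + r3)) -ge 3 ] || return 1                  # minimum of three nodes
  return 0
}

balanced_multirack_add 1 1 1 && echo "1 node per rack: OK"
balanced_multirack_add 2 1 1 || echo "unequal racks: not allowed"
```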
Procedure
What to do next
- Ensure that the AFM nodes are tainted so that they do not run non-Scale workloads.
The AFM nodes in the IBM Storage Fusion HCI System are supposed to run only Scale workloads. To avoid scheduling non-AFM workloads on AFM nodes, taint the AFM nodes with a special key and the NoSchedule effect. With this taint in place, non-AFM workloads cannot be scheduled on AFM nodes, but IBM Storage Scale pods can still be scheduled because those pods tolerate the taint. Follow these steps to apply the taint:
- Run the following command to manually add the taint to each AFM node after it is added to the OpenShift Container Platform cluster.
oc edit node compute-1-ru23.yourdomain.com
- Add the following lines in the specification section.
spec:
  taints:
  - effect: NoSchedule
    key: afm.isf.ibm.com/noworkload
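As an alternative to editing the node object interactively, the same taint can be applied with a single `oc adm taint` command. The node name is a placeholder; the snippet below only builds and prints the command, with the cluster commands shown as comments, so the steps are visible without a live cluster.

```shell
NODE="compute-1-ru23.yourdomain.com"          # placeholder AFM node name
TAINT="afm.isf.ibm.com/noworkload:NoSchedule" # key and effect from the steps above

# Apply the taint (run against a live cluster):
#   oc adm taint nodes "$NODE" "$TAINT"
# Verify that the taint is present on the node:
#   oc get node "$NODE" -o jsonpath='{.spec.taints}'
echo "oc adm taint nodes $NODE $TAINT"
```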