Configuring nodes for management
How to configure the newly installed nodes so that they are monitored and added to the OpenShift cluster. For nodes that contribute storage or participate in Scale-related activities, also add them to the storage cluster.
Before you begin
Meet the following prerequisites for all node types in the specified sequence:
- Important: Before you begin the node upsize, make sure that all switches are up and running, and run the vgen script before you start the installation. If either TOR switch is down, the installation fails, because only one switch supports PXE boot (this avoids network looping); lacp-bypass is not enabled on both switches.
- The purchased quantity of nodes is delivered to your location. Verify the following for each node type:
- For compute-storage nodes, the new node must have the same number of NVMe drives as the nodes that are already in the cluster.
- For compute-only nodes, the new node must have the same hardware configuration as the nodes that are already in the cluster.
- All new nodes must have a link-local IP address and MAC address from IBM. Note: Update your Dynamic Host Configuration Protocol (DHCP) and Domain Name System (DNS) configuration to include reservations for all nodes in the IBM Storage Fusion System, including control, compute, AFM, and GPU nodes. These configurations are based on the MAC addresses that IBM shares with you. For more information, see Setting up the DNS and DHCP for IBM Storage Fusion HCI System.
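For illustration, a static reservation for one node in an ISC DHCP server configuration might look like the following sketch. The host name, MAC address, and IP address are placeholders; use the values that IBM shares with you, and adapt the syntax if your DHCP server uses a different format.

```
host compute-1-ru5 {
  hardware ethernet 08:00:27:aa:bb:cc;            # MAC address supplied by IBM (placeholder)
  fixed-address 192.0.2.15;                       # reserved IP for this node (placeholder)
  option host-name "compute-1-ru5.example.com";   # matches the DNS record for the node
}
```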
- Ensure that the IBM Support Representative inserted the nodes in the appropriate rack slots and connected the network and power cables.
- The IBM Storage Fusion appliance is up and running.
- Red Hat® OpenShift® is up and running and the existing nodes are healthy. Note: This prerequisite is not applicable for node replacement.
- When you add a node to Red Hat OpenShift, the operation might get stuck for a long time in the node network configuration policy creation stage. To avoid this issue, do the steps that are mentioned in the troubleshooting section before you add the node. For the workaround steps, see Node issues.
- Naming convention of nodes that are added:
- Nodes added by installation: compute-<rackid>-<rack unit number>.<domainname>
- Nodes added as upsize nodes on Day 2: compute-<rackid>-<rack unit number>.<domainname>
For a single rack, the rackid is always 1.
For an example of the base system, see host name conventions.
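The naming convention above can be sketched as a small shell helper. The rack ID, rack unit, and domain values are placeholders for illustration; the `ru` prefix on the rack unit matches the example host names used later in this topic (for example, compute-1-ru23.yourdomain.com).

```shell
# Build the host name for a newly added node:
# compute-<rackid>-ru<rack unit number>.<domainname>
node_hostname() {
  local rackid="$1" unit="$2" domain="$3"
  printf 'compute-%s-ru%s.%s\n' "$rackid" "$unit" "$domain"
}

node_hostname 1 23 yourdomain.com   # -> compute-1-ru23.yourdomain.com (rackid is always 1 for a single rack)
```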
- Upsizing when backup or restore is in progress
- When a backup or restore job is in progress, do not upsize the nodes, even though running backup or restore jobs might not cause a scale-out or scale-up operation to fail.
- When you want to do a scale-out or scale-up operation, turn off the IBM Storage Fusion backup or restore jobs, if any. Do not back up or restore when an upsize action is in progress.
- Disable the IBM Storage Fusion backup or restore job from the IBM Storage Protect Plus user interface before you proceed with a scale-out or scale-up operation.
- Before you start a scale-out or scale-up operation, back up the components, storage, and workloads for IBM Storage Fusion. For more information about backup, see Data protection.
- When the scale-out or the scale-up operation is complete, back up the components, storage, and workloads again for IBM Storage Fusion.
- Upsize nodes to brand-new racks
- Power on an upsize node only on a running IBM Storage Fusion HCI System. If the upsize node is installed on a new rack that is not yet configured with OpenShift Container Platform, keep the node powered off (remove the power cables from the node).
- Complete the installation.
- Plug in the power cables and switch on the node.
About this task
- The following node types are supported in this release:
- 9155-F01 for AFM
The AFM node is a gateway node.
- 9155-C00, 9155-C01, 9155-C04, and 9155-C05 for compute-storage or compute-only nodes (based on the number of NVMe disks available for the IBM Storage Scale Erasure Code Edition (ECE) or Data Foundation storage cluster). A compute-only node has zero NVMe disks, whereas a compute-storage node has two or more disks. You can convert a compute-only node to a compute-storage node by adding NVMe disks to it.
Storage nodes have disks to contribute to the storage.
- 9155-G01 for GPU
The compute and GPU nodes are client nodes that can access storage.
Important: You can add only one type of node to the storage cluster at any point in time. For the hardware configuration of compute-storage or compute-only nodes, see Hardware overview of a single rack.
- The allowed number of drives for nodes that are part of the IBM Storage Scale Erasure Code Edition (ECE) cluster is 2, 4, 6, 8, or 10.
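A quick way to check whether a node's NVMe drive count is valid for an ECE storage node is sketched below. The count is passed in as an argument rather than detected from the hardware, so the function is only an illustration of the rule above.

```shell
# Return 0 if the NVMe drive count is one of the values allowed for an
# ECE storage node (2, 4, 6, 8, or 10), 1 otherwise.
valid_ece_drive_count() {
  case "$1" in
    2|4|6|8|10) return 0 ;;
    *)          return 1 ;;
  esac
}

valid_ece_drive_count 6 && echo "6 drives: OK"
valid_ece_drive_count 3 || echo "3 drives: not allowed"
```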
- In an IBM Storage Fusion HCI System rack, add new nodes into the lowest available rack position to maintain mechanical stability. The only exceptions are the AFM and GPU nodes, which have reserved locations in the rack where they must be placed. Also note that the lowest position, rack position 1, must remain empty.
- Allowed node configuration combinations for high-availability multi-rack:
- Select the same number of nodes from all three racks. A minimum of three nodes is required for the configuration.
- For scale-out, add an equal number of storage nodes from each rack.
- For scale-up, add the same number of disks across each rack.
Note: Do not deviate from these specifications because doing so can result in an imbalanced Data Foundation cluster configuration.
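The balance rules above can be expressed as a small check. The per-rack node counts are illustrative arguments, not values read from the system.

```shell
# Check that the same number of nodes is being added to each of the three
# racks and that the total meets the three-node minimum.
balanced_multirack_add() {
  local r1="$1" r2="$2" r3="$3"
  [ "$r1" -eq "$r2" ] && [ "$r2" -eq "$r3" ] || return 1   # equal count per rack
  [ $((r1 + r2 + r3)) -ge 3 ] || return 1                  # minimum of three nodes
  return 0
}

balanced_multirack_add 1 1 1 && echo "1 node per rack: OK"
balanced_multirack_add 2 1 1 || echo "unequal racks: not allowed"
```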
Procedure
What to do next
- Ensure that the AFM nodes are tainted so that they do not run non-Scale workloads.
The AFM nodes in the IBM Storage Fusion HCI System are supposed to run only Scale workloads. To avoid scheduling non-AFM workloads on AFM nodes, taint the AFM nodes with a special key and the NoSchedule effect. With this taint in place, non-AFM workloads cannot be scheduled on AFM nodes, but IBM Storage Scale pods can still be scheduled because those pods tolerate the taint. Follow these steps to apply the taint:
- Run the following command to manually add the taint to each AFM node after it is added to the OpenShift Container Platform cluster.
oc edit node compute-1-ru23.yourdomain.com
- Add the following lines in the specification section.
spec:
  taints:
  - effect: NoSchedule
    key: afm.isf.ibm.com/noworkload
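As an alternative to editing the node object interactively, the same taint can be applied with a single `oc adm taint` command. The node name is a placeholder; the snippet below only builds and prints the command, with the cluster commands shown as comments, so the steps are visible without a live cluster.

```shell
NODE="compute-1-ru23.yourdomain.com"          # placeholder AFM node name
TAINT="afm.isf.ibm.com/noworkload:NoSchedule" # key and effect from the steps above

# Apply the taint (run against a live cluster):
#   oc adm taint nodes "$NODE" "$TAINT"
# Verify that the taint is present on the node:
#   oc get node "$NODE" -o jsonpath='{.spec.taints}'
echo "oc adm taint nodes $NODE $TAINT"
```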