Administering the node

Administering the nodes in the system includes day-to-day operations such as monitoring node health, putting a node into maintenance mode, performing power operations, and applying firmware updates whenever they are available.

For any node drain issues during maintenance, see Issues related to IBM Storage Fusion HCI System node drains.

Configured and Discovered node tabs

From the IBM Storage Fusion menu, click Infrastructure > Nodes. The Node page includes Configured nodes and Discovered nodes tabs. By default, the Node page opens the Configured nodes tab.

Go to the Discovered nodes tab to add discovered nodes in a rack that were not previously added to the OpenShift cluster.

Node details

The following table describes the node details that are shown in the nodes list:
Column Description
Name The name of the node.
  • Naming convention of nodes added by installation or upsize:

    compute-<rackid>-<rack unit number>.<domainname>

    For a single rack, rackid is always 1.

Hardware status
The node Status column also displays the health of the node.
  • A green tick mark indicates a healthy status.
  • A yellow warning symbol indicates a warning status.
  • A red critical symbol indicates a critical status.

The node Status column that is shown on the node inventory page is the node hardware monitoring state on the IBM Storage Fusion user interface.

A node is monitored by connecting to its remote management module. Hence, the node state is a reflection of its connectivity to the monitoring module and its ability to fetch the monitoring data.

Important:
  • This state does not reflect the node operating system state or the node readiness state in the OpenShift® cluster. As a result, the node status shown in the IBM Storage Fusion user interface and in the OpenShift user interface might differ.
  • When the IBM Storage Fusion user interface shows Running and the OpenShift user interface shows Not Ready, follow the steps in Compute node issues to resolve the OpenShift node status issue.

To see the OpenShift state, go to Dashboard > OpenShift section.

Firmware The firmware version of the node.
Type The options are Compute only, Compute storage, AFM, or GPU.
S/N The hardware serial number.
Rack The name of the rack.
Rack unit The rack unit position of the node in the rack.
CPU cores The number of CPU cores.
Memory (GB) The amount of memory in GB.
Use the settings icon to customize the columns. You can also set the number of rows that are displayed on a page, jump to a specific page, or go to the previous or next page.

Use the Search text box to filter and find a specific node.
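
The naming convention described for the Name column, compute-<rackid>-<rack unit number>.<domainname>, can be illustrated with a small helper. This is a minimal sketch, not part of the product: the function names and the NodeName type are hypothetical, and it assumes that the rack ID and the rack unit number are plain decimal integers.

```python
import re
from typing import NamedTuple


class NodeName(NamedTuple):
    """Parts of a node name (hypothetical helper type)."""
    rack_id: int
    rack_unit: int
    domain: str


# compute-<rackid>-<rack unit number>.<domainname>
# Assumption: rackid and rack unit number are decimal integers.
_NODE_NAME = re.compile(r"^compute-(\d+)-(\d+)\.(.+)$")


def format_node_name(rack_id: int, rack_unit: int, domain: str) -> str:
    """Build a node name following the documented convention."""
    return f"compute-{rack_id}-{rack_unit}.{domain}"


def parse_node_name(name: str) -> NodeName:
    """Split a node name into rack ID, rack unit number, and domain name."""
    match = _NODE_NAME.match(name)
    if match is None:
        raise ValueError(f"name does not follow the convention: {name!r}")
    rack_id, rack_unit, domain = match.groups()
    return NodeName(int(rack_id), int(rack_unit), domain)
```

For a single rack, where rackid is always 1, format_node_name(1, 7, "example.com") gives compute-1-7.example.com. The domain example.com is a placeholder.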

Node power operations

Node actions from the ellipsis overflow menu of a node record:
  • Enable maintenance or Disable maintenance: After a node moves to maintenance successfully, the ellipsis overflow menu shows the Disable maintenance option. The Enable maintenance option is available only when the node is not already in maintenance.
  • Power operations:

    Alternatively, click Manage resources on the node details > Inventory tab page, and then do the power operations.

    Power on and power off operations on a node:
    Note: Do all power operations only from the IBM Storage Fusion HCI System user interface.
    1. Move the node to maintenance.
    2. In the ellipsis menu of the node, click Power off node to power off the node.
    3. In the confirmation window, click Power off.
    4. After you complete the maintenance operations, in the ellipsis overflow menu of the node, click Power on node to power on the node.
    5. In the confirmation window, click Power on.
    Restart and shutdown operations on a node:
    1. Move the node to maintenance.
    2. In the ellipsis menu of the node, click Restart node or Shutdown node. Alternatively, click Manage resources on the node details > Inventory tab page, and then do the power operations.
    3. In the confirmation window for the restart or shutdown action, click Restart node or Shutdown node accordingly.
      Note: If you shut down a node, the node goes offline. While the node is offline, all data collection for it stops, and no changes or upgrades can be made.
    If you want to understand Scale behavior during node reboots, see Scale behavior during node restarts.

Enable and disable node maintenance

When you put a node into maintenance mode, the node is marked as Scheduling disabled and its workload is drained. You can place a node in maintenance from the user interface or through any operation that needs a server reboot.

Before you run power operations, place the node in maintenance mode.

The following procedure describes how to enable or disable maintenance mode on a node from the user interface.
Procedure
  1. On the Nodes page, click the ellipsis menu of the node that you want to move to maintenance, and click Enable maintenance mode. Alternatively, click Manage resources on the node details > Inventory tab page.
  2. In the Enable maintenance mode confirmation window, click Enable.
  3. Wait for the node to enter maintenance mode.
  4. After all the required maintenance operations are completed, move the node out of maintenance. From the ellipsis overflow menu of a node that is in maintenance, select the Disable maintenance mode option.
The timeout for the node maintenance operation is configurable. The default value is 15 minutes, and the maximum value is 40 minutes. To change the timeout, edit the MaintenanceWarningTimeout value.
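
The limits on the maintenance timeout can be sketched as a simple check to run before you edit MaintenanceWarningTimeout. This is an illustration only: the function name is hypothetical, the source does not say where the value is stored or in what format, and the lower bound of 1 minute is an assumption because only the default (15) and the maximum (40) are documented.

```python
from typing import Optional

DEFAULT_TIMEOUT_MINUTES = 15  # documented default
MAX_TIMEOUT_MINUTES = 40      # documented maximum
MIN_TIMEOUT_MINUTES = 1       # assumption: the source states no minimum


def validate_maintenance_timeout(minutes: Optional[int] = None) -> int:
    """Return a timeout in minutes that respects the documented limits."""
    if minutes is None:
        # No value supplied: fall back to the documented default.
        return DEFAULT_TIMEOUT_MINUTES
    if isinstance(minutes, bool) or not isinstance(minutes, int):
        raise TypeError("timeout must be a whole number of minutes")
    if minutes < MIN_TIMEOUT_MINUTES:
        raise ValueError("timeout must be at least 1 minute (assumed minimum)")
    if minutes > MAX_TIMEOUT_MINUTES:
        raise ValueError(f"timeout cannot exceed {MAX_TIMEOUT_MINUTES} minutes")
    return minutes
```

For example, validate_maintenance_timeout(30) returns 30, and validate_maintenance_timeout(45) raises a ValueError.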
Important:
  • You cannot move more than one node to maintenance at a time. Also, if the GPFS cluster health is degraded, node maintenance does not succeed. If you ignore these restrictions, the IBM Storage Fusion HCI System user interface shows a failure message:
    "Detected problem in previous maintenance operation"
    Find the root cause of the issue in the events or the CR status, and fix it:
    oc describe cmt <instance node name> -n <fusion namespace>
  • If you face an issue retrieving the compute nodes and network components after a maintenance mode operation, log out of the user interface and log in again.
  • Maintenance mode on a node can take from 4 to 30 minutes to succeed, depending on the workload on the node and the Scale PodDisruptionBudget. If it takes more than 30 minutes, the operation eventually times out. For more information about this issue, see Issues related to IBM Storage Fusion HCI System node drains.
  • When you put a node into maintenance mode from the IBM Storage Fusion HCI System user interface, the node is marked as Scheduling disabled and its workload is drained.
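
If you script the troubleshooting step from the list above, the oc describe cmt command can be wrapped in a small helper. This is a sketch under assumptions: the function names are hypothetical, the node instance name and namespace are values that you supply, and it assumes that the oc CLI is installed and already logged in to the cluster.

```python
import subprocess
from typing import List


def describe_cmt_command(node_instance: str, namespace: str) -> List[str]:
    """Build the argument list for: oc describe cmt <instance> -n <namespace>."""
    if not node_instance or not namespace:
        raise ValueError("node instance name and namespace are both required")
    return ["oc", "describe", "cmt", node_instance, "-n", namespace]


def describe_cmt(node_instance: str, namespace: str) -> str:
    """Run the command and return its output, which includes events and CR status."""
    result = subprocess.run(
        describe_cmt_command(node_instance, namespace),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if oc exits with an error
    )
    return result.stdout
```

Passing the arguments as a list rather than a single shell string avoids quoting problems if a name contains unexpected characters.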

Firmware upgrade

If a firmware upgrade is available for a node, click the ellipsis overflow menu of the node record and click Upgrade firmware. You can also select multiple nodes at a time and click the Upgrade button.

For more information about node firmware upgrade, see Upgrading node firmware.

Failure on a node
If the upgrade fails on a node, click the ellipsis overflow menu of the node record and click Cancel upgrade. The Retry upgrade option is enabled.
Note: After you click Cancel upgrade, the node state changes from firmware upgrade failed to upgrade available.
If a node is queued up for upgrade or scheduled for upgrade, then the Cancel upgrade option is available in the menu.
  1. If you click Cancel upgrade on a node that is queued up for upgrade, then the node is removed from the upgrade queue.
  2. You cannot cancel an ongoing firmware upgrade.
Failure when multiple nodes are selected
When you choose multiple nodes for upgrade and a node fails partway through, the remaining nodes in the sequence change to the Scheduled for upgrade state. Fix the issue on the failed node and click Retry upgrade to complete the upgrade on the following nodes:
  • The problematic node
  • The remaining nodes queued for upgrade, which are in the Scheduled state
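
The failure, retry, and cancel behavior described above can be modeled as a small state machine. This is an illustrative sketch only, not the product implementation: the class, its methods, and the Upgrading and Up to date state names are hypothetical, and it assumes that cancelling a Scheduled node behaves like cancelling a queued node.

```python
from typing import Callable, Dict, List

AVAILABLE = "Upgrade available"
QUEUED = "Queued"
UPGRADING = "Upgrading"   # hypothetical name for an ongoing upgrade
FAILED = "Upgrade failed"
SCHEDULED = "Scheduled"
DONE = "Up to date"       # hypothetical name for a finished upgrade


class UpgradeQueue:
    """Toy model of a sequential multi-node firmware upgrade."""

    def __init__(self, nodes: List[str]) -> None:
        self.order = list(nodes)
        self.state: Dict[str, str] = {node: QUEUED for node in nodes}

    def run(self, upgrade: Callable[[str], bool]) -> None:
        """Upgrade pending nodes in order and stop at the first failure."""
        pending = [n for n in self.order
                   if self.state[n] in (QUEUED, SCHEDULED, FAILED)]
        for index, node in enumerate(pending):
            self.state[node] = UPGRADING
            if upgrade(node):
                self.state[node] = DONE
                continue
            # A failure parks the remaining nodes in the Scheduled state.
            self.state[node] = FAILED
            for rest in pending[index + 1:]:
                self.state[rest] = SCHEDULED
            return

    def retry(self, upgrade: Callable[[str], bool]) -> None:
        """Retry upgrade: resume with the failed node, then the scheduled ones."""
        self.run(upgrade)

    def cancel(self, node: str) -> None:
        """Cancel upgrade: allowed unless the upgrade is ongoing."""
        current = self.state[node]
        if current == UPGRADING:
            raise RuntimeError("an ongoing firmware upgrade cannot be cancelled")
        if current not in (QUEUED, SCHEDULED, FAILED):
            raise ValueError(f"nothing to cancel for node in state {current!r}")
        self.order.remove(node)       # the node leaves the upgrade queue
        self.state[node] = AVAILABLE  # and returns to upgrade available
```

In this model, if the second of three nodes fails, the first node is done, the second is in the failed state, and the third is Scheduled. Calling retry after the issue is fixed completes the second and third nodes in order.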

Add disks

Click Add disks to upsize disks. For the procedure to add disks, see Adding additional storage nodes.

Add racks

Click Actions and select Add racks to expand your IBM Storage Fusion HCI System with an additional rack. For the procedure to add racks, see Adding expansion racks.

Upsize nodes

Go to the Discovered nodes tab to add discovered nodes in a rack that were not previously added to the OpenShift cluster. For the procedure to add nodes, see Configuring nodes for management.