Node monitoring
Node monitoring continuously tracks the health, performance, and availability of nodes within a hardware appliance. With node monitoring, you can track node status and quickly identify used and unused drives, invalid configurations, and loose or unseated connections on a node in the IBM Fusion HCI System. This topic provides instructions and guidelines for monitoring nodes from the Overview dashboard page.
- Go to the IBM Fusion HCI System user interface to view the graphical view of the hardware appliance.
Under the Resource summary section, you can find the total number of nodes along with their health statuses.
Figure 1. Example Overview page showing error for a node
- Hover over the graphical view of the hardware to identify node details.
For a node or service node, the details include the name of the hardware, the health status, the type of node, and the rack unit.
- Go through the color indicators and decode their statuses.
- Fix errors, failures, and warnings based on the guidance.
- After all the errors and failures are fixed, go to the Overview dashboard page and check the health status of problematic nodes. For more information about nodes, see Node or service node details.
Understanding color indicators and decoding status
- Node or service node in Green color
- It indicates that the node or service node is in a healthy or normal state, and no action is required.
- Node or service node in Red or Yellow color
- It indicates that the node has failed or is in a degraded state. Complete the following steps to resolve the issue:
- Click the component that is in red or yellow color.
A slide out window is displayed with the Hardware status, type, firmware, S/N, rack, and rack unit details. For more information about nodes, see Node or service node details.
- Go through the details in the slide out window. If you want more information about the node, click View full details.
The Nodes page opens. It includes the front, rear, and inside views of the node, recent events, and additional details of the node and its internal components.
- Select Front in the Components section to see the front view of the node, and hover over the graphical view to check internal components such as used and unused drive slots along with their connection status.
To debug further, click the internal component that is in red color. It opens a new slide out pane for drives with more details such as name, slot, type, status, total capacity (GB), and serial number.
- Select Back in the Components section to see the back view of the node, and hover over the graphical view to check internal components such as used and unused adapter slots along with their invalid configuration and loose or unseated connection details.
To debug further, click the internal component that is in red or yellow color. It opens a new slide out pane for adapters with more details such as slot, port, type, speed, status, network address, adapter, bond, and connected to.
- Select Internal in the Components section to see the internal view of the node, and hover over the graphical view to check internal components such as used and unused DIMM slots, CPUs, and fans along with their status.
To debug further, click the internal component that is in red or yellow color. It opens a new slide out pane with more details. For example, for a DIMM, you can see details such as slot, state, capacity, memory type, and serial number.
- Click View table to check all internal components and their status in table format.
The Components table page opens. It has seven tabs: Storage Drives, OS Drives, Ports, CPUs, DIMMs, Fans, and PSUs. For more information about node details, see Node or service node details.
- Go through the Recent events section to understand the error.
Note: The recent events pane includes details of the last five events that occurred for the node.
- Click View all to go to the events page and view all recent events on the hardware component.
The BMYxxx code and the error message inform you about the error and possible corrective actions.
- Go through the Details section to get more details of the node such as type, model, S/N, IPv6 address, rack, rack unit, firmware, architecture, CPU cores, frequency, memory, energy consumption, and temperature.
Important: Click Details under Firmware in the Details section to get more details about the firmware versions.
- Click the Actions drop-down to download the logs of the node, perform power operations, restart the node, and carry out node maintenance activities. For more information, see Administering nodes and racks.
- To diagnose and take corrective action, try the following options:
- Check whether the hardware is in the correct position. To view the rack positions of the nodes, see Hardware overview of a single rack.
- Check the BMYxxx errors. For more information about individual BMYxxx codes and their corrective actions, see Compute events and error codes.
If you are not able to fix the issue, contact IBM support.
- Node or service node in Gray color
- It indicates that the node is in a disabled or powered off state.
To power on or power off the hardware, see the Node power operations section in Administering nodes and racks.
- Node or service node in Blue color with diagonal stripes
- It indicates that an action is in progress on the hardware, such as power on or power down.
For more information about the actions on node hardware, see the following links:
- Power operations. See Administering nodes and racks.
- Configuration expansion. See Scaling the cluster. Here, you can see the procedure to add nodes to the cluster or add storage to a node. To add a rack, see Adding expansion racks.
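The color indicators described in this topic can be summarized as a simple lookup table. The following Python sketch is illustrative only: the names NodeColor, INDICATORS, and decode are hypothetical and are not part of any IBM Fusion HCI System API. It restates the meanings and next steps from this topic so that they can be reused, for example in an operator runbook script.

```python
from enum import Enum
from typing import Tuple


class NodeColor(Enum):
    """Indicator colors shown on the Overview dashboard (names are hypothetical)."""
    GREEN = "green"
    RED = "red"
    YELLOW = "yellow"
    GRAY = "gray"
    BLUE_STRIPED = "blue with diagonal stripes"


# Hypothetical lookup table: (state, suggested next step) for each indicator,
# paraphrased from the descriptions in this topic.
INDICATORS = {
    NodeColor.GREEN: ("healthy or normal", "No action is required."),
    NodeColor.RED: ("failed or degraded",
                    "Click the component, then review recent events and BMYxxx codes."),
    NodeColor.YELLOW: ("failed or degraded",
                       "Click the component, then review recent events and BMYxxx codes."),
    NodeColor.GRAY: ("disabled or powered off",
                     "See Node power operations in Administering nodes and racks."),
    NodeColor.BLUE_STRIPED: ("action in progress",
                             "Wait for the operation, such as power on or power down, to finish."),
}


def decode(color: str) -> Tuple[str, str]:
    """Return (state, suggested next step) for a color name.

    Raises ValueError if the color is not one of the documented indicators.
    """
    return INDICATORS[NodeColor(color.strip().lower())]


if __name__ == "__main__":
    for c in NodeColor:
        state, step = decode(c.value)
        print(f"{c.value}: {state}. {step}")
```

Red and yellow share one entry because this topic describes the same resolution steps for both.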