Node monitoring

Node monitoring continuously tracks the health, performance, and availability of nodes within a hardware appliance. With node monitoring, you can track node status and quickly identify used and unused drives, invalid configurations, and loose or unseated connections on nodes in the IBM Fusion HCI System. This topic provides instructions and guidelines for monitoring nodes from the Overview dashboard page.

  1. Go to Infrastructure > Overview in the IBM Fusion HCI System user interface to see a graphical view of the hardware appliance.

    Under the Resource summary section, you can find the total number of nodes along with their health statuses.

    Figure 1. Example Overview page showing an error for a node
    The rack shows a degraded state for the AFM node at RU23 and a critical state for a storage node at RU15.
    In the graphical view of the hardware appliance, the color indicates the health status of a node.
  2. Hover over a node in the graphical view of the hardware to see its details.

    For a node or service node, the details include the name of the hardware, the health status, the type of node, and the rack unit.

  3. Review the color indicators to determine the status of each node. For more information, see Understanding color indicators and decoding status.
  4. Fix errors, failures, and warnings based on the guidance.
  5. After all the errors and failures are fixed, return to the Overview dashboard page and recheck the health status of the nodes that had problems. For more information about nodes, see Node or service node details.
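
The color decoding in step 3 can be summarized as a small lookup table. The following Python sketch is illustrative only: it is not part of IBM Fusion HCI System and does not call any product API. The names `NodeColor`, `COLOR_GUIDE`, `needs_attention`, and `describe` are hypothetical, and the state and action text is paraphrased from the color indicator descriptions in this topic.

```python
from enum import Enum


class NodeColor(Enum):
    """Colors used for nodes in the graphical view (hypothetical names)."""
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"
    GRAY = "gray"
    BLUE_STRIPED = "blue with diagonal stripes"


# Meaning and suggested next step for each color, paraphrased from this topic.
# This table is an assumption-level summary, not product data.
COLOR_GUIDE = {
    NodeColor.GREEN: ("healthy or normal", "no action is required"),
    NodeColor.YELLOW: ("failed or degraded",
                       "click the node and review its details and recent events"),
    NodeColor.RED: ("failed or degraded",
                    "click the node and review its details and recent events"),
    NodeColor.GRAY: ("disabled or powered off",
                     "see Node power operations to power the hardware on or off"),
    NodeColor.BLUE_STRIPED: ("action in progress",
                             "wait for the power operation to finish"),
}


def needs_attention(color: NodeColor) -> bool:
    """Return True for the colors that this topic treats as errors or warnings."""
    return color in (NodeColor.RED, NodeColor.YELLOW)


def describe(color: NodeColor) -> str:
    """Build a one-line summary of what a color means and what to do next."""
    state, action = COLOR_GUIDE[color]
    return f"{color.value}: {state}; {action}"


if __name__ == "__main__":
    # Print the guide, flagging the colors that call for troubleshooting.
    for color in NodeColor:
        flag = "!" if needs_attention(color) else " "
        print(flag, describe(color))
```

A table like this can be handy when you write runbooks or checklists around the Overview page, because it keeps the meaning of each color and the first troubleshooting step in one place.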

Understanding color indicators and decoding status

Node or service node in Green color
It indicates that the node or service node is in a healthy or normal state, and no action is required.
Node or service node in Red or Yellow color
It indicates that the node has failed or is in a degraded state. Complete the following steps to resolve the issue:
  • Click the component that is shown in red or yellow.

    A slide-out window displays the hardware status, type, firmware, S/N, rack, and rack unit details. For more information about nodes, see Node or service node details.

  • Review the details in the slide-out window. If you want more information about the node, click View full details.

    The Nodes page opens. It includes the front, rear, and inside views of the node, recent events, and additional details of the node and its internal components.

  • Select Front in the Components section to see the front view of the node, and hover over the graphical view to check internal components such as used and unused drive slots along with their connection status.

    To debug further, click the internal component that is shown in red. A new slide-out pane for drives opens with more details such as name, slot, type, status, total capacity (GB), and serial number.

  • Select Back in the Components section to see the back view of the node, and hover over the graphical view to check internal components such as used and unused adapter slots, along with any invalid configuration and loose or unseated connections.

    To debug further, click the internal component that is shown in red or yellow. A new slide-out pane for adapters opens with more details such as slot, port, type, speed, status, network address, adapter, bond, and connected to.

  • Select Internal in the Components section to see the internal view of the node, and hover over the graphical view to check the status of internal components such as used and unused DIMM slots, CPUs, and fans.

    To debug further, click the internal component that is shown in red or yellow. A new slide-out pane opens with more details. For example, for a DIMM, you can see details such as slot, state, capacity, memory type, and serial number.

  • Click View table to check all internal components and their status in a table format.

    The Components table page opens with seven tabs: Storage Drives, OS Drives, Ports, CPUs, DIMMs, Fans, and PSUs. For more information about node details, see Node or service node details.

  • Review the Recent events section to understand the error.
    Note: The Recent events pane lists the last five events that occurred for the node.
  • Click View all to go to the events page and view all recent events for the hardware component.

    The BMYxxx code and the error message inform you about the error and possible corrective actions.

  • Review the Details section for more information about the node, such as type, model, S/N, IPv6 address, rack, rack unit, firmware, architecture, CPU cores, frequency, memory, energy consumption, and temperature.
    Important: Click Details under Firmware in the Details section to get more details about the firmware versions.
  • Click the Actions drop-down to download the logs of the node, perform power operations, restart the node, or carry out node maintenance activities. For more information, see Administering nodes and racks.
  • To diagnose and take corrective action, try the following options:
Node or service node in Gray color
It indicates that the node is in a disabled or powered off state.

To power on or power off the hardware, see the Node power operations section in Administering nodes and racks.

Node or service node in Blue color with diagonal stripes
It indicates that an action is in progress on the hardware, such as power on or power off.
To know more about the actions on node hardware, see the following links: