Nodes

The Nodes page provides an easy way to monitor the performance, health status, and configuration aspects of all available nodes in the IBM Spectrum Scale cluster. The properties of a node display the status of various CES services such as Object, NFS, and SMB as well as the authentication status of these services if they are enabled. It also displays other details such as network status, information on attached NSDs and file systems, and so on.

Nodes tables

The following three nodes tables provide pre-filtered views of nodes with specific information:
All nodes
Shows all nodes in the cluster and provides information on node roles, services, node health, and basic performance information on system and client level.
NSD server nodes
Shows all nodes that are NSD servers with specific performance information that is related to NSDs. If there are no NSD servers in the cluster, this table is not displayed.
Protocol nodes
Shows all nodes that are protocol nodes. Specific performance and health information that is related to protocol services is displayed in this table. If there are no protocol nodes in the cluster, this table is not shown.

The three different tables can be customized individually by adding or removing columns by using the Customize Columns action.

You can use the Set Attributes option that is available in the Actions menu to set the node attributes such as site, room, and rack on any of the views. You can set attributes of multiple nodes at a time. The attributes can be used to filter nodes in the nodes view. The attributes are also referenced in the NSD - Nodes topology view.

The Cluster Export Services (CES) status information of each service and component can have the following values:
  • Healthy: The component is working as expected.
  • Disabled: The component is not enabled.
  • Suspended: When CES is in the suspended state on a node, most components also report Suspended.
  • Starting: The component (or monitor) recently started. This state is a transient state that is updated after the startup is complete.
  • Unknown: Something is preventing the monitoring from determining the state of the component.
  • Stopped: The component was intentionally stopped. This situation might happen briefly if a service is being restarted due to a configuration change. It might also happen if a user issues the mmces service stop protocol command for a node.
  • Degraded: A problem occurred with the component, but it is not a complete failure. This state does not cause the CES addresses to be reassigned.
  • Failed: The monitoring detected a significant problem with the component that means it is unable to function correctly. This state causes the CES addresses of the node to be reassigned.
  • Dependency failed: This state implies that a component has a dependency that is in a failed state. For example, the NFS or SMB service shows Dependency failed if authentication has failed.
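
The address reassignment behavior that is described in the preceding list can be summarized in a small lookup. The following is a minimal Python sketch, not part of the product: the CesState enumeration and both function names are hypothetical. Only the Degraded and Failed states have a documented reassignment behavior in this topic, so the sketch returns None for every other state instead of guessing.

```python
from enum import Enum
from typing import Optional


class CesState(Enum):
    """CES component states as listed in this topic (illustrative names)."""
    HEALTHY = "Healthy"
    DISABLED = "Disabled"
    SUSPENDED = "Suspended"
    STARTING = "Starting"
    UNKNOWN = "Unknown"
    STOPPED = "Stopped"
    DEGRADED = "Degraded"
    FAILED = "Failed"
    DEPENDENCY_FAILED = "Dependency failed"


def parse_state(label: str) -> CesState:
    """Map a status label, as shown in the list above, to a CesState member."""
    return CesState(label.strip())


def reassigns_ces_addresses(state: CesState) -> Optional[bool]:
    """Return True or False where this topic states the behavior, else None."""
    if state is CesState.FAILED:
        return True   # documented: CES addresses of the node are reassigned
    if state is CesState.DEGRADED:
        return False  # documented: CES addresses are not reassigned
    return None       # not stated in this topic
```

A monitoring script could, for example, call reassigns_ces_addresses(parse_state("Failed")) to decide whether to expect an address move on the affected node.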

Based on the status of each component or service, you can identify whether you need to fix an issue to make the node work properly. You can view the error and warning events that belong to a particular node from the Events tab of the detailed view that is available on the Nodes page. You can use the filtering options to access the required information.

To get an overview of health events across all cluster nodes, use the Monitoring > Events page. You can also use this page to search for all the events that are reported in the system.

Comparing performance attributes of nodes

The Nodes page provides the following options to analyze the performance of nodes:
  1. A quick view that gives the number of nodes in the system and the overall performance of nodes based on CPU and memory usage. You can access this view by selecting the expand button that is placed next to the title of the page. You can close this view if it is not required.
    Many graphs in the overview show the three nodes that have the highest average performance metric over a past period. These graphs are refreshed regularly. The refresh interval of the top three entities depends on the displayed time frame:
    • Every minute for the 5-minute time frame
    • Every 15 minutes for the 1-hour time frame
    • Every 6 hours for the 24-hour time frame
    • Every two days for the 7-day time frame
    • Every seven days for the 30-day time frame
    • Every four months for the 365-day time frame
  2. A Nodes table that displays many different performance metrics. To find nodes with extreme values, you can sort the values that are displayed in the Nodes table by different performance metrics. Click the performance metric in the table header to sort the data based on that metric. Use the time range selector in the upper right corner to select the time range over which the values in the table are averaged. The same selection sets the time range of the charts in the overview. The metrics in the table do not update automatically. Use the refresh button above the table to refresh the table content with more recent data.
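
The top-three selection and the refresh schedule from item 1 can be illustrated with a short Python sketch. This is an illustration only and does not reflect how the GUI is implemented: the names TOP_THREE_REFRESH and top_three, and the sample values, are hypothetical. The "four months" interval is approximated as 120 days, which is an assumption of this sketch.

```python
from datetime import timedelta
from statistics import mean
from typing import Dict, List, Tuple

# Refresh interval of the top-three charts per displayed time frame,
# transcribed from the list above. "Four months" is approximated as
# 120 days, which is an assumption of this sketch.
TOP_THREE_REFRESH: Dict[str, timedelta] = {
    "5 minutes": timedelta(minutes=1),
    "1 hour": timedelta(minutes=15),
    "24 hours": timedelta(hours=6),
    "7 days": timedelta(days=2),
    "30 days": timedelta(days=7),
    "365 days": timedelta(days=120),
}


def top_three(samples: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Return the three nodes with the highest average metric value."""
    # Nodes without samples are skipped because they have no average.
    averages = {node: mean(values) for node, values in samples.items() if values}
    ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)
    return ranked[:3]


# Hypothetical CPU usage samples (percent) for four nodes.
cpu = {
    "node1": [10.0, 20.0],
    "node2": [80.0, 90.0],
    "node3": [50.0, 50.0],
    "node4": [30.0, 40.0],
}
busiest = top_three(cpu)
```

Sorting the Nodes table by a metric column corresponds to the sorted call in this sketch, applied to all nodes instead of only the first three.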

Viewing the node details

A detailed view of the performance and health aspects of individual nodes is available on the Nodes page. Select the node for which you need to view the performance details and select View Details. The system displays various performance charts in the right pane.

The detailed performance view helps you drill down into various performance aspects. The following list provides the performance details that can be obtained from each tab of the performance view:
  • Overview tab provides performance charts for the following:
    • Client IOPS
    • Client data rate
    • Server data rate
    • Server IOPS
    • Network
    • CPU
    • Load
    • Memory
  • Events tab helps to monitor the events that are reported on the node. As on the Events page, you can mark events as read and run fix procedures from this events view. By default, current issues are listed in the events view. You can filter the events by using the other available filter options. The Monitoring > Events page displays the entire set of events that are reported in the system.
  • File Systems tab provides performance details of the file systems that are mounted on the node. The read and write throughput, average read and write transaction size, and read and write latency of each file system are also available.

    You can also mount or unmount individual file systems or multiple file systems on the selected node. For more details, see Mount or unmount file system.

  • NSDs tab gives the status of the disks that are attached to the node. The NSDs tab appears only if the node is configured as an NSD server.
  • SMB and NFS tabs provide the performance details of the SMB and NFS services that are hosted on the node. These tabs appear only if the node is configured as a protocol node.
  • Network tab displays the network performance details.
  • AFM tab displays the details of the AFM and AFM DR relationships for which the node is configured as a gateway node.
  • Properties tab provides an overview of the node-related attributes. You can also use the Prevent file system mounts option to allow or prevent the mounting of file systems on the node.
Note: The detailed view of a recovery group server node provides the details of all physical disks that have the active path to this node.

Mount or unmount file system

You can use the IBM Spectrum Scale GUI to mount or unmount individual file systems or multiple file systems on the selected nodes. Use the Files > File Systems, Files > File Systems > View Details > Nodes, or Nodes > View Details > File Systems page in the GUI to mount or unmount a file system.

The GUI has the following options related to mounting the file system:
  1. Mount local file systems on nodes of the local IBM Spectrum Scale cluster.
  2. Mount remote file systems on local nodes.
  3. Select individual nodes, protocol nodes, or nodes by node class while selecting nodes on which the file system needs to be mounted.
  4. Prevent or allow file systems from mounting on individual nodes.
    Do the following to prevent file systems from mounting on a node:
    1. Go to Nodes.
    2. Select the node on which you need to prevent or allow file system mounts.
    3. Select Prevent Mounts from the Actions menu.
    4. Select the required option and click Prevent Mount or Allow Mount based on the selection.
  5. Configure the automatic mount option. The automatic mount option determines whether to automatically mount the file system on nodes when the GPFS daemon starts or when the file system is accessed for the first time. You can also specify whether to exclude individual nodes while enabling the automatic mount option. To enable automatic mount, do the following:
    1. From the Files > File Systems page, select the file system for which you need to enable automatic mount.
    2. Select Configure Automatic Mount option from the Actions menu.
    3. Select the required option from the list of automatic mount modes.
    4. Click Configure.
    Note: You can configure the automatic mount option for a file system only if the file system is unmounted from all nodes. That is, you need to stop I/O on this file system to configure this option. However, you can include or exclude individual nodes for automatic mount without unmounting the file system from all nodes.
You can utilize the following unmount features that are supported in the GUI:
  1. Unmount local file system from local nodes and remote nodes.
  2. Unmount a remote file system from the local nodes. When a local file system is unmounted from the remote nodes, the remote nodes can no longer be seen in the GUI. The Files > File Systems > View Details > Remote Nodes page lists the remote nodes that currently mount the selected file system. The selected file system can be a local or a remote file system, but the GUI permits unmounting only local file systems from the remote nodes.
  3. Select individual nodes, protocol nodes, or nodes by node class while selecting nodes from which the file system needs to be unmounted.
  4. Specify whether to force unmount. Selecting the Force unmount option while unmounting the file system unmounts the file system even if it is still busy performing I/O operations. Forcing the unmount operation affects the outstanding operations and can cause data integrity issues. The IBM Spectrum™ Scale system relies on the native unmount command to carry out the unmount operation. The semantics of forced unmount are platform-specific. On certain platforms such as Linux, even when forced unmount is requested, the file system cannot be unmounted if it is still referenced by the system kernel. To unmount a file system in such cases, identify and stop the processes that are referencing the file system. You can use system utilities like lsof and fuser for this.
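
As an illustration of the last step, the following Python sketch builds and runs an lsof or fuser invocation for a mount point. It is a minimal sketch under stated assumptions, not an IBM-provided tool: the function names and the example mount point /gpfs/fs1 are hypothetical. It assumes the common Linux versions of these utilities, where fuser accepts -m (mounted file system) and -v (verbose), and lsof accepts a mount point as its argument.

```python
import shutil
import subprocess
from typing import List, Optional


def busy_check_command(mount_point: str, tool: str = "fuser") -> List[str]:
    """Build the argv for listing processes that use a mounted file system."""
    if tool == "fuser":
        # -v: verbose listing; -m: treat the path as a mounted file system.
        return ["fuser", "-v", "-m", mount_point]
    if tool == "lsof":
        # Given a mount point, lsof lists the open files on that file system.
        return ["lsof", mount_point]
    raise ValueError(f"unsupported tool: {tool}")


def list_busy_processes(mount_point: str, tool: str = "fuser") -> Optional[str]:
    """Run the check and return its output, or None if the tool is missing."""
    argv = busy_check_command(mount_point, tool)
    if shutil.which(argv[0]) is None:
        return None
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    # fuser writes its verbose table to stderr, so both streams are combined.
    return result.stdout + result.stderr


# Hypothetical mount point; replace it with the actual mount point.
command = busy_check_command("/gpfs/fs1")
```

After the listed processes are stopped, the unmount operation can be retried from the GUI.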

Creating and managing user-defined node classes

Node classes are used to group nodes. Grouping helps you to select only the required set of nodes when you want to limit the scope of certain administrative tasks. The two types of node classes that can be defined in the IBM Spectrum Scale system are:

  • System node classes
  • User-defined node classes

The system node classes are hardcoded but you can create user-defined node classes by using the Nodes > Node Classes > Create Node Class option in the IBM Spectrum Scale GUI. While creating a new node class, ensure that you are aware of the following:

  • The name of the new node class must be different from the name of the existing nodes or node classes.
  • You can add individual nodes and other existing node classes in a new node class from the All Nodes and Node Classes tabs of the Create Node Class window.
  • When you add an existing node class to the new node class, the nodes that are part of the existing node class become part of the new node class. When nodes are later added to or removed from the existing node class, those changes also apply to the new node class.
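
The naming rule and the way that changes in an included node class carry over to the new node class can be modeled in a few lines of Python. This is a toy model for illustration only and does not represent the product's data model: the NodeClassRegistry class, its methods, and the node and class names are all hypothetical.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


class NodeClassRegistry:
    """Toy model of user-defined node classes (not the product's data model)."""

    def __init__(self, node_names: Iterable[str]) -> None:
        self.node_names: Set[str] = set(node_names)
        # class name -> (direct member nodes, included node classes)
        self.classes: Dict[str, Tuple[Set[str], Set[str]]] = {}

    def create(self, name: str, nodes: Iterable[str] = (),
               included: Iterable[str] = ()) -> None:
        # The name must differ from existing node and node class names.
        if name in self.node_names or name in self.classes:
            raise ValueError(f"name already in use: {name}")
        self.classes[name] = (set(nodes), set(included))

    def members(self, name: str, _seen: Optional[Set[str]] = None) -> Set[str]:
        """Resolve membership on every call, so later changes carry over."""
        seen = _seen if _seen is not None else set()
        if name in seen:
            return set()  # guard against accidental cycles
        seen.add(name)
        nodes, included = self.classes[name]
        resolved = set(nodes)
        for child in included:
            resolved |= self.members(child, seen)
        return resolved


# Hypothetical nodes and node classes.
registry = NodeClassRegistry({"node1", "node2", "node3"})
registry.create("rackA", nodes={"node1"})
registry.create("site1", nodes={"node3"}, included={"rackA"})
before = registry.members("site1")
registry.classes["rackA"][0].add("node2")  # node added to rackA later
after = registry.members("site1")
```

Because membership is resolved at lookup time rather than copied when the class is created, the node that is added to rackA afterward also shows up in site1.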

Use the Modify option to change the node class name and nodes and node classes that are part of an existing node class. You cannot modify system node classes.

Use the Delete option to delete the user-defined node class. You cannot delete the system node classes.