Troubleshooting

This topic includes troubleshooting information for ESS.

Gathering data to solve problems

Use the gsssnap command to gather data that can help service personnel diagnose a problem. See the gsssnap command description for more information.
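
For example, a typical invocation from the management server might look like this (the -g and -N options shown here are assumptions; verify them against the gsssnap command description for your release):

# Collect a snapshot, including a GPFS snap and data from both I/O server nodes
# (gssio1 and gssio2 are placeholder host names)
gsssnap -g -N gssio1,gssio2

The resulting archive can then be sent to service.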

If I/O server node installation fails

If the installation of the I/O server nodes fails for any reason, fix the problem and then restart the installation on the management server by running these commands:
 
makeconservercf
nodeset gss_ppc64 osimage=rhels7.1-ppc64-install-gss 
rnetboot gss_ppc64 -V 
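
You can monitor the reinstallation from the management server with standard xCAT commands; a minimal sketch, assuming the gss_ppc64 node group defined above:

# Show the deployment status of the I/O server nodes
nodestat gss_ppc64

# Optionally, open the remote console of one node to watch the installer
# (gssio1 is a placeholder host name)
rcons gssio1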

Red Hat Enterprise Linux update considerations

ESS 3.0 supports Red Hat Enterprise Linux 7.1 (kernel release 3.10.0-229.el7.ppc64). You can update Red Hat Enterprise Linux as needed to address security updates. It is highly recommended that you limit errata updates applied to the Red Hat Enterprise Linux operating system used in the ESS solution to security errata or errata updates requested by service.

Information about a possible update issue follows.

Issue: A yum update command upgrades the Red Hat Enterprise Linux version.

If you are subscribed to Red Hat updates and run yum update, the redhat-release-server package might be updated as well. This could cause issues with OFED installation (mlnxofedinstall, for example) on ESS nodes, because the Red Hat version is not in the installer's supported distribution list.

See Red Hat's solution articles for information about this behavior:

https://access.redhat.com/solutions/33807

https://access.redhat.com/solutions/10185

Resolution:

If the redhat-release-server package has been updated, you can downgrade this package. Run this command:
yum downgrade redhat-release-server
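
You can verify the installed version before and after the downgrade:

# Show the installed release package; after the downgrade it should
# report the supported 7.1 level again
rpm -q redhat-release-server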

Prevention:

If possible, limit running yum update to security-related errata.

See Red Hat's solution article about applying security updates only:

https://access.redhat.com/solutions/10021
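
As a sketch of what this looks like on Red Hat Enterprise Linux 7 (confirm the exact procedure against the solution article above):

# List the security errata that apply to this node
yum updateinfo list security

# Apply security-related errata only
yum update --security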

ESS 3.0 issues

Table 1 includes information about known issues in ESS 3.0 and how to resolve these issues. Depending on which fix level you are installing, these issues might or might not apply to you.
Table 1. Known issues in ESS 3.0

Issue 1: The gssaddnode command fails when trying to add a new node to a cluster so that the total number of nodes is greater than seven.

Environment affected: Clustering

Deployment type: installation or upgrade

GPFS edition: Advanced or Standard

Affected nodes: I/O server, management server

Description: The gssaddnode command allows users to easily add the management server and any new I/O server nodes to a cluster.

If you already have an existing GPFS cluster in which the addition of nodes causes the total number of nodes in the cluster to exceed seven, the command will fail due to an array index out of bounds exception.

Resolution or action: Here is a workaround for this issue:
  1. Log in to a node that already exists within the cluster.
  2. Run: mmaddnode -N NewNode
  3. Accept the license: mmchlicense {client|server} --accept -N NewNode
  4. Add the node class: mmchnodeclass {gss_ppc64|ems|NodeClass} add -N NewNode
  5. If this is the first node that is being added as a management server node, run: /usr/lpp/mmfs/samples/gss/shClientConfig.sh ems
  6. If it is a management server node, check the size of the pagepool and set it to 18G (60% of installed memory) if it is not set correctly (see the example following this list).
If you are adding the management server, the node class is ems. If you are adding an I/O server node, the node class is gss_ppc64. If you are adding another type of node, consult the GPFS documentation for best practices on the recommended configuration settings.
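
For step 6, a minimal sketch of checking and setting the pagepool (assuming the management server node is named ems1; mmlsconfig and mmchconfig are standard GPFS commands):

# Show the current pagepool setting
mmlsconfig pagepool

# Set the pagepool for the management server node; the new value takes
# effect when GPFS is restarted on that node
mmchconfig pagepool=18G -N ems1
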
Issue 2: The gssgenclusterrgs command might fail in configurations with multiple building blocks.

Environment affected: Recovery group creation with the gssgenclusterrgs command

Deployment type: installation or upgrade

GPFS edition: Advanced or Standard

Affected nodes: I/O server

Description: In configurations in which building block host names do not follow a sorted order (for example: gssio1, gssio2, gssio3, gssio4), the gssgenclusterrgs command might fail with messages indicating that the partner node cannot be found.

Resolution or action: Run the gssgenclusterrgs command. If it fails, re-run it with the -N option with one node of one building block at a time (see the example below).
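
A hypothetical sequence for a configuration with two building blocks follows (gssio1 and gssio3 are placeholder names for one node in each building block; additional options might be required in your environment, so check the gssgenclusterrgs command description):

# Create the recovery groups for the first building block
gssgenclusterrgs -N gssio1

# Repeat for the second building block
gssgenclusterrgs -N gssio3
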
Issue 3: The sg module fails to load after an upgrade to ESS 3.0.

Environment affected: Hardware validation, firmware updates, ESS GUI

Deployment type: upgrade

GPFS edition: Advanced or Standard

Affected nodes: I/O server

Description: The Linux SCSI Generic (sg) kernel module is required by various commands and components to send SCSI commands to devices that understand them. During the upgrade to ESS 3.0, this module is not loaded, which can prevent firmware updates and hardware topology validation and limit GUI functionality.

Resolution or action: Here is a workaround for this issue:

Before updating host adapter firmware on each I/O server node as part of the upgrade, run these commands from the management server:

xdsh IoNode "modprobe sg"
xdsh IoNode 'echo "sg" > /etc/modules-load.d/gss.conf'

To validate, run these commands from the I/O server node:

lsmod | grep sg
cat /etc/modules-load.d/gss.conf

These commands load the sg module immediately and add it to the gss.conf file so that it is loaded automatically when the node reboots.
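
You can also run the same validation for all I/O server nodes at once from the management server (a sketch, assuming the gss_ppc64 node group):

# Confirm that the sg module is loaded on every I/O server node
xdsh gss_ppc64 "lsmod | grep -w sg"

# Confirm that the module is configured to load at boot
xdsh gss_ppc64 "cat /etc/modules-load.d/gss.conf"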

Issue 4: An update of the management server node fails due to a conflict with an older version of the ESS GUI RPM.

Environment affected: Cluster software upgrade

Deployment type: upgrade

GPFS edition: Advanced

Affected nodes: management server

Description: In the GPFS Advanced Edition on ESS 3.0, the gpfs.gss.gui RPM from previous releases causes a conflict when installing the latest version (GPFS 4.1.0.8). The result is a failed update of the management server node to the latest software release.
After updating your management server node using the updatenode MgtServerNode -V -P gss_updatenode command, you will see an error similar to the following:
ems1: gss_updatenode [DEBUG]: Error: gpfs.gui conflicts with gpfs.gss.gui-4.1.0-6.ppc64
ems1: gss_updatenode [ERROR]: Updating otherpkgs on ems1 failed RC: 123
Resolution or action: If you see this error, run updatenode again. This fixes the conflict, installs the new gpfs.gui RPM, and upgrades the node to the latest ESS 3.0 code. You will then be prompted to reboot so that the kernel update can complete. After rebooting, run updatenode again to complete the upgrade process (see the verification example below).
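
After the second updatenode run completes, you can confirm that the conflict is resolved (the package names are taken from the error message above):

# The old GUI package should no longer be installed
rpm -q gpfs.gss.gui

# The new GUI package should be present
rpm -q gpfs.gui
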
Issue 5: GPFS 4.1.0.8 file systems fail to mount on Red Hat Enterprise Linux 7.1 nodes due to a systemd issue.

Environment affected: Cluster file system

Deployment type: installation or upgrade

GPFS edition: Advanced or Standard

Affected nodes: I/O server, management server

Description: ESS 3.0 contains GPFS 4.1.0.8, which has a problem mounting file systems on Red Hat Enterprise Linux 7.1 due to an issue with systemd.

Resolution or action: Updating cluster nodes to the latest systemd packages corrects the issue. Follow these steps:
  1. Connect your cluster nodes to the Red Hat Network (RHN) and apply the RHBA-2015-0738 errata before mounting your GPFS file systems.
  2. If you cannot connect to the RHN directly, you can download the required RPMs and update them manually on each node or create a local yum repository (see the sketch after the advisory link below).
The minimum required RPMs and levels are:
  • libgudev1-208-20.el7_1.2.ppc64.rpm
  • systemd-208-20.el7_1.2.ppc64.rpm
  • systemd-libs-208-20.el7_1.2.ppc64.rpm
  • systemd-sysv-208-20.el7_1.2.ppc64.rpm

The advisory can be found here:

https://rhn.redhat.com/errata/RHBA-2015-0738.html
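
If you choose the local yum repository approach, a minimal sketch follows (the directory path and repository name are placeholders, and the createrepo package must be installed):

# Copy the four RPMs listed above into a directory and index it
mkdir -p /var/tmp/systemd-update
cp libgudev1-*.rpm systemd-*.rpm /var/tmp/systemd-update/
createrepo /var/tmp/systemd-update

# Point yum at the local repository on each node
cat > /etc/yum.repos.d/systemd-update.repo <<EOF
[systemd-update]
name=Local systemd update repository
baseurl=file:///var/tmp/systemd-update
enabled=1
gpgcheck=0
EOF

# Apply the updates
yum update systemd systemd-libs systemd-sysv libgudev1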
