Known issues

This topic describes known issues for ESS.


ESS 5.3.2.1 issues

The following table describes known issues in ESS 5.3.2.1 and how to resolve these issues. Depending on which fix level you are installing, these might or might not apply to you.
Table 1. Known issues in ESS 5.3.2.1
Issue Environment affected Description Resolution or action
The gssgennetworks script requires high-speed host names to be derived from I/O server (xCAT) host names using suffix, prefix, or both.
High-speed network generation
Type: Install
Version: All
Arch: All
Affected nodes: I/O server and EMS nodes
gssgennetworks requires that the target host names provided in the -N or -G option are reachable in order to create the high-speed network on the target node. If the xCAT node name does not contain the same base name as the high-speed name, you might be affected by this issue. A typical deployment scenario is:
gssio1 // xCAT name
gssio1-hs // high-speed
An issue scenario is:
gssio1 // xCAT name
foo1abc-hs // high-speed name
Create entries in /etc/hosts with node names that are reachable over the management network such that the high-speed host names can be derived from them using some combination of suffix and/or prefix. For example, if the high-speed host names are foo1abc-hs, goo1abc-hs:
  1. Add foo1 and goo1 to /etc/hosts, using a reachable management network address, on the EMS node only.
  2. Use: gssgennetworks -N foo1,goo1 --suffix=abc-hs --create-bond
  3. Remove the entries foo1 and goo1 from the /etc/hosts file on the EMS node once the high-speed networks are created.
Example of how to fix (/etc/hosts):
// Before
<IP> <Long Name> <Short Name>
192.168.40.21 gssio1.gpfs.net gssio1
192.168.40.22 gssio2.gpfs.net gssio2
X.X.X.X foo1abc-hs.gpfs.net foo1abc-hs
X.X.X.Y goo1abc-hs.gpfs.net goo1abc-hs
// Fix
192.168.40.21 gssio1.gpfs.net gssio1 foo1
192.168.40.22 gssio2.gpfs.net gssio2 goo1
X.X.X.X foo1abc-hs.gpfs.net foo1abc-hs
X.X.X.Y goo1abc-hs.gpfs.net goo1abc-hs
gssgennetworks -N foo1,goo1 --suffix=abc-hs --create-bond
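The naming rule can be sketched as follows. This is a minimal illustration only, assuming that the high-speed name is the plain concatenation of prefix, node name, and suffix, as the foo1 and abc-hs example suggests. The derive_hs_name function is hypothetical and is not part of gssgennetworks.

```shell
# Hypothetical helper (not an ESS command): build a high-speed host name
# from a node name plus an optional prefix and suffix.
# Assumption: the name is formed as <prefix><node><suffix>.
derive_hs_name() {
  node="$1"
  prefix="$2"
  suffix="$3"
  printf '%s%s%s\n' "$prefix" "$node" "$suffix"
}

# Node names from the example above, with no prefix and suffix abc-hs.
for n in foo1 goo1; do
  derive_hs_name "$n" "" "abc-hs"
done
```

If the printed names match the high-speed entries in /etc/hosts (foo1abc-hs and goo1abc-hs here), the chosen suffix fits the node names passed to -N.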
Running gssutils over PuTTY might show horizontal lines as “qqq” and vertical lines as “xxx”.
ESS Install and Deployment Toolkit

Type: Install or Upgrade
Version: All
Arch: All
Affected Nodes: EMS and I/O server nodes
The default PuTTY translation setting, Remote character set UTF-8, might not translate horizontal and vertical line characters correctly.
1. In the PuTTY terminal, under Window > Translation, change Remote character set from UTF-8 to ISO-8859-1:1998 (Latin-1, West Europe) (this should be the first option after UTF-8).
2. Open session.
gssinstallcheck might flag an error regarding page pool size in multi-building block situations if the physical memory sizes differ.
Software Validation

Type: Install or Upgrade
Arch: Big Endian or Little Endian
Version: All
Affected nodes: I/O server nodes
gssinstallcheck is a tool, introduced in ESS 3.5, that helps validate software, firmware, and configuration settings. If you add (or install) building blocks with a different memory footprint, gssinstallcheck flags this as an error. Best practice states that your I/O servers must all have the same memory footprint, and thus the same pagepool value. Page pool is currently set at ~60% of physical memory per I/O server node.
Example from gssinstallcheck: [ERROR] pagepool: found 142807662592 expected range 147028338278 - 179529339371
1. Confirm each I/O server node's individual memory footprint.
From the EMS, run the following command against your I/O xCAT group: xdsh gss_ppc64 "cat /proc/meminfo | grep MemTotal"
Note: This value is in KB.

If the physical memory varies between servers and/or building blocks, consider adding memory and re-calculating pagepool to ensure consistency.
2. Validate the pagepool settings in IBM Spectrum Scale™: mmlsconfig | grep -A 1 pagepool
Note: This value is in MB.
If the pagepool value setting is not approximately 60% of physical memory, consider recalculating and setting an updated value. For information about how to update the pagepool value, see IBM Spectrum Scale documentation on IBM® Knowledge Center.
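The 60% guideline can be checked with simple arithmetic. The following is a rough sketch, not an ESS tool: the expected_pagepool_bytes function and the MemTotal value are hypothetical, and the exact tolerance range that gssinstallcheck applies is not stated here, so only the 60% midpoint is computed.

```shell
# Hypothetical helper: convert a MemTotal value (in KB, as reported by
# /proc/meminfo) into the pagepool size in bytes at the ~60% guideline.
expected_pagepool_bytes() {
  mem_total_kb="$1"
  echo $(( mem_total_kb * 1024 * 60 / 100 ))
}

# Hypothetical MemTotal of 262144000 KB (250 GiB).
expected_pagepool_bytes 262144000

# On an I/O server node, the real value could be read with:
#   awk '/MemTotal/ {print $2}' /proc/meminfo
```

Comparing this figure with the value that gssinstallcheck reports as found indicates whether a node is well outside the guideline.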
Creating small file systems in the GUI (below 16G) will result in incorrect sizes

GUI

Type: Install or Upgrade

Arch: Big Endian or Little Endian

Version: All

Affected nodes: All

When creating file systems in the GUI smaller than 16GB (usually done to create CES_ROOT for protocol nodes) the size will come out larger than expected.

There is currently no resolution. The smallest size you might be able to create is 16GB. Experienced users might consider creating a custom vdisk.stanza file for the specific sizes they require.

You can try one of the following workarounds:
  • Use three-way replication on the GUI when creating small file systems.
  • Use gssgenvdisks which supports the creation of small file systems especially for CES_ROOT purposes (Refer to the --crcesfs flag).
Creating file systems in the GUI might immediately result in lack of capacity data

GUI

Type: Install or Upgrade

Arch: Big Endian or Little Endian

Version: All

Affected nodes: All

When creating file systems in the GUI, you might not immediately see the capacity data.

Wait up to 24 hours for the capacity data to display, or use the command line, which should accurately show the file system size.
Canceling disk replacement through GUI leaves original disk in unusable state
GUI
Type: Install or Upgrade
Arch: Big Endian or Little Endian
Version: All
Affected nodes: I/O server nodes
Canceling a disk replacement can lead to an unstable system state and must not be performed. However, if you did perform this operation, use the provided workaround.
Do not cancel disk replacement from the GUI. However, if you did, use the following command to recover the disk state:

mmchpdisk <RG> --pdisk <pdisk> --resume

Under Monitoring > Hardware details, you might see enclosures missing location information.
GUI
Type: Install or Upgrade
Arch: Big Endian or Little Endian
Version: All
Affected nodes: N/A
After installing or upgrading to ESS 5.3.2.1, you might see missing location information for the enclosures in your system. This does not reflect the true frame U location, which can be observed in the Monitoring > Hardware details panel.
The current workaround is to wait up to 24 hours for the GUI services to refresh. After this period, the enclosure location information is filled in.
The GUI wizard might start again after completing the initial setup.
GUI
Type: Install
Arch: Big Endian
Version: All
Affected nodes: N/A
After completing the GUI wizard setup on ESS 5.3.2.1 PPC64BE, you might see the start screen again.

If you see the GUI wizard start screen a second time, type the address of the EMS into the browser and press Enter.

https://<ip of EMS over management network>

You will then be taken to the GUI home screen.
Upon upgrades to ESS 5.3.2.1, you might notice missing groups and users in the Monitoring > Capacity GUI panel
GUI
Type: Upgrade
Arch: All
Version: All
Affected nodes: N/A

You might notice one or more missing pools or users after upgrading to ESS 5.3.2.1 in the Monitoring > Capacity GUI panel.

You may also see missing capacity and throughput data under the Monitoring > Nodes panel.

There is currently no resolution or workaround.

Try waiting 24 hours for the GUI to refresh. You can also try clicking Refresh.
Upon upgrades to ESS 5.3.2.1, you might see several Mellanox OFED weak-updates and unknown symbols messages on the console during gss_updatenode.
OFED
Type: Upgrade
Arch: Big Endian and Little Endian
Version: All
Affected nodes: N/A
When building the new OFED driver against the kernel, you might see many messages such as weak-updates and unknown symbols.
There is currently no resolution or workaround. These messages can be ignored.
During firmware upgrades on PPC64LE, update_flash might show the following warning:

Unit kexec.service could not be found.


Firmware
Type: Installation or Upgrade
Arch: Little Endian
Version: All
Affected nodes: N/A
  This warning can be ignored.
Setting target node names within gssutils might not persist for all panels. The default host names, such as ems1 might still show.

Deployment

Type: Install or Upgrade

Arch: Big Endian or Little Endian

Version: All

Affected nodes: All

gssutils allows users to conveniently deploy, upgrade, or manage systems within a GUI-like interface. If you run gssutils -N NODE, it must store that node name and use it throughout the menu system. There is a bug that might prevent this from working as designed.

Use one of the following resolutions:
  • Change the JSON file directly as follows.
    1. Issue this command.
      /opt/ibm/gss/tools/bin/gssutils \
      --customize --ems-node-name ems2 \
      --config /tmp/LE.json
    2. Issue this command.
      /opt/ibm/gss/tools/bin/gssutils \
      --config /tmp/LE.json
  • For any given command within gssutils, you can press c to customize. At that point, you can change the node name(s) accordingly.
The GUI does not display the firmware levels for drives.

GUI

Type: Upgrade

Arch: Big Endian

Version: All

This behavior is seen during upgrade.

Use the mmlsfirmware command to view this information.
The 1Gb links show as unknown or unhealthy.

GUI

Type: Install and Upgrade

Arch: Big Endian or Little Endian

Version: All

This behavior is seen during installation or upgrade. mmhealth does not monitor the health state of IP interfaces that are not used by IBM Spectrum Scale. These are the IP interfaces that have the value None in the grid column Networks.
The mmhealth command shows the status as degraded for an empty slot. (DCS3700 only – 5U84)

GUI / mmhealth

Type: Install and Upgrade

Arch: Big Endian or Little Endian

Version: All

The handling of mmlsfirmware now marks empty slots. (DCS3700 only – 5U84)

For example:

Running mmlsfirmware --serial-number enclosure_serial results in:

drive EMPTYSLOT <enclosure serial> not_available

not_available is marked for slots that have no drive inserted, by design.

Currently there is no workaround for this issue. It is limited to DCS3700 – 5U84 enclosures.
The md5sum command works only under a folder where binaries are available.

Type: Install and Upgrade

Arch: Big Endian or Little Endian

Version: All

Upon running this command:


md5sum -c /home/deploy/gss_install-5.3.2.1_ppc64le_datamanagement_20190121T232849Z.md5
The following error occurs:
md5sum: gss_install-5.3.2.1_ppc64le_datamanagement_20190121T232849Z: No such file or directory
gss_install-5.3.2.1_ppc64le_datamanagement_20190121T232849Z: FAILED open or read
md5sum: WARNING: 1 listed file could not be read
Press Enter to continue...
The md5sum -c command must be run from CLI mode and from the folder in which the binary resides.

For example:

cd /home/deploy
md5sum -c gss_install-5.3.2.1_ppc64le_datamanagement_20190121T232849Z.md5
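The reason the working directory matters is that a .md5 file usually records a relative file name, which md5sum -c resolves against the current directory. The following self-contained sketch reproduces the behavior with a throwaway file. The demo.bin name and the temporary directory are hypothetical stand-ins for the ESS install binary and /home/deploy.

```shell
# Create a scratch directory with a small file and its checksum file.
workdir="$(mktemp -d)"
printf 'demo\n' > "$workdir/demo.bin"
( cd "$workdir" && md5sum demo.bin > demo.bin.md5 )

# Run from another directory: the relative name "demo.bin" is not found.
rc_outside=0
( cd / && md5sum -c "$workdir/demo.bin.md5" ) >/dev/null 2>&1 || rc_outside=$?

# Run from the directory that holds the file: the check succeeds.
rc_inside=0
( cd "$workdir" && md5sum -c demo.bin.md5 ) >/dev/null 2>&1 || rc_inside=$?

echo "outside exit code: $rc_outside, inside exit code: $rc_inside"
rm -rf "$workdir"
```

A non-zero exit code for the first check and zero for the second mirrors the failure and the fix described above.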
InfiniBand with multiple fabrics is not supported.

Type: Install and Upgrade

Arch: Big Endian or Little Endian

Version: All

In a multiple fabric network, the InfiniBand Fabric ID might not be properly appended in the verbsPorts configuration statement during cluster creation. An incorrect verbsPorts setting might cause an outage of the IB network.
It is advised to do the following to ensure that the verbsPorts setting is accurate:
  1. Use gssgennetworks to properly set up IB or Ethernet bonds on the ESS system.
  2. Create a cluster. During cluster creation, the verbsPorts setting is applied, and the IB network might become unreachable if multiple fabrics are set up during the cluster deployment.
  3. Ensure that the GPFS™ daemon is running and then run the mmfsadm test verbs config | grep verbsPorts command.
These steps show the Fabric ID found for each link.
For example:
# mmfsadm test verbs config | grep verbsPorts
mmfs verbsPorts: mlx5_0/1/4 mlx5_1/1/7
In this example, the adapter mlx5_0, port 1 is connected to fabric 4 and the adapter mlx5_1 port 1 is connected to fabric 7. Now, run the following command and ensure that verbsPorts settings are correctly configured to the GPFS cluster.
# mmlsconfig | grep verbsPorts
verbsPorts mlx5_0/1 mlx5_1/1
Here, it can be seen that the fabric has not been configured even though IB was configured with multiple fabrics. This is a known issue.

Now using mmchconfig, modify the verbsPorts setting for each node or node class to take the subnet into account.
[root@gssio1 ~]# verbsPorts="$(echo $(mmfsadm test verbs config | grep verbsPorts | awk '{ $1=""; $2=""; $3=""; print $0}'))"
# echo $verbsPorts
mlx5_0/1/4 mlx5_1/1/7
# mmchconfig verbsPorts="$verbsPorts" -N gssio1
mmchconfig: Command successfully completed
mmchconfig: Propagating the cluster configuration data to all affected nodes.
This is an asynchronous process.

Here, the node can be any GPFS node or node class. Once the verbsPorts setting is changed, make sure that the new, correct verbsPorts setting is listed in the output of the mmlsconfig command.
# mmlsconfig | grep verbsPorts
verbsPorts mlx5_0/1/4 mlx5_1/1/7
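A quick way to tell the two forms apart is to check whether every verbsPorts entry has three slash-separated fields (adapter/port/fabric). The following is a minimal sketch. The has_fabric_ids function is hypothetical and only inspects a string that you pass in; it does not query the cluster.

```shell
# Hypothetical helper: succeed only if every entry in the space-separated
# verbsPorts string has the form adapter/port/fabric.
has_fabric_ids() {
  for entry in $1; do
    case "$entry" in
      */*/*) ;;          # adapter/port/fabric: fabric ID present
      *) return 1 ;;     # adapter/port only: fabric ID missing
    esac
  done
  return 0
}

# Values taken from the mmlsconfig outputs shown above.
has_fabric_ids "mlx5_0/1/4 mlx5_1/1/7" && echo "fabric IDs present"
has_fabric_ids "mlx5_0/1 mlx5_1/1" || echo "fabric IDs missing"
```

If the check reports missing fabric IDs for the value that mmlsconfig shows, the mmchconfig step above still needs to be applied.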
During an ESS upgrade, part information and firmware levels under the Hardware Details might be missing.
GUI
Type: Upgrade
Arch: Big Endian or Little Endian
Version: All
Affected nodes: N/A
The ESS GUI might be missing information under Part Info and Firmware version within the Hardware Details panel. There are two workarounds:
  1. Wait for up to 24 hours for the GUI refresh tasks to run.
  2. Try running a series of manual tasks to speed up the process of refreshing the GUI.
    1. Log in to the EMS node.
    2. Change directory to /usr/lpp/mmfs/gui/cli.
    3. Run the following tasks in order.
runtask -c CLUSTER RECOVERY_GROUP
runtask -c CLUSTER DISK_ENCLOSURES
runtask -c CLUSTER ENCLOSURE_FW
runtask -c CLUSTER CHECK_FIRMWARE

Where CLUSTER is either the cluster name or the cluster ID that can be determined by using the mmlscluster command.

After running these tasks, the GUI should refresh with the issues resolved.

During file system creation in the ESS GUI, several inputs are ignored under Configure Properties.
GUI
Arch: Big Endian or Little Endian
Version: 5.3.1.x
Affected nodes: N/A
When creating file systems in the ESS GUI, there are several properties that can be set under Configure Properties. Some of those values are:
  • Enable quota
  • Quota scope
  • Inode access time update
  • Enable DMAPI
  • Enable file system features compatible with release

The GUI ignores these input fields and passes only default values to the mmcrfs command.

You can use the following workarounds:
  • Create a file system with the required values from the command line only.
  • Create the file system from the GUI and modify the values using mmchfs on the command line afterward.
The state of a failed disk is not changed to drained or replace in new enclosures that were added by the MES procedure and have never been used for any file system.

Type: IBM Spectrum Scale RAID

Arch: Little Endian

Version: ESS 5.3.2.1

Affected Nodes: N/A

If a user runs mmvdisk pdisk change --simulate-failing to fail two pdisks in new enclosures that were added by using the MES procedure and have never been used for any file system, the state of the second pdisk stays at simulate-failing. The GUI then cannot detect that the second failed disk is replaceable, and the command line also fails because of the state of the disk.
Run mmchpdisk --diagnose on the failing disk. Alternatively, run mmshutdown or mmstartup on the I/O server node that serves the recovery group to which the simulate-failing pdisk belongs.
Important: This workaround causes a failover.
Cable pulls might result in I/O hang and application failure.

Type: IBM Spectrum Scale RAID

Arch: Big Endian and Little Endian

Version: ESS 5.3.2.1

Affected Nodes: I/O server nodes

If SAS cables are pulled, I/O might hang for an extended period of time during RG or path recovery, which could lead to application failures.

Change nsdRAIDEventLogShortTermDelay to 30ms (the default is 3000ms):
  1. Run mmchconfig nsdRAIDEventLogShortTermDelay=30.
  2. Restart GPFS.
gssinstall_<arch> and gssinstallcheck report NOT_INST for a few GPFS group RPMs.

Type: Deployment

Arch: Little Endian

Version: ESS 5.3.2.1

Affected Nodes: ALL

By default, deployment does not install RPMs for file audit logging support.

Also, you may see NOT_INST for the Power 8 firmware RPM.

This is the expected behavior.
If the file audit logging feature is required, you can manually install these packages from the EMS GPFS repository.
  • gpfs.kafka
  • gpfs.libkafka

For information on how to install Power 8 firmware package (01SV860_165_165-1.1-1), see Updating the system firmware.

gssinstallcheck flags the MT4115 adapter with the incorrect firmware.

Type: Deployment

Arch: Big Endian and Little Endian

Version: ESS 5.3.2.1

Affected Nodes: ALL

There is a bug that causes gssinstallcheck to flag the MT4115 adapter as having incorrect firmware after deploying or upgrading to ESS 5.3.2.1.

If the firmware shows 12.23.1020 for the MT4115 adapter, you can safely ignore this message.
ESS GUI System Setup wizard fails on the Verify Installation screen in the IBM Spectrum Scale active check.
Type: GUI
Arch: Big Endian or Little Endian
Version: 5.3.2.1
Affected nodes: N/A

ESS System Setup wizard fails on the Verify Installation screen.

The displayed error message is: Health monitoring is not active on ‘X’ nodes. Run ‘mmhealth node show GPFS -N all’ command to check why mmhealth does not provide health information for those nodes.

Click Verify again. The error should clear after that.