Limitations and known issues
As of the release of IBM® Spectrum Cluster Foundation Community Edition Version 4.2.2, the following product limitations and known issues apply.
Review the following list of limitations for IBM Spectrum Cluster Foundation Community Edition 4.2.2.
| Limitation | Affects | Description |
|---|---|---|
| Installation errors are found in the pcmconfig.log installation log during SLES installation. Reference #216371 | Installation | During IBM Spectrum Cluster Foundation Community Edition installation on SLES, one of the following error messages is found in the pcmconfig.log installation log in the /opt/pcm/log directory: |
| Image profiles support the highest version of a custom package only. Reference #210000 | Custom packages | Image profiles support only the highest version of a custom package. If you add two custom packages with the same name but different versions, IBM Spectrum Cluster Foundation Community Edition installs only the highest version of the custom package. You cannot choose between the higher and lower version in the image profile. To install the lower version, make sure that the /install/contrib/OS_release/arch directory, where OS_release is the operating system release and arch is the architecture, contains only the lower version. For example, ensure that only the lower version is found in the /install/contrib/rhels6.4/x86_64/ directory. |
| Report data is truncated when the requested report period does not match the configured reporting data collection time interval. Reference #213677 | Reports | The reporting data loader collects data at a fixed time interval. Data collection is not rescheduled to match the requested report period, even after you restart the reporting service. For example, if you request a report from 22:00 to 22:43 and the default time interval is 300 seconds, the data is truncated because data collection stops at 22:40. |
| Blade and compute node location information in a chassis is not supported in Rack View. Reference #220740 | Resource Dashboard | Some blade and compute node location information is defined in the node information file by slot ID number. These location definitions are not displayed in the Rack View of the Web Portal. |
| The network bridge interface is configured with the IP address of its Ethernet interface. Reference #241898 | Network bridge | When you use the custom script xHRM bridgeprereq <Ethernet>:<bridge>, the network bridge interface is configured with the IP address of its Ethernet interface. After the compute node is provisioned, the network bridge ignores the IP address assigned to it and uses the Ethernet interface IP address instead, because the network bridge inherits the MAC address and IP address from the port that is added to it. |
| In a RHEL 7.1 on Power® BE environment, compute nodes cannot access resources such as LDAP and NFS servers. Reference #36221 | NAT | In a RHEL 7.1 on Power BE environment, there is an issue with LDAP integration and problems within an LSF® cluster that uses an external NFS server. This is the result of a RHEL 7.1 issue (Reference number #116824) that exists on Power BE. Compute nodes cannot access resources such as LDAP and NFS servers. To resolve this issue, make sure that IBM Spectrum Cluster Foundation Community Edition resources are reachable by compute nodes. |
| LDAP cannot be enabled on Ubuntu nodes. Reference #110936 | LDAP | LDAP cannot be enabled on Ubuntu nodes. Any attempt to enable LDAP (using enableLDAP.sh) on Ubuntu nodes results in libnss-ldap package errors and system failure. |
| IBM Spectrum Scale™ cluster deployment supports only deployments where compute node host names contain only lowercase alphanumeric characters. Reference #112317 | IBM Spectrum Scale | To deploy an IBM Spectrum Scale cluster, make sure that compute node host names contain only lowercase alphanumeric characters. For example: node1. |
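The truncation behavior in the Reports limitation above follows from simple interval arithmetic. The following shell sketch is illustrative only (not product code) and assumes the default 300-second collection interval:

```shell
# Illustrative sketch: with a 300-second collection interval, a report
# requested up to 22:43 contains data only up to the last completed
# collection point, 22:40.
interval=300
end=$(( 22*3600 + 43*60 ))              # requested report end, 22:43, in seconds
trunc=$(( end / interval * interval ))  # last completed collection point
printf 'data available up to %02d:%02d\n' $(( trunc/3600 )) $(( trunc%3600/60 ))
```

Requesting report boundaries that are multiples of the collection interval avoids the truncation.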
This section details the known issues in version 4.2.2, along with possible workarounds.
| Issue | Affects | Description | Resolution or action |
|---|---|---|---|
| NIC name is incorrect on compute nodes. Reference #218194 | Devices | During stateless provisioning, the node status is displayed correctly but the NIC name is incorrect. A compute node's original NIC name might be changed to a new NIC name. | Reload the NIC device to restore the NIC name to the originally specified name. |
| Adding an OS update from an official website causes a node operation error. Reference #212472 | OS updates | Adding an OS update from an official RHEL, CentOS, or SLES website can cause a node operation error. Some RPM packages on a distributor's official website might have unresolved package dependency issues. When a package with resolved dependencies is available, you can readd the package and apply the OS update again. | To recover from this issue: |
| Failed to uninstall OS update RPM packages on stateful nodes after the OS update is removed from the image profile. Reference #211463 | OS updates | After an OS update is removed from an image profile and the nodes are synchronized, the OS update RPM packages remain installed on stateful nodes. | To uninstall OS update RPM packages on a stateful node, complete the following steps: |
| The Web Portal does not display the total memory of a node after the node is provisioned or replaced. Reference #216589 | Web Portal | After you provision a compute node, the compute node's free memory does not include the total memory. At this time, the total memory is less than the used memory and a dash (-) is displayed in the Web Portal. The Web Portal refreshes at the next refresh interval. | After a node is provisioned or replaced, you can refresh the Web Portal from the command line to immediately update the Web Portal display. |
| Node discovery fails and displays a "no free leases" error in the /var/log/messages file. Reference #211896 | Node discovery | After you add compute nodes to the cluster using node discovery, the nodes are not added to the cluster even after the compute nodes are powered on. The following error is found in the /var/log/messages file on one line: | You must set the subnet IP or netmask values for the provision interface in the Web Portal. The subnet IP or netmask values must be the same as the values that are used by the provisioning interface on the management node. For example, if the provision interface has an IP address of 11.0.0.1/24, then you must create a network in IBM Spectrum Cluster Foundation Community Edition that has a subnet IP of 11.0.0.0 and a netmask of 255.255.255.0. |
| After a node is provisioned, some non-IBM machines might fail on bootup. Reference #213598 | Node provisioning | During node provisioning, the OS is installed on the node and the node reboots. The node reboot might fail with the following error: | One possible resolution is to complete the following steps: Note: This resolution works on some hardware models. |
| Node status is set to defined after the node was provisioned successfully. Reference #243956 | Node provisioning | Node status does not reflect the actual node status after a node is successfully provisioned. Node status remains set to defined. | Ensure that the resolv.conf configuration file in the /etc directory specifies the correct private network and the IP address of the management node in the provisioning network. For example, edit the resolv.conf file, then restart the xCAT daemon. |
| On some non-IBM machines, the default network profile cannot be used because the network device eth0 does not exist. Reference #58209 (207606) | Network profiles | By default, the default network profile assumes that compute nodes use eth0 as the provision network interface. If they do not, you must create a new network profile or edit the default network profile to match the network interface that is actually used. For example, some servers use em1 to connect to the provision network; in that case, the network profile must use em1 instead of the default eth0 naming convention. | Create a network profile that uses the correct naming convention. |
| Monitoring Agent status is sometimes incorrect. Reference #220992 | Monitoring Agent | After a node is provisioned, the monitoring agent is not started correctly and the monitoring agent status is Unavailable. The monitoring agent cannot be started if the time in the BIOS is set to a different time than the real current time. | To resolve this issue, run the following command on one line to restart the monitoring agent: where noderange is a list of nodes or node groups. |
| NFS server error in the NFS log file. Reference #220799 | NFS server | The following NFS server error is found in the NFS log file: This error is caused by too many connections for the number of threads. | To resolve this error, update the number of threads that run on the NFS server. The number of threads must match the scale of the cluster. Note: If the cluster has 300 provisioned nodes and 200 nodes are to be synchronized at the same time, then the NFS thread number must be at least 200. For RHEL: For SLES: |
| Browser is unresponsive. Reference #225460 | Web Portal | Adding many nodes (2500 or more) can cause the Web Portal to be unresponsive in Internet Explorer (IE). | Close all IE processes and open a new browser window. If the problem persists, use a different supported browser, such as Firefox. |
| Using the Web Portal in Internet Explorer 9, nodes do not synchronize after an image profile is updated. Reference #226841 | Image profiles | Using the Web Portal in Internet Explorer 9, nodes do not synchronize if the automatic synchronization option is selected. | Using the Web Portal in Internet Explorer 9, after the image profile is updated, synchronize the nodes from the nodes list page using the option. |
| Compute nodes cannot reach external networks through network address translation (NAT). Reference #244354 | NAT forwarding | On RHEL7 PPC64, compute nodes cannot reach external networks through NAT that is set up on the management node. This prevents compute nodes from connecting through SSH to an external server, such as an LDAP server. In the case of an LDAP server, it prevents users from logging in to compute nodes. | First, check whether RHEL has any updates that address IP forwarding. If not, configure the system to use a network topology where compute nodes can access external networks directly, not through NAT forwarding on the management node. |
| The LSF master node is reinstalled and the LSF compute nodes do not rejoin the LSF cluster. Reference #241161 | LSF cluster template | If compute node sharing is enabled in the LSF cluster, this can cause problems when the LSF master node is reinstalled. Node sharing is enabled by setting the LSF_SHARE_CN variable to Y in the cluster template. When the LSF master node is reinstalled and the post-provision script is executed by the pcm-run-cluster-script-layers command, the compute nodes cannot join the LSF cluster because they cannot mount LSF from the NFS server. | To have the compute nodes rejoin the LSF cluster, reboot the compute nodes. To reboot the LSF compute nodes from the Web Portal, go to the Resources tab and click . Select the LSF compute nodes and click . |
| If you are creating a secure VLAN network and you specify multiple NICs to the same switch in a node information file, node provisioning using the switch discovery method fails. Reference #245489 | VLAN | Node provisioning fails if you are setting up a secure VLAN network and the nodes that you are provisioning have multiple NICs that are all connected to the same switch and are provisioned using the switch discovery method with a copy of the default RHEL 6.5 image profile for x86 or Power systems. In this case, the provisioning failure is caused by an error in the kickstart configuration template. | To resolve this issue, fix the errors in the kickstart configuration template. |
| IBM Spectrum Cluster Foundation Community Edition installation error occurs when installing on an Ubuntu management node. Reference #27760 | Installation | An installation error occurs when installing IBM Spectrum Cluster Foundation Community Edition on an Ubuntu management node. The following error messages are displayed: Failed to remove package xcat-server. Use the "rpm -e --nodeps xcat-server" command on Linux or the "dpkg --purge --force-all xCAT-server" command on Ubuntu to remove this package, and restart the installation. Failed to remove package xcat-client. Use the "rpm -e --nodeps xcat-client" command on Linux or the "dpkg --purge --force-all xcat-client" command on Ubuntu to remove this package, and restart the installation. | Use the rpm -e --nodeps xcat-server command on Linux or the dpkg --purge --force-all xCAT-server command on Ubuntu to remove this package, and restart the installation. If you restart the installation and get the same errors, reboot the management node and install IBM Spectrum Cluster Foundation Community Edition again. |
| Failed to build Ubuntu stateless image profile. Reference #36725 | Image profiles | After removing OS packages for a stateless Ubuntu compute node, the following error message is displayed in the Web Portal: Cannot build image for target image profile. The Ubuntu compute node hangs while provisioning and fails to reprovision. | To resolve this issue and provision the Ubuntu compute node, remove the aide package from the Ubuntu stateless image profile and reprovision the node. |
| Ubuntu kernel packages cannot be updated. Reference #31336 | OS updates | IBM Spectrum Cluster Foundation Community Edition cannot update Ubuntu kernel-related packages such as linux-image-extra-<version> or linux-headers-<version>. | To update Ubuntu kernel-related packages on compute nodes, do the following: |
| A segmentation fault error occurs when adding an OS distribution. Reference #37266 | OS distribution | When adding an OS distribution, the following segmentation fault error appears: PAM adding faulty module: /usr/lib64/security/pam_fprintd.so segfault at 7a40 ip 0000000000007a40 sp 00007fff6b5469d8 error 14 in libattr.so.1.1.0[7fb95466b000+4000] By default, the fingerprint service is disabled; however, it can be enabled after installation. If this occurs, the fingerprint service must be disabled. | To resolve this issue, disable the fingerprint service. Run the following command: |
| Problems are encountered when adding a host name that starts with a number. Reference #38757 | Host name | Host names cannot start with a number. If a host name starts with a number, various error messages can be found in the pcmd log file (/opt/pcm/pcmd/log/pcmd.log), such as: Execution resource does not belong to the allocation. | Remove the node from IBM Spectrum Cluster Foundation Community Edition, and readd the node using a host name that begins with a letter. |
| In a high availability environment, a RHEL 7.x stateless image profile fails to build. Reference #43196 | High availability | In a RHEL 7 and later high availability environment, stateless image profile creation fails as a result of a defect in the RHEL 7.x operating system (Reference number #124177). For each stateless image profile copy that is created, the stateless image profile must be regenerated using the genimage command. | To resolve this issue, regenerate the stateless image using the genimage command. The genimage command generates the stateless image profile using the rootimg directory. This directory must be deleted and re-created for every stateless image profile by following these steps: For example: Each image profile that you copy uses the old link to the local directory and must be re-created to use the new local directory. |
| Authentication error found in the pcmd log. Reference #43399 | Authentication | The following authentication error is found in the pcmd log: ERROR [Resource Monitor] pcmd - Resource Monitor error while retrieving server information due to: Authentication failed, credential expired ERROR [Resource Monitor] pcmd - Authentication failed, credential expired | To resolve this issue, restart the PCMD service: |
| Locked out of IBM Spectrum Cluster Foundation Community Edition after 5 failed login attempts. Reference #105644 | Web Portal | When logging in to IBM Spectrum Cluster Foundation Community Edition, whether through the Web Portal or through SSH to a node, if the user credentials are entered incorrectly 5 times, access is locked. | To resolve this issue, wait 5 minutes and try again. After 5 minutes, the account is unlocked. IBM Spectrum Cluster Foundation Community Edition uses the same PAM login authentication as the operating system. To change the behavior of the account lock and unlock capabilities, do the following steps on the management node: |
| MPI installation fails on CentOS 7.2 x86. Reference #110866 | MPI | When deploying an LSF cluster that enables MPI installation, MPI installation fails on CentOS 7.2 x86 nodes with the following error: ERROR Failed to install prerequisite package glibc.i686. | To resolve this issue: |
| Error message encountered when adding RHEL 7.2 Power BE to IBM Spectrum Cluster Foundation Community Edition using the Web Portal. Reference #110688 | OS distribution | The following error message is encountered when adding RHEL 7.2 Power BE to IBM Spectrum Cluster Foundation Community Edition using the Web Portal: Can not create the default image profiles for this OS distribution. To resolve this error, remove the OS distribution and add it again. If the image profiles for this OS distribution exist, remove the image profiles before removing the OS distribution. | The error message can be ignored. The OS distribution is added and the corresponding image profile is created. |
| In a high availability environment, the EGO services cannot be started after services are started on the standby management node. Reference #96646 | High availability | In a high availability environment that uses IBM Spectrum Scale shared storage, when the management node is powered off or rebooted, a failover to the standby management node is triggered. After the standby management node is started, services resume after a few minutes. To check that all services are running, run the pcmhatool check command. If the EGO service failed to start automatically, it is listed in failed state and must be started manually. | To start the EGO service, on the standby management node, run the following command: |