Troubleshooting by symptom
You might encounter some common problems while using the IBM® Pattern for IBM Storage Scale.
The mount point gets unmounted upon restarting the IBM Storage Scale Client
Symptom: After you restart the IBM Storage Scale Client, the mount point gets unmounted.
Resolution: Run the following command to start the IBM Storage Scale daemon service:
su - gpfsprod -c 'sudo /usr/lpp/mmfs/bin/mmstartup'
Storage Scale active node might show as "Passive Node" type after restart
After you restart IBM Storage Scale nodes, the active primary node might show up as a Passive node. This behavior can cause issues while you use the nodes or add new nodes to the cluster.
Symptom: Actions such as adding a new member on the Passive node type fail.
Resolution:
- Check whether the IBM Storage Scale daemon service is up and running. If not, run the following command to start the daemon service:
su - gpfsprod -c 'sudo /usr/lpp/mmfs/bin/mmstartup'
- After the services are started successfully, make sure that the GPFS file system is mounted by using the df -hk or mmlsmount command.
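The state check above can be scripted. The following sketch extracts a node's state from mmgetstate -a output so a script can decide whether mmstartup is needed; the column layout and the gpfsprod user are assumptions from this pattern's defaults.

```shell
# gpfs_node_state: print the state column for the named node from
# "mmgetstate -a" output fed on stdin. The column order
# (node number, node name, state) is an assumption; verify it
# against your IBM Storage Scale release.
gpfs_node_state() {
    awk -v n="$1" '$2 == n {print $3}'
}

# Intended usage on a node (shown as comments because it needs a
# live cluster):
#   su - gpfsprod -c 'sudo /usr/lpp/mmfs/bin/mmgetstate -a' | gpfs_node_state node1
#   # if the result is not "active":
#   su - gpfsprod -c 'sudo /usr/lpp/mmfs/bin/mmstartup'
```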
IBM Storage Scale Server block volume attachment fails with mmdsh errors
Symptom: Block volume attachment fails with one of the following mmdsh error messages:
Block volume attachment failed with error : mmdsh: Invalid or missing remote shell command: /usr/bin/sshwrap.pl
Block volume attachment failed with error : mmdsh: Invalid or missing remote shell command: /usr/bin/scpwrap.pl
Resolution:
- Copy sshwrap.pl from /usr/lpp/mmfs/bin to /usr/bin/:
cp /usr/lpp/mmfs/bin/sshwrap.pl /usr/bin/
- Copy scpwrap.pl from /usr/lpp/mmfs/bin to /usr/bin/:
cp /usr/lpp/mmfs/bin/scpwrap.pl /usr/bin/
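Both copy steps can be done in one pass. This is a sketch; the source and destination paths default to the ones named in the error messages above and can be overridden, which also makes the function easy to try safely.

```shell
# copy_gpfs_wrappers: copy the sshwrap.pl and scpwrap.pl wrapper
# scripts from the IBM Storage Scale bin directory (default
# /usr/lpp/mmfs/bin) to /usr/bin (default). Returns nonzero if
# either copy fails.
copy_gpfs_wrappers() {
    src="${1:-/usr/lpp/mmfs/bin}"
    dst="${2:-/usr/bin}"
    for f in sshwrap.pl scpwrap.pl; do
        cp "$src/$f" "$dst/" || return 1
    done
}
```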
Download of client private key and client key from mirror node might fail
Symptom: An attempt to retrieve a client private key or client key from the mirror node by using the Retrieve key operation might fail with the following error message:
Retrieve Client Key: The Client key was not found for this configuration.
Resolution: Retrieve the client private key and the client key from the primary node.
Network Shared Disk (NSD) on node goes down after IBM Storage Scale auto revert
Symptom: After the IBM Storage Scale auto revert, Network Shared Disk (NSD) on the node goes down.
Resolution: Run the following command to start the NSD:
su - gpfsprod -c 'sudo /usr/lpp/mmfs/bin/mmchdisk <Name of file system> start -d <Name of NSD>'
Troubleshooting issues in GPFS/IBM Storage Scale pattern type
- Compile the GPFS portability layer for this kernel version in a different virtual machine by using the steps in the Building IBM Storage Scale portability layer after Linux kernel updates topic. Note: In that topic, you can skip the sub-steps of step 3 that start the node and check for the node active state.
- Copy the content that is available in the /lib/modules/<upgraded kernel version>/extra folder from the system where the GPFS portability layer is successful and paste it in the /lib/modules/<upgraded kernel version>/extra folder of the virtual machine where the upgrade failed.
- Run the following command to start GPFS:
su - gpfsprod -c 'sudo /usr/lpp/mmfs/bin/mmstartup'
- Run the following command to check whether all the nodes are in active state:
su - gpfsprod -c 'sudo /usr/lpp/mmfs/bin/mmgetstate -aL'
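The copy step above depends on the upgraded kernel version. As a sketch, a small helper can build the modules path from the running kernel (this assumes both machines run the same upgraded kernel; "buildhost" is a placeholder):

```shell
# gpfs_extra_dir: path that holds the GPFS portability layer modules
# for a given kernel version (defaults to the running kernel).
gpfs_extra_dir() {
    printf '/lib/modules/%s/extra' "${1:-$(uname -r)}"
}

# Intended usage (shown as comments because it needs two machines):
#   extra=$(gpfs_extra_dir)
#   scp -r "buildhost:$extra/." "$extra/"
#   su - gpfsprod -c 'sudo /usr/lpp/mmfs/bin/mmstartup'
#   su - gpfsprod -c 'sudo /usr/lpp/mmfs/bin/mmgetstate -aL'
```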
Mixing of IP address formats
Never mix IPv4 and IPv6 IBM Storage Scale pattern deployments, whether they are client, primary, mirror, or tiebreaker deployments. This scenario is not supported.
IBM Storage Scale Clients - File set names
Do not include blank spaces in file set names.
IBM Storage Scale Clients - Link directories
Do not include blank spaces in link directory names.
IBM Storage Scale Clients - Page pool memory not available
If you see a message in /var/adm/ras/mmfs.log.latest saying that the page pool memory cannot be obtained, the virtual machine does not have enough memory to support IBM Storage Scale, and you most likely must allocate more memory to the client pattern in its configuration values. Ensure that the allocated memory is at least 4 GB.
If your client fails to deploy, run the Status operation. You might see an IBM Storage Scale error that prevents the page pool from being allocated. Correct any errors and try the deployment again.
IBM Storage Scale Clients - File set already exists
Check whether a file set name used by your client deployment already exists. If it does, unintentional file overwriting might occur. Use the Cluster status operation on the server to list the existing file sets.
IBM Storage Scale Clients - File set quota size is not as expected
If you find that the quota size is not what you expected, use the Cluster status operation on the server to list the existing file sets. If the size is not what the client expects, the reason most likely is that some other client created the file set. If you need a different value, contact the original owner. If a change is agreed to, run the Change File Set operation on the Primary IBM Storage Scale instance to change the size of the quota.
IBM Storage Scale Client - Quota Size Constraints on the file set are ignored
Only non-root users are affected by the file set quota settings.
IBM Storage Scale Client - Connect to server operation fails to update the remote file system information
The Connect to server operation might fail, resulting in an error message similar to the following example:
Web_Application-was.11406729401441.GPFSClient: Connect to server: Failed to update the remote file system information for kent ['/usr/lpp/mmfs/bin/mmremotefs', 'update', 'kent', '-f', 'kent', '-C', 'testClusterPassive_pass.purescale.raleigh.ibm.com', '-A', 'yes', '-T', '/gpfs/kent']
mmremotefs: Command was unable to determine whether file system is mounted.
The IBM Storage Scale product documentation notes that when this type of problem occurs, message 6027-1996 is issued with similar wording.
If you encounter this message, perform problem determination, resolve the problem, and reissue the command. If you cannot determine or resolve the problem, you might be able to run the command successfully by first shutting down the IBM Storage Scale daemon on all nodes of the cluster (by using mmshutdown -a), which ensures that the file system is not mounted. Then, do the following steps:
- Log in to the IBM Storage Scale Client virtual machine instance.
- Navigate to /usr/lpp/mmfs/bin/ and run the mmshutdown -a command.
- Run the mmstartup command.
- Perform the Connect to server operation again.
IBM Storage Scale Client - Connect to server operation fails to unmount the file system
The Connect to server operation might fail, resulting in an error message indicating that the device or resource is busy, similar to the following example:
Web_Application-was.11407238943746.GPFSClient: Connect to server: Failed to unmount the testFSys file system ['/usr/lpp/mmfs/bin/mmumount', 'testFSys', '-f']
umount2: Device or resource busy
umount: /gpfs/testFSys: device is busy.
(In some cases useful info about processes that use the device is found by lsof(8) or fuser(1))
umount2: Device or resource busy
Refer to the IBM Storage Scale Problem Determination Guide for actions to take when the file system will not unmount. Ensure that all processes finish accessing the file system, and then run the Connect to server operation again.
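To act on the lsof(8)/fuser(1) hint in the message, the PIDs that hold the file system open can be extracted from lsof output. This sketch assumes the standard lsof column layout, with the PID in the second column:

```shell
# lsof_pids: read "lsof +D <mountpoint>" output on stdin and print
# the unique PIDs (second column, skipping the header line).
lsof_pids() {
    awk 'NR > 1 {print $2}' | sort -un
}

# Intended usage:
#   lsof +D /gpfs/testFSys | lsof_pids
# Let those processes finish (or stop them) before rerunning the
# Connect to server operation.
```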
IBM Storage Scale Server - Disk volume limit exceeded
A maximum of 14 storage volumes can be added to any IBM Storage Scale configuration.
IBM Storage Scale Server - Disk volume not in list
Ensure that the correct storage volume is attached.
IBM Storage Scale Server - sudo error: sorry, you must have a tty to run sudo
Ensure that the requiretty option is disabled on the virtual machine. requiretty is an option in the /etc/sudoers file that prevents sudo operations from non-TTY sessions. The IBM Storage Scale nodes must be able to run sudo commands from scripts.
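A quick way to check for the option before editing (always edit sudoers with visudo, never directly) is to grep for an uncommented requiretty line. This sketch assumes the usual Defaults syntax:

```shell
# requiretty_enabled: return success if the given sudoers file has an
# uncommented "Defaults requiretty" line.
requiretty_enabled() {
    grep -Eq '^[[:space:]]*Defaults[[:space:]]+requiretty' "$1"
}

# Intended usage:
#   requiretty_enabled /etc/sudoers && echo "disable requiretty via visudo"
```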
IBM shared service for IBM Storage Scale - Not deployed before IBM Storage Scale clients
The IBM shared service for IBM Storage Scale must be deployed to a cloud group before you deploy any IBM Storage Scale Clients to that same cloud group, unless you specified an IBM Storage Scale server at deployment or through the Connect to server operation for virtual application patterns and virtual system patterns. If an IBM Storage Scale Client is deployed to a cloud group that does not have an instance of the IBM shared service for IBM Storage Scale, the deployment concludes with an error similar to the following example:
[2015-02-26 14:22:23.243192] GPFSAgent - Retrieve Manager Info from shared service
[2015-02-26 14:22:23.705671] Failed to retrieve values from the IBM Shared Service for GPFS. Ensure that the IBM Shared Service for GPFS is deployed in the same cloud group with this deployment. If IBM Shared Service for GPFS is deployed, ensure that the input value is valid.
IBM Storage Scale portability failures are not reported promptly on Linux
During the IBM Storage Scale installation process, the build of the IBM Storage Scale portability layer might fail. You usually encounter IBM Storage Scale portability failures if the base image that is used to deploy your IBM Storage Scale instance does not have all of the required IBM Storage Scale dependencies.
This problem can occur when you are deploying an IBM Storage Scale Client or an IBM Storage Scale Primary configuration, or when you are attaching an IBM Storage Scale Mirror or Tiebreaker instance to an IBM Storage Scale Primary configuration.
When this failure occurs, the error is reported in the IBM Storage Scale logs, but the execution is not aborted. The installation or add member operation continues, but eventually fails because IBM Storage Scale was not configured properly due to the portability layer build failure.
To identify any IBM Storage Scale portability failures after you deploy your instance or add new members to the cluster, ensure that the cluster is configured properly by running the Get Cluster Status operation, and verify that all IBM Storage Scale nodes and NSDs are reported as up and running.
To help debug the problem and identify the root cause, open the IBM Storage Scale trace log (IWD trace.log for the GPFSMainServer role or GPFSClient role) and search for a Build GPFS portability FAILED message.
Primary instance remains in maintenance mode after an auto-revert operation
Problem: After a primary instance completes an auto-revert operation, it remains in maintenance mode.
Resolution: Manually resume the instance from the Instance management page to bring it to a Running state. You can then perform other IBM Storage Scale operations on that primary instance.
Some IBM Storage Scale operations show up in languages other than English
Symptom: When you use IBM Storage Scale, some operations show up in languages other than English.
Resolution: Set the locale to en_US to make the operations show up in English. Use the following commands for the IBM Storage Scale Server and manager instances of the IBM Storage Scale server cluster.
- Check the language value with the following command:
bash-4.2# echo $LANG
- Check the locale on the instance with the following command:
bash-4.2# locale
- Check the environment on the instance with the following command:
bash-4.2# env | grep -e LANG -e LC
- Change the locale:
  - Change the LANG value on the instance with the following command:
    bash-4.2# export LANG=en_US.UTF-8
  - Change the LC_ALL value on the instance with the following command:
    bash-4.2# export LC_ALL="en_US.UTF-8"
- Check the /etc/locale.conf file on the instance with the following command. Ensure that the file has an entry for the en_US locale:
bash-4.2# cat /etc/locale.conf
- Modify the .bash_profile file on the instance. Add the following statements at the end of the file to set and export the LANG value on the instances:
LANG=en_US.UTF-8
export LANG
- Restart the instances.
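The profile edit in the step above can be made idempotent so repeated runs do not duplicate the lines. This is a sketch; the profile path is a parameter so it can be tried safely on a scratch file first.

```shell
# ensure_lang: append the LANG export to a profile file only if an
# identical LANG line is not already present.
ensure_lang() {
    profile="$1"
    if ! grep -q '^LANG=en_US.UTF-8$' "$profile" 2>/dev/null; then
        printf 'LANG=en_US.UTF-8\nexport LANG\n' >> "$profile"
    fi
}

# Intended usage on each instance, then restart it:
#   ensure_lang ~/.bash_profile
```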
Client key is not accepted on the IBM Cloud Pak System user interface when installing the IBM Storage Scale client on AIX 7.2
Symptom: On AIX 7.2, the client keys are generated in OpenSSH format, but the IBM Cloud Pak System user interface requires an RSA key.
Resolution: Convert the OpenSSH key to PEM format with the following command:
ssh-keygen -p -m PEM -f <opensshkeyfile>
Provide that converted key file in the IBM Storage Scale Manager IP and Client Key field along with the IP address of the manager node.
GPFS service sometimes does not autostart after you restart the IBM Storage Scale client node
- Symptom
- After you restart the IBM Storage Scale client node, the GPFS service sometimes does not autostart.
- Resolution
- To address this issue, do these steps:
- Log in to the IBM Storage Scale client virtual machine instance.
- Go to the /usr/lpp/mmfs/bin/ location.
- Run the following command:
mmstartup