Troubleshooting a Spectrum Scale cloud
PowerVC manages the IBM Spectrum Scale cloud, so you should not normally need to interact with it directly. However, if there is a problem with the cloud, you can follow these procedures. Do not try to manage the cloud manually in any other way.
Problem: The cluster does not create volumes or a host loses its connection to the storage
If a cluster fails to create volumes or a host does not come back online in the cluster, check the state of the cluster and then run the mmstartup command if necessary.
Explanation:
If Spectrum Scale is inactive, the impact depends on whether it is inactive on the PowerVC server or on a compute server:
- PowerVC server: If Spectrum Scale is inactive on the PowerVC server, the cluster continues to operate, but volume creation might fail.
- Compute server: If a compute server loses its connection to Spectrum Scale, all of the virtual machines on that host lose the connection to their backing storage. Make sure that the server is properly sized for its workload: if the NovaLink partition runs out of memory, the Spectrum Scale process might stop.
Note: If a server is rebooted, Spectrum Scale should restart automatically. However, if the network does not become active for a long time after the reboot, the startup process eventually fails. At that point, you must run the mmstartup command manually.
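To check whether memory exhaustion on a NovaLink partition is what stopped Spectrum Scale, you can search the kernel log for out-of-memory (OOM) kills of mmfsd, the Spectrum Scale daemon. This is a minimal sketch, assuming the standard Linux OOM-killer message format (older kernels log "Kill process", newer ones "Killed process") and that you can read the kernel log with dmesg, which usually requires root. The function names mmfsd_oom_events and check_mmfsd_oom are hypothetical helpers, not part of PowerVC or Spectrum Scale.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): report whether the kernel OOM killer
# terminated the Spectrum Scale daemon (mmfsd) on this NovaLink partition.
# Assumes the Linux OOM-killer message format; "Kill process <pid> (mmfsd)"
# and "Killed process <pid> (mmfsd)" are both matched.

# Print matching OOM-killer lines from kernel log text read on stdin.
mmfsd_oom_events() {
  grep -E 'Out of memory: Kill(ed)? process [0-9]+ \(mmfsd\)'
}

# Check the current kernel ring buffer. Older messages may already have
# been rotated out, so an empty result is not conclusive.
check_mmfsd_oom() {
  if dmesg | mmfsd_oom_events; then
    echo "mmfsd was killed for lack of memory; review the partition's memory sizing." >&2
    return 1
  fi
  echo "No OOM kill of mmfsd found in the kernel log."
}
```

Run check_mmfsd_oom on the compute server's NovaLink partition after sourcing the file. Reading the log from stdin in mmfsd_oom_events also lets you feed it saved logs, for example the output of journalctl -k.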
Resolution:
- Check the cluster state by running the following command on the PowerVC server:
/usr/lpp/mmfs/bin/mmgetstate -a
- If one of the servers is inactive, do the following:
  - Verify that the server is on the network.
  - Make sure that the server is responding.
  - Verify that the server has enough memory and processing capacity.
  - Start Spectrum Scale on the inactive node by running the mmstartup command:
    /usr/lpp/mmfs/bin/mmstartup -N <node>
- After the node becomes active, you might need to restart the Cinder services. To do so, run this command:
/opt/ibm/powervc/bin/powervc-services cinder restart
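The resolution steps can be combined into a small shell sketch to run on the PowerVC server. It assumes that mmgetstate prints a header, a line of dashes, and then one row per node as "<node number> <node name> <GPFS state>"; it treats only nodes in the "down" state as candidates for mmstartup, uses ping as a rough stand-in for "the server is responding" (ICMP might be blocked on some networks), and uses a hypothetical polling limit. The function names nodes_in_state, wait_until_active, and start_down_nodes are hypothetical, and memory and processing capacity still need to be checked by hand.

```shell
#!/usr/bin/env bash
# Sketch of the resolution steps (hypothetical helpers), run on the PowerVC server.
# Assumption: mmgetstate output is a header, a dashed separator line, then
# rows of "<node number> <node name> <GPFS state>".

MMFS_BIN=/usr/lpp/mmfs/bin

# Print the names of nodes whose GPFS state equals $1, reading mmgetstate
# output on stdin. Lines before the dashed separator are ignored.
nodes_in_state() {
  awk -v want="$1" '
    seen && NF >= 3 && $3 == want { print $2 }
    /^[ \t]*-+[ \t]*$/ { seen = 1 }
  '
}

# Poll one node until it reports "active". The 30 x 10 s limit is a
# hypothetical choice, not a PowerVC or Spectrum Scale default.
wait_until_active() {
  local node=$1
  for _ in $(seq 1 30); do
    if [ -n "$("$MMFS_BIN/mmgetstate" -N "$node" | nodes_in_state active)" ]; then
      return 0
    fi
    sleep 10
  done
  return 1
}

# Start Spectrum Scale on each node reported as "down", then restart Cinder
# only if at least one node became active again.
start_down_nodes() {
  local node restarted=0
  for node in $("$MMFS_BIN/mmgetstate" -a | nodes_in_state down); do
    # Rough reachability check before starting the daemon.
    if ! ping -c 1 -W 2 "$node" >/dev/null 2>&1; then
      echo "Node $node does not respond to ping; check its network first." >&2
      continue
    fi
    echo "Starting Spectrum Scale on $node"
    if ! "$MMFS_BIN/mmstartup" -N "$node"; then
      echo "mmstartup failed for $node" >&2
      continue
    fi
    if wait_until_active "$node"; then
      restarted=1
    else
      echo "Node $node did not become active; check its memory and CPU." >&2
    fi
  done
  if [ "$restarted" -eq 1 ]; then
    /opt/ibm/powervc/bin/powervc-services cinder restart
  fi
}
```

To use it, source the file and call start_down_nodes. Restarting Cinder only when a node actually came back reflects the "might need to restart" wording of the procedure; if volume operations still fail afterward, restart Cinder manually.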
Problem: A disk dies in the cluster
Disks eventually wear out. If you use SAN-backed storage, your SAN should notify you of a failed disk and handle the replication. However, if you use local disks, you must follow the Spectrum Scale disk recovery procedures to fix the problem.
Resolution:
Follow the steps in the "Disk Failures" topic in the IBM Spectrum Scale Knowledge Center.
Important: Do not follow the steps for "Stopping the disk failure auto recovery operation". PowerVC assumes that auto recovery is always running.
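Because PowerVC assumes that auto recovery is always running, it can help to confirm that it has not been turned off. This is a hedged sketch, assuming that auto recovery is governed by the Spectrum Scale restripeOnDiskFailure configuration attribute and that mmlsconfig prints it as "restripeOnDiskFailure <value>", optionally followed by a bracketed node list; verify both assumptions against the Disk Failures topic for your Spectrum Scale level. The function names auto_recovery_status and check_auto_recovery are hypothetical.

```shell
#!/usr/bin/env bash
# Hedged sketch (hypothetical helpers). Assumes auto recovery is controlled by
# the restripeOnDiskFailure attribute and that mmlsconfig prints lines like
# "restripeOnDiskFailure yes" or "restripeOnDiskFailure no [node1,node2]".

# Classify mmlsconfig output read on stdin as enabled, disabled, or unknown.
# Any line whose value is not "yes" counts as disabled.
auto_recovery_status() {
  awk '
    tolower($1) == "restripeondiskfailure" {
      found = 1
      if ($2 != "yes") off = 1
    }
    END {
      if (!found) print "unknown"
      else if (off) print "disabled"
      else print "enabled"
    }
  '
}

# Query the live cluster; returns non-zero unless auto recovery is enabled.
check_auto_recovery() {
  local status
  status=$(/usr/lpp/mmfs/bin/mmlsconfig restripeOnDiskFailure | auto_recovery_status)
  echo "Disk failure auto recovery: $status"
  [ "$status" = "enabled" ]
}
```

Treating a single "no" line as disabled is a deliberately conservative choice: a per-node override that turns recovery off would still break PowerVC's assumption.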