
6027-4200 [E] Maximum number of retries reached
Explanation
The maximum number of retries to get a response from the quorum nodes has been reached due to a failure. Most IBM Storage Scale commands do not work in this state because they rely on the Cluster Configuration Repository (CCR). The following two sections describe the possible failure scenarios in detail, depending on the cluster configuration.
This section applies to all clusters:
Explanation: The failure occurs due to missing or corrupted files in the CCR committed directory /var/mmfs/ccr/committed/. The missing or corrupted files might be caused by a hard or cold power off or a crash of the affected quorum nodes. Files in the committed directory might be truncated to zero-length.
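As a quick complement to the explanation above, the following Python sketch lists zero-length files in a directory. The function name zero_length_files is hypothetical and not part of IBM Storage Scale, and an empty file is only a hint of truncation, not proof of corruption. The sketch only reads file metadata and does not modify anything.

```python
from pathlib import Path


def zero_length_files(directory):
    """Return the regular files in `directory` whose size is 0 bytes, sorted by name.

    Hypothetical helper for illustration only. A zero-length file is a hint
    that the file might have been truncated; it is not a definitive check.
    """
    return sorted(
        entry
        for entry in Path(directory).iterdir()
        if entry.is_file() and entry.stat().st_size == 0
    )
```

For example, calling zero_length_files("/var/mmfs/ccr/committed") on a quorum node returns the empty files, if any. Reading that directory typically requires root privileges.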
User response:
- Check for corrupted files in the committed directory by issuing mmccr check -Y -e on every available quorum node. If the command responds with a message like the following one, the directory contains corrupted files:
mmccr::0:1:::1:FC_COMMITTED_DIR:5:Files in committed directory missing or corrupted:1:7:WARNING
- Follow the instructions in the topic Repair of cluster configuration information when no CCR backup is available.
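The check in the first step can be scripted when many quorum nodes must be inspected. The following Python sketch scans captured mmccr check -Y -e output for the condition shown in the sample message. It assumes that the output is colon-delimited, that the check name FC_COMMITTED_DIR appears as its own field, and that the last field carries the severity, as in the sample. The function name find_committed_dir_warnings is hypothetical, and the exact field layout might differ between releases.

```python
def find_committed_dir_warnings(check_output):
    """Return the output lines that report a committed-directory warning.

    Assumptions (taken from the sample message, not from a format spec):
    fields are separated by ':', the check name FC_COMMITTED_DIR is a
    field of its own, and the last field is the severity.
    """
    flagged = []
    for line in check_output.splitlines():
        fields = line.strip().split(":")
        if "FC_COMMITTED_DIR" in fields and fields[-1] == "WARNING":
            flagged.append(line.strip())
    return flagged
```

The input can be the text captured from running the command on each quorum node, for example with subprocess.run(["mmccr", "check", "-Y", "-e"], capture_output=True, text=True).stdout, assuming that mmccr is on the PATH. A non-empty result means that the node reports missing or corrupted files in its committed directory.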
This section applies only to clusters that are configured with tiebreaker disks:
Explanation: Files committed to the CCR, like the mmsdrfs file,
reside only on quorum nodes and not on tiebreaker disks. Using tiebreaker disks allows a cluster to
remain available even if only one quorum node is active. If a file commit to the CCR happens when
only one quorum node is active and the quorum node then fails, this error occurs on other nodes.
Possible reasons for the failure:
- The quorum nodes that hold the most recent version of the file are not up.
- The IBM Storage Scale daemon (mmsdrserv or mmfsd) is not running on the quorum nodes that hold the most recent version of the file.
- The IBM Storage Scale daemon (mmsdrserv or mmfsd) is not reachable on the quorum nodes that hold the most recent version of the file.
User response:
The most recent file version can be on any quorum node that is currently unavailable.
- Try to start up as many quorum nodes as possible.
- Verify that either the mmsdrserv daemon (if IBM Storage Scale is down) or the mmfsd daemon (if IBM Storage Scale is active) is running on every quorum node (for example, on Linux® by issuing the ps command).
- Make sure the IBM Storage Scale daemons are reachable on all quorum nodes. To identify the problem, issue the mmhealth node show GPFS -v command or the mmnetverify command as described in the topic Analyze network problems with the mmnetverify command.
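The daemon check on Linux can be sketched in Python as follows. The sketch assumes that the daemons appear in the process table under the command names mmsdrserv and mmfsd, and that ps supports the -e and -o comm= options, as the procps version on Linux does. The function names are hypothetical. The sketch only reports which daemons run on the local node; it does not test whether they are reachable from other nodes.

```python
import subprocess

# Daemon names as given in this topic.
DAEMONS = ("mmsdrserv", "mmfsd")


def running_daemons(ps_output):
    """Return the set of IBM Storage Scale daemons found in `ps` output.

    `ps_output` is expected to hold one command name per line, as printed
    by `ps -e -o comm=`.
    """
    names = {line.strip() for line in ps_output.splitlines()}
    return {daemon for daemon in DAEMONS if daemon in names}


def local_ps_output():
    """Capture the command names of all processes on the local node."""
    result = subprocess.run(
        ["ps", "-e", "-o", "comm="],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling running_daemons(local_ps_output()) on a quorum node returns an empty set if neither daemon is running on that node.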
