CTDB issues
CTDB is a database layer that manages SMB and Active Directory specific information and provides it consistently across all CES nodes.
CTDB requires network connections to TCP port 4379 between all CES nodes. Internally, CTDB elects a recovery master among all available CTDB nodes. The elected node then acquires a lock on a recovery lock file in the CES shared root file system to ensure that no other CES node tries to do the same if a network problem occurs. The usage of the CTDB recovery lock was introduced with IBM Storage Scale 5.0.5.
If there is a problem with SMB or Active Directory integration or a specific CTDB problem is reported in the health check, the following steps must be taken:
- Check the status of CTDB on all CES nodes:
/usr/lpp/mmfs/bin/mmdsh -N CesNodes -f1 /usr/lpp/mmfs/bin/ctdb status
If a status is reported as DISCONNECTED, ensure that all the CES nodes are up and running and network connections to TCP port 4379 are allowed.
If a status is reported as BANNED, check the log files.
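The status check above can be narrowed to the nodes that need attention. The following is a minimal sketch, not part of the product: the function name ctdb_problem_nodes is hypothetical, and the sample input is illustrative only. It assumes that each node line in the ctdb status output contains pnn:<number> followed by the node state, and it matches that pattern anywhere in the line so that a host name prefix added by mmdsh does not matter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `ctdb status` output on stdin and prints only
# the node lines whose state includes DISCONNECTED or BANNED.
# Prints nothing (and still returns 0) when no such node is found.
ctdb_problem_nodes() {
    grep -E 'pnn:[0-9]+.*(DISCONNECTED|BANNED)' || true
}

# Illustrative sample input only; on a real cluster, pipe the output of the
# mmdsh command shown above into the function instead.
sample_status='Number of nodes:3
pnn:0 10.0.0.1 OK (THIS NODE)
pnn:1 10.0.0.2 DISCONNECTED|UNHEALTHY|INACTIVE
pnn:2 10.0.0.3 BANNED|INACTIVE'

printf '%s\n' "$sample_status" | ctdb_problem_nodes
```

On a cluster, the same filter could be applied as /usr/lpp/mmfs/bin/mmdsh -N CesNodes -f1 /usr/lpp/mmfs/bin/ctdb status | ctdb_problem_nodes.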
- Check the CTDB log files on all nodes:
CTDB logs to the standard syslog. The default syslog file name varies among the Linux® distributions, for example:
/var/log/messages
/var/log/syslog
Alternatively, use the journalctl command to show the system messages.
This message sequence indicates that a node could not acquire the recovery lock:
ctdb-recoverd[28458]: Unable to take recovery lock - contention
ctdb-recoverd[28458]: Unable to take recovery lock
ctdb-recoverd[28458]: Abort recovery, ban this node for 300 seconds
ctdb-recoverd[28458]: Banning node 3 for 300 seconds
This usually indicates a communication problem between CTDB on different CES nodes. Check the node local firewall settings, any network firewalls, and routing to ensure that connections to TCP port 4379 are possible between the CES nodes.
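The connectivity check described above can be sketched as a small bash script that is run on each CES node against the other nodes. This is an illustrative sketch under stated assumptions, not a product tool: the function name check_ctdb_port is hypothetical, the 3-second timeout is an arbitrary choice, and the script relies on the /dev/tcp redirection feature of bash and on the timeout command from GNU coreutils being available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_ctdb_port HOST [PORT]
# Returns 0 if a TCP connection to HOST:PORT succeeds within 3 seconds,
# 1 if it does not, and 2 if no host was given. PORT defaults to 4379.
check_ctdb_port() {
    local host="${1:-}" port="${2:-4379}"
    if [ -z "$host" ]; then
        echo "usage: check_ctdb_port HOST [PORT]" >&2
        return 2
    fi
    # /dev/tcp/HOST/PORT is a bash redirection feature. The connection is
    # opened in a child shell and closed again when that shell exits.
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
        echo "$host:$port reachable"
    else
        echo "$host:$port NOT reachable"
        return 1
    fi
}

# Check every host that is passed on the command line, for example the
# addresses of the other CES nodes. A failure does not stop the loop.
for node in "$@"; do
    check_ctdb_port "$node" || true
done
```

A node that is reported as NOT reachable from one side but reachable from the other points to an asymmetric firewall or routing rule, so running the script in both directions is worthwhile.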