CTDB issues

CTDB is a database layer that manages SMB and Active Directory specific information and provides it consistently across all CES nodes.

CTDB requires network connections to TCP port 4379 between all CES nodes. Internally, CTDB elects a recovery master among all available CTDB nodes. The elected node then acquires a lock on a recovery lock file in the CES shared root file system to ensure that no other CES node tries to act as recovery master at the same time if a network problem occurs. The CTDB recovery lock was introduced in IBM Storage Scale 5.0.5.

If there is a problem with SMB or Active Directory integration, or if a specific CTDB problem is reported in the health check, take the following steps:

Check the status of CTDB on all CES nodes:
```
/usr/lpp/mmfs/bin/mmdsh -N CesNodes -f1 /usr/lpp/mmfs/bin/ctdb status
```
If a status is reported as DISCONNECTED, ensure that all the CES nodes are up and running and network connections to TCP port 4379 are allowed.
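
On a cluster with many CES nodes, the combined status output can be long. The following is a minimal sketch of a filter that keeps only the per-node lines that report DISCONNECTED or BANNED. The function name `ctdb_problem_nodes` is hypothetical, and the sketch assumes that each node appears on a line that contains `pnn:<number>` followed by its state; verify this against the output on your system.

```shell
# Hypothetical helper: read "ctdb status" output on stdin and print only
# the per-node lines that report a DISCONNECTED or BANNED state.
# Assumption: each node is listed on a line containing "pnn:<number>".
ctdb_problem_nodes() {
    grep -E 'pnn:[0-9]+.*(DISCONNECTED|BANNED)'
}
```

To use it, pipe the output of the status command shown above into `ctdb_problem_nodes`. The function prints nothing and returns a nonzero exit status when no node is in one of these two states.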

If a status is reported as BANNED, check the log files.
Check the CTDB log files on all nodes:
CTDB logs to the standard syslog. The default syslog file name varies among the Linux® distributions, for example:
```
/var/log/messages
```
```
/var/log/syslog
```
On some distributions, the journalctl command must be used instead to show the system messages.
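
Because the log location differs by distribution, a small helper can try each location in turn. The following is a sketch only: the function name `show_ctdb_log` is hypothetical, and it assumes that CTDB messages contain the string `ctdb`, as in the sample messages in this topic. It reads the first readable syslog file and falls back to `journalctl` if none is found.

```shell
# Hypothetical helper: print CTDB-related lines from the first readable
# log file. With no arguments it tries the two file names mentioned in
# this topic; pass explicit paths to override them.
show_ctdb_log() {
    if [ "$#" -eq 0 ]; then
        set -- /var/log/messages /var/log/syslog
    fi
    for _f in "$@"; do
        if [ -r "$_f" ]; then
            grep -i 'ctdb' "$_f"
            return
        fi
    done
    # No readable syslog file: fall back to the systemd journal.
    journalctl --no-pager | grep -i 'ctdb'
}
```

Run `show_ctdb_log` on each CES node, typically as root so that the log files are readable.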

This message sequence indicates that a node could not acquire the recovery lock:
```
ctdb-recoverd[28458]: Unable to take recovery lock - contention
ctdb-recoverd[28458]: Unable to take recovery lock
ctdb-recoverd[28458]: Abort recovery, ban this node for 300 seconds
ctdb-recoverd[28458]: Banning node 3 for 300 seconds
```
This usually indicates a communication problem between CTDB on different CES nodes. Check the node local firewall settings, any network firewalls, and routing to ensure that connections to TCP port 4379 are possible between the CES nodes.
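
One way to test reachability of the CTDB port from a CES node is sketched below. The function names `check_tcp_port` and `check_ctdb_port` are hypothetical. The sketch assumes that `bash` (with its `/dev/tcp` redirection feature) and the `timeout` command are available on the node. A successful connection shows only that the port is reachable from the node where the check runs, so repeat it on every CES node.

```shell
# Hypothetical helper: succeed if a TCP connection to host $1, port $2
# can be opened within 3 seconds. Relies on bash /dev/tcp redirection.
check_tcp_port() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Hypothetical helper: test the CTDB port (4379) on every host name or
# IP address that is passed as an argument.
check_ctdb_port() {
    for _host in "$@"; do
        if check_tcp_port "$_host" 4379; then
            echo "$_host: port 4379 reachable"
        else
            echo "$_host: port 4379 NOT reachable"
        fi
    done
}
```

For example, `check_ctdb_port ces1 ces2 ces3` prints one line per node, where `ces1`, `ces2`, and `ces3` are placeholder host names for your CES nodes.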