CTDB issues

CTDB is a database layer that manages SMB and Active Directory specific information and provides it consistently across all CES nodes.

CTDB requires network connections on TCP port 4379 between all CES nodes. Internally, CTDB elects a recovery master among all available CTDB nodes. The elected node then acquires a lock on a recovery lock file in the CES shared root file system, which ensures that no other CES node can act as recovery master at the same time if a network problem occurs. The CTDB recovery lock was introduced in IBM Storage Scale 5.0.5.
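
The port requirement above can be spot-checked from any CES node. The following is a minimal sketch, not an IBM-provided tool: the function name check_ctdb_port and the CES_PEERS variable are hypothetical, and it assumes a bash build with /dev/tcp support and the coreutils timeout command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of IBM Storage Scale or CTDB).
# Returns 0 if a TCP connection to host:port can be opened, non-zero otherwise.
# Assumes bash was built with /dev/tcp redirection support.
check_ctdb_port() {
    local host="$1" port="${2:-4379}" wait_secs="${3:-3}"
    # Inside the child shell, $0 is the host and $1 is the port.
    timeout "$wait_secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Probe every peer named in the hypothetical CES_PEERS variable
# (a space-separated list of CES node names or addresses).
for peer in ${CES_PEERS:-}; do
    if check_ctdb_port "$peer"; then
        echo "$peer: TCP port 4379 reachable"
    else
        echo "$peer: TCP port 4379 NOT reachable"
    fi
done
```

For example, running the script with CES_PEERS="node1 node2" set in the environment probes both nodes. A successful connection shows only that the port is reachable from the node where the check runs, so the check should be repeated from each CES node.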

If there is a problem with SMB or Active Directory integration, or if the health check reports a specific CTDB problem, complete the following steps:

  1. Check the status of CTDB on all CES nodes:
    /usr/lpp/mmfs/bin/mmdsh -N CesNodes -f1 /usr/lpp/mmfs/bin/ctdb status

    If a status is reported as DISCONNECTED, ensure that all the CES nodes are up and running and network connections to TCP port 4379 are allowed.

    If a status is reported as BANNED, check the log files.

  2. Check the CTDB log files on all nodes:
    CTDB logs to the standard syslog. The default syslog file name varies among Linux® distributions, for example:
    /var/log/messages
    /var/log/syslog

    On some distributions, the journalctl command must be used instead to show the system messages.

    The following message sequence indicates that a node could not acquire the recovery lock:

    ctdb-recoverd[28458]: Unable to take recovery lock - contention
    ctdb-recoverd[28458]: Unable to take recovery lock
    ctdb-recoverd[28458]: Abort recovery, ban this node for 300 seconds
    ctdb-recoverd[28458]: Banning node 3 for 300 seconds

    This sequence usually indicates a communication problem between the CTDB instances on different CES nodes. Check the node-local firewall settings, any network firewalls, and the routing to ensure that connections to TCP port 4379 are possible between the CES nodes.
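
The log check in step 2 can be narrowed to the recovery lock messages shown above. The following is a minimal sketch under stated assumptions: the function name scan_ctdb_log is hypothetical, and the match patterns are taken only from the sample messages in this topic, so other CTDB versions might word them differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CTDB): reads log text on standard input
# and prints only the ctdb-recoverd lines that report a failed recovery
# lock attempt or a resulting ban.
scan_ctdb_log() {
    grep -E 'ctdb-recoverd\[[0-9]+\]: (Unable to take recovery lock|Abort recovery|Banning node [0-9]+)'
}

# Usage examples (uncomment the one that matches the distribution):
#   scan_ctdb_log < /var/log/messages
#   scan_ctdb_log < /var/log/syslog
#   journalctl --no-pager | scan_ctdb_log
```

If the function prints nothing and returns a non-zero status, no matching messages were found in the input, which suggests that a BANNED state has a different cause and the full log needs to be reviewed.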