ZooKeeper continuously restarts on one node

This topic describes what to do if ZooKeeper on one of the Kafka message queue nodes continuously restarts because of snapshot corruption.

Error description:
ZooKeeper keeps its transaction history in logs. When the transaction history grows large enough, a snapshot is taken and stored. When ZooKeeper starts, it loads the snapshot database to determine the state of the ZooKeeper quorum. If ZooKeeper cannot load the snapshot database, the snapshot is considered corrupted. This situation is rare. If the problem affects multiple ZooKeeper nodes and the ZooKeeper quorum cannot be established, contact the next level of support.
Resolution:
  1. Log in as the root user to the IBM Spectrum Scale node with the failing ZooKeeper.
  2. Change to the ZooKeeper log directory:
    cd /opt/kafka/zookeeper/version-2
  3. Remove all of the contents from the directory:
    rm -f *
  4. Restart the ZooKeeper process. When ZooKeeper starts, it rebuilds its logs and snapshots from the other ZooKeepers in the quorum:
    systemctl start zookeeper
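
Steps 2 through 4 can also be sketched as a small shell function. This is only an illustration under stated assumptions, not part of the product: the function name reset_zk_data and its two parameters are hypothetical, and the directory check is an added safeguard so that rm never runs in an unexpected location. The start command defaults to the one shown in step 4.

```shell
#!/bin/sh
# Sketch only: reset_zk_data is a hypothetical helper, not a ZooKeeper
# or IBM Spectrum Scale command. Run it as root on the failing node.
reset_zk_data() {
    data_dir=$1                                  # directory from step 2
    start_cmd=${2:-"systemctl start zookeeper"}  # command from step 4

    # Added safeguard (assumption): stop if the directory does not exist.
    if [ ! -d "$data_dir" ]; then
        echo "directory not found: $data_dir" >&2
        return 1
    fi

    # Steps 2 and 3: change to the directory and remove its contents.
    # The subshell leaves the caller's working directory unchanged.
    ( cd "$data_dir" && rm -f ./* ) || return 1

    # Step 4: start ZooKeeper so that it rebuilds its logs and
    # snapshots from the other ZooKeepers in the quorum.
    $start_cmd
}
```

With the path from step 2, the call would be: reset_zk_data /opt/kafka/zookeeper/version-2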