ZooKeeper connection errors
ZooKeeper troubleshooting 1:
After nodes are replaced or added, the cinder and gnocchi clients cannot connect to the ZooKeeper server.
The logs show errors such as the following:
In cinder-api.log:
CRITICAL cinder [-] Unhandled error: tooz.coordination.ToozConnectionError: Operational error: Connection time-out
ERROR cinder Traceback (most recent call last):
ERROR cinder File "/usr/lib/python3.6/site-packages/tooz/drivers/zookeeper.py", line 150, in _start
ERROR cinder self._coord.start(timeout=self.timeout)
ERROR cinder File "/usr/lib/python3.6/site-packages/kazoo/client.py", line 635, in start
ERROR cinder raise self.handler.timeout_exception("Connection time-out")
ERROR cinder kazoo.handlers.threading.KazooTimeoutError: Connection time-out
In the gnocchi metricd log:
WARNING tooz.coordination: Retrying tooz.drivers.zookeeper.KazooDriver.heartbeat in 1.0 seconds as it raised Connection has been closed.
ERROR gnocchi.cli.metricd: Unexpected error updating the task partitioner: Connection has been closed
ERROR gnocchi.cli.metricd: Unexpected error during processing job
In the ZooKeeper log:
[myid:] - ERROR [SyncThread:1:o.a.z.s.ZooKeeperCriticalThread@49] - Severe unrecoverable error, from thread : SyncThread:1
java.lang.NullPointerException: null
at org.apache.zookeeper.server.quorum.SendAckRequestProcessor.flush(SendAckRequestProcessor.java:67)
at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:248)
at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:169)
[myid:] - ERROR [LearnerHandler-/172.26.3.219:55954:o.a.z.s.q.LearnerHandler@719] - Unexpected exception in LearnerHandler:
java.io.EOFException: null
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:96)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:86)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:134)
at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:541)
Explanation 1
The server does not come up because of file corruption: a ZooKeeper server may be unable to read its database, and therefore fail to start, when its transaction logs are corrupted. In that case the ZooKeeper log file contains IOException errors.
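To check quickly whether a ZooKeeper log contains the kinds of exceptions shown above, a small helper such as the following may be useful. It is only a sketch: the function name zk_count_exceptions is hypothetical, and the log path in the usage line is an assumption that depends on how ZooKeeper is installed.

```shell
# Count log lines that mention the exception types seen in the traces above.
# Usage (the path is an assumption, adjust to your installation):
#   zk_count_exceptions /var/log/zookeeper/zookeeper.log
zk_count_exceptions() {
    log_file="$1"
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the function from failing on a clean log.
    grep -c -E 'java\.io\.(IO|EOF)Exception|NullPointerException' "$log_file" || true
}
```

A non-zero count is only a hint that the data directory may be damaged. Read the surrounding stack traces before resetting anything.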
Resolution 1
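In this case, clear /var/lib/zookeeper/version-2 on each of the three nodes. The commands below move the existing directory aside as a backup instead of deleting it, recreate it empty with the correct ownership, and restart the services: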
cd /var/lib/zookeeper
mv version-2 version-2.bak
mkdir version-2
chown zookeeper:zookeeper -R version-2
icic-services restart
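Because the same steps have to be repeated on every node, they can be wrapped in a function. The following is a minimal sketch, not the product's own tooling: the function name zk_reset_version_dir and the timestamped backup name are hypothetical, and the function does not restart any service, so icic-services restart is still needed afterwards.

```shell
# Move the version-2 directory aside and recreate it empty.
# Usage: zk_reset_version_dir /var/lib/zookeeper
zk_reset_version_dir() {
    data_dir="$1"
    # A timestamped backup name (an assumption) avoids overwriting an earlier backup.
    backup="version-2.bak.$(date +%Y%m%d%H%M%S)"
    if [ ! -d "$data_dir/version-2" ]; then
        echo "no version-2 directory under $data_dir" >&2
        return 1
    fi
    mv "$data_dir/version-2" "$data_dir/$backup" || return 1
    mkdir "$data_dir/version-2" || return 1
    # Restore ownership only when running as root and the zookeeper user exists.
    if [ "$(id -u)" -eq 0 ] && id zookeeper >/dev/null 2>&1; then
        chown -R zookeeper:zookeeper "$data_dir/version-2"
    fi
    # Print the backup location so it can be inspected or removed later.
    echo "$data_dir/$backup"
}
```

Taking the data directory as an argument makes it possible to try the function on a scratch copy before running it against /var/lib/zookeeper.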
ZooKeeper troubleshooting 2:
In the ZooKeeper log:
java.io.IOException: Leaders epoch, 3 is less than accepted epoch, 63
at org.apache.zookeeper.server.quorum.Learner.registerWithLeader(Learner.java:525)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:91)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1539)
Resolution 2
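The follower has recorded a higher accepted epoch (63) than the epoch the leader reports (3), so it refuses to register with the leader. Delete the acceptedEpoch and currentEpoch files under /var/lib/zookeeper/version-2: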
-rw-r--r--. 1 zookeeper zookeeper 1 May 22 21:17 acceptedEpoch
-rw-r--r--. 1 zookeeper zookeeper 1 May 22 21:17 currentEpoch
Then run icic-services restart.
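The removal of the epoch files can be sketched as a function as well. This variant moves the two files into a backup directory rather than deleting them outright. That is a cautious assumption, not part of the original procedure. The function name zk_clear_epoch_files and the epoch-backup directory name are hypothetical.

```shell
# Move acceptedEpoch and currentEpoch out of a version-2 directory.
# Usage: zk_clear_epoch_files /var/lib/zookeeper/version-2
zk_clear_epoch_files() {
    version_dir="$1"
    # Keep the old files next to version-2 in a timestamped directory (an assumption).
    backup_dir="$version_dir/../epoch-backup.$(date +%Y%m%d%H%M%S)"
    mkdir -p "$backup_dir" || return 1
    for f in acceptedEpoch currentEpoch; do
        # Skip files that are already absent; leave snapshots and logs untouched.
        if [ -f "$version_dir/$f" ]; then
            mv "$version_dir/$f" "$backup_dir/" || return 1
        fi
    done
}
```

As with the manual steps, the services still have to be restarted afterwards with icic-services restart.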