Cassandra fails to start

Cassandra fails to start during the installation of Netcool® Operations Insight® 1.6.3.2 on Red Hat® OpenShift®.

Problem

Cassandra might fail to start successfully during the installation of Netcool Operations Insight, node drains, cluster restarts, or Red Hat OpenShift upgrades.

Cause

  1. If Cassandra fails to start, the topology pods also fail to start. Check the Cassandra pod logs by running the following command:
    oc logs -f <NOI RELEASE NAME>-cassandra-0
    The logs show errors similar to the following:
    ERROR [BatchlogTasks:1] 2021-11-25 22:06:17,767 DefaultFSErrorHandler.java:66 - Stopping transports as disk_failure_policy is stop
    ERROR [HintsWriteExecutor:1] 2021-11-25 22:06:17,767 DefaultFSErrorHandler.java:66 - Stopping transports as disk_failure_policy is stop
    ERROR [BatchlogTasks:1] 2021-11-25 22:06:17,768 CassandraDaemon.java:244 - Exception in thread Thread[BatchlogTasks:1,5,main]
    java.lang.RuntimeException: java.util.concurrent.ExecutionException: FSWriteError in /opt/ibm/cassandra/bin/../data/hints
  2. Log in to the Cassandra pod by running the following command:
    oc exec -ti <NOI RELEASE NAME>-cassandra-0 -- bash
    Inside the pod, run nodetool status and then try to connect with cqlsh, as shown:
    bash-4.4$ nodetool status
    Datacenter: datacenter1
    =======================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address       Load       Tokens       Owns    Host ID                               Rack
    UN  10.254.16.28  144.95 MiB  256          ?       31dd156c-057b-4d1f-83a9-729a63db0f4f  rack1
    
    bash-4.4$ /opt/ibm/cassandra/bin/cqlsh --u hdm --password <cassandra password>
    Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})
    Note: In a geo-redundant deployment, SSL is enabled, so add the --ssl option:
    bash-4.4$ /opt/ibm/cassandra/bin/cqlsh --u hdm --password <cassandra password> --ssl

    Although nodetool reports the node as up and normal (UN), cqlsh cannot connect to port 9042 and reports a connection error.
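
The log check in step 1 can be scripted. The following is a minimal sketch, not part of the product. The function name cassandra_disk_failure_in_logs is hypothetical, and the function only searches log text for the two error strings shown in the log excerpt in step 1.

```shell
# Sketch: detect the disk-failure signature in Cassandra pod logs.
# cassandra_disk_failure_in_logs is a hypothetical helper name.

# Reads log text on stdin. Returns 0 if the signature is present, 1 otherwise.
cassandra_disk_failure_in_logs() {
  grep -E -q 'Stopping transports as disk_failure_policy is stop|FSWriteError'
}

# Example against a live cluster (requires oc and a logged-in session):
#   oc logs <NOI RELEASE NAME>-cassandra-0 | cassandra_disk_failure_in_logs \
#     && echo "Cassandra stopped its transports after a disk write error"
```

A match on either string suggests the same failure that is described here. It does not rule out other reasons for Cassandra failing to start.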

Resolution

To resolve the issue with Cassandra not starting successfully, complete the following steps:

  1. Delete the Cassandra pod by running the following command:
    oc delete pod <NOI RELEASE NAME>-cassandra-0
  2. Check whether Cassandra starts successfully.
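
The two steps above can be combined in a small script. The following is a minimal sketch under stated assumptions, not a supported procedure. It assumes that the pod is re-created automatically under the same name after it is deleted. The function names, the OC variable, and the default of 60 attempts at 10-second intervals are all hypothetical.

```shell
# Sketch: delete the Cassandra pod, then poll until it reports Ready.
# OC names the CLI to call; it can be overridden for testing (assumption).
OC="${OC:-oc}"

# Polls the pod's Ready condition. Returns 0 once it is True,
# or 1 if it is still not True after the given number of attempts.
wait_for_pod_ready() {
  local pod="$1"
  local attempts="${2:-60}"
  local delay="${3:-10}"
  local i=0
  local ready
  while [ "$i" -lt "$attempts" ]; do
    ready="$("$OC" get pod "$pod" \
      -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' 2>/dev/null)"
    if [ "$ready" = "True" ]; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Deletes <release>-cassandra-0 and waits for the re-created pod.
restart_cassandra_pod() {
  local pod="$1-cassandra-0"
  "$OC" delete pod "$pod" || return 1
  wait_for_pod_ready "$pod" "${2:-60}" "${3:-10}"
}

# Example (requires oc and a logged-in session):
#   restart_cassandra_pod <NOI RELEASE NAME>
```

A Ready pod means only that its readiness check passed. To confirm that Cassandra accepts connections, repeat the cqlsh connection attempt from inside the pod.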