Cassandra cannot satisfy consistency level

The Cassandra cluster might not be able to satisfy the configured consistency level, most commonly because the replication factor is incorrectly configured on one or more nodes. You can recover from this situation by using the messages.log file to diagnose the problem and by verifying that the replication factor is correctly configured for all keyspaces on each Cassandra node.

Symptoms

Any action in the Global Mailbox management tool that involves mailboxes or messages results in the following message:

A system error has occurred. Please contact your system administrator.

Causes

Cassandra might not be able to satisfy the configured level of consistency for any of the following reasons:
Replication configuration issue
The replication properties are incorrectly configured for one or more Cassandra nodes.
Cannot maintain quorum
An issue with Cassandra prevents queries from satisfying EACH_QUORUM.
Restriction: It might still be possible to satisfy LOCAL_QUORUM in the local data center. An event is raised anyway because there is a general consistency issue for this node. For example, with a replication factor of 3 in each data center, a quorum is 2 replicas, so EACH_QUORUM requires 2 replicas in every data center, while LOCAL_QUORUM requires 2 replicas in the local data center only.
Network connectivity issue
Network problems are preventing Cassandra from communicating with enough nodes to successfully run a query.

Environment

Windows and Linux®.

Diagnosing the problem

If you suspect that Cassandra is unable to satisfy the specified consistency level, you can search the messages.log file for the appropriate error message at the time of the failure:
  1. Go to the <install_directory>/usr/servers/defaultServer/logs directory.
  2. Open the messages.log file.
  3. Examine the log for events with the error ID CBXMD0040E that are followed by an event with the following message:
    W Could not execute query successfully due to lack of required 
    Cassandra replicas. '{0}' replicas were required
    to successfully execute a query using a consistency level of '{1}' within 
    keyspace '{2}', but only '{3}' replicas could be contacted.
    The '{0}', '{1}', '{2}', and '{3}' variables provide the following information:
    • '{0}' indicates the configured number of Cassandra replicas that are required to successfully run a query.
    • '{1}' indicates the configured consistency level.
    • '{2}' identifies the keyspace that contains the configuration information that is used by the failed query. Keyspaces contain the replication configuration that is used by each type of query on the Global Mailbox system. Replication settings must be correctly configured for the following keyspaces:
      • scheduler
      • mailbox
      • event
      • replication
      • gatekeeper
    • '{3}' identifies the number of replicas that were successfully contacted for the query.
    The format of the error message is as follows:
    [time stamp] [thread ID] [logging class] [logging level] [error ID]:
     [error message]
    The following example shows events as they are logged in the messages.log file:
    [mm/dd/yy hh:mm:ss:ms PDT] 00000063 
    com.ibm.mailbox.database.dao.cassandra.CassandraDAO  
    E CBXMD0040E: An error has occurred while trying to connect to 
    Cassandra.
    [mm/dd/yy hh:mm:ss:ms PDT] 00000063 
    com.ibm.mailbox.database.dao.cassandra.CassandraDAO 
    W Could not execute query successfully due to lack of required Cassandra replicas. 
    3 replicas were required to successfully execute a query using a consistency level of 
    'ALL' within keyspace 'UNDEFINED', but only 2 replicas could be contacted.
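On a Linux system, you can also search for the error ID directly from the command line instead of opening the file. The following command is a minimal sketch; <install_directory> remains a placeholder for your installation directory, and the -A 4 option prints a few lines of trailing context so that the warning that follows the error is visible:

  grep -A 4 "CBXMD0040E" <install_directory>/usr/servers/defaultServer/logs/messages.log

On Windows, findstr "CBXMD0040E" against the same file performs a similar search, without the context lines.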

Resolving the problem

If any of the events in the messages.log file indicate that Cassandra cannot satisfy the specified consistency level, verify that your Cassandra cluster is correctly configured:
  1. Collect information that defines the topology of your Cassandra cluster deployment:
    Tip: If you do not have records that specify your Cassandra cluster topology, you can use the nodetool program to determine the number of Cassandra nodes in your Global Mailbox system:
    1. To run nodetool, JAVA_HOME must be set to the location of IBM JDK 8.
    2. From the command line, run bin/nodetool status. (See the example invocation after this procedure.)

      The following example shows the output of the nodetool status command:

      Datacenter: datacenter1
      =======================
      Status=Up/Down
      |/ State=Normal/Leaving/Joining/Moving
      --  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
      UN  9.23.16.186  55.68 KB   256     49.1%             86282e2e-e4a2-4643-a077-0ca6ea32e138  rac1
      UN  9.23.16.184  41.22 KB   256     47.9%             5ca91d43-9154-4b22-b1bb-4b432d0bdf43  rac1
      Datacenter: datacenter2
      =======================
      Status=Up/Down
      |/ State=Normal/Leaving/Joining/Moving
      --  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
      UN  9.23.25.148  45.19 KB   256     49.2%             215a6b13-fcc2-47ce-bf4d-cd7bfa8fc52c  rac1
      UN  9.23.16.187  71.97 KB   256     53.7%             db01c093-96a8-4ee3-8a42-ede9a4ef41b5  rac1
    3. To determine the total number of Cassandra nodes in your cluster, check the Address column of the output, which lists the IP address of each Cassandra node in the cluster.
    4. Record the number of Cassandra nodes that each data center contains.
  2. Verify that the replication factor value is correctly configured for each Cassandra node. You can use a Cassandra Query Language (CQL) shell to identify the replication configuration for each keyspace for each node.
    1. Start the CQL shell and run the following query at the cqlsh prompt: SELECT * FROM system_schema.keyspaces; (See the example after this procedure for a way to run the query against a specific node.)
      The following example shows the output of the query:
      keyspace_name   durable_writes   replication
      scheduler       True             {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'datacenter1': '2', 'datacenter2': '2'}
      mailbox         True             {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'datacenter1': '2', 'datacenter2': '2'}
      event           True             {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'datacenter1': '2', 'datacenter2': '2'}
      replication     True             {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'datacenter1': '2', 'datacenter2': '2'}
      system          True             {'class': 'org.apache.cassandra.locator.LocalStrategy'}
      system_traces   True             {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '2'}
      gatekeeper      True             {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'datacenter1': '2', 'datacenter2': '2'}
      
    2. Ensure that the replication factor is correctly configured for the following keyspaces:
      • scheduler
      • mailbox
      • event
      • replication
      • gatekeeper
      The replication factor is specified for each data center, and is the total number of Cassandra nodes in the data center. For example, in a two-data center Global Mailbox system, if data center 1 contains 2 Cassandra nodes, and data center 2 contains 3 Cassandra nodes:
      • For data center 1, the replication factor is 2
      • For data center 2, the replication factor is 3
    3. Optional: If the replication factor value is incorrect for a keyspace, you can update the replication factor configuration:
      1. From the command line, type ./stopGM.sh to stop all nodes that are running the Global Mailbox application.
      2. Run the ALTER KEYSPACE command to update the replication factor configuration for a keyspace. Provide the correct replication factor value for each data center that is not correctly configured, and include every data center in the replication map, because the command replaces the entire replication configuration for the keyspace. The following example updates the scheduler keyspace for a system where datacenter1 contains 2 Cassandra nodes and datacenter2 contains 3:
        cqlsh> ALTER KEYSPACE scheduler WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 2, 'datacenter2': 3};
        Important: Run the ALTER KEYSPACE command for each keyspace that is incorrectly configured.
      3. Run the CQL shell on each Cassandra node in your Global Mailbox system to update the keyspaces that are incorrectly configured. The keyspaces on each Cassandra node must be configured with the correct replication factor.
      4. Run the nodetool repair command on one online Cassandra node after all Cassandra nodes are correctly configured. (See the example after this procedure.) To run nodetool, JAVA_HOME must be set to the location of IBM JDK 8.
      5. Type ./startGM.sh to start all nodes that are running the Global Mailbox application.
  3. If your Cassandra cluster is correctly configured, check the status of the network for your Global Mailbox system.
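For step 1, the following is a minimal sketch of running the nodetool command on a Linux node. The installation paths are assumptions; substitute the Cassandra installation directory and the IBM JDK 8 location that are used in your environment:

  export JAVA_HOME=/opt/ibm/java     # location of IBM JDK 8 (assumed path)
  cd /opt/cassandra                  # Cassandra installation directory (assumed path)
  bin/nodetool status                # lists every node, grouped by data center

Each UN entry in the output is a node that is Up and in the Normal state. Count the Address values under each data center to record the node totals.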
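For step 2, the following sketch shows one way to run the keyspace query against a specific Cassandra node without opening an interactive shell. The IP address is taken from the earlier nodetool example and 9042 is the default CQL native transport port; adjust both for your environment:

  # Query the replication settings of every keyspace on one node (example address).
  bin/cqlsh 9.23.16.186 9042 -e "SELECT keyspace_name, replication FROM system_schema.keyspaces;"

Repeat the command with the address of each Cassandra node and confirm that the replication settings for the scheduler, mailbox, event, replication, and gatekeeper keyspaces are identical and correct on every node.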
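After the keyspaces are corrected, the following sketch shows the nodetool repair run from step 2. Run it once, on one online Cassandra node, from the Cassandra installation directory; the JDK path is an assumption:

  export JAVA_HOME=/opt/ibm/java     # location of IBM JDK 8 (assumed path)
  bin/nodetool repair                # synchronizes data among replicas so that each node holds the copies required by the new settings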