Configuring failure detection in Cassandra

You must configure failure detection in Cassandra to avoid temporary periods of unavailability caused by unexpected failures or network errors.

About this task

When a Cassandra node fails unexpectedly or a networking error causes a particular Cassandra node to become unreachable, you might observe a temporary period of unavailability. This period of unavailability results from Cassandra attempting to contact a node that it does not yet know has failed or is unreachable.

To shorten the period of unavailability, set an appropriate value for the phi_convict_threshold property in the cassandra.yaml file. This property adjusts the sensitivity of the failure detector.

The default value for phi_convict_threshold is 8. Lower values make the failure detector more sensitive, so an unresponsive node is marked as down sooner. Higher values make it less sensitive, so a transient failure is less likely to cause a healthy node to be marked as down. The Cassandra documentation does not recommend values lower than 5 or higher than 12.

To configure the phi_convict_threshold property for failure detection in Cassandra:

Procedure

  1. Log in to the server where the Cassandra node is installed.
  2. To stop the Cassandra node, go to <install_dir>/MailboxUtilities/bin, and type ./stopGMData.sh.
  3. Go to the <install_dir>/apache-cassandra/conf directory.
  4. Open the cassandra.yaml file.
  5. Change the value of the phi_convict_threshold property as required, and save the file.
  6. To restart the Cassandra node so that the change takes effect, go to <install_dir>/MailboxUtilities/bin, and type ./startGMData.sh.
  7. Repeat the steps on all Cassandra nodes.
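
Example

Steps 4 and 5 can also be scripted. The following is a minimal sketch, not a supported utility. The function name set_phi_convict_threshold is hypothetical, and the sketch assumes that the property appears in cassandra.yaml on a single line of the form "phi_convict_threshold: 8", possibly commented out with a leading #. It accepts only whole numbers in the range 5 to 12 and saves a backup copy of the file with a .bak suffix before it changes anything.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the product): sets
# phi_convict_threshold in a cassandra.yaml file.
# Usage: set_phi_convict_threshold <path-to-cassandra.yaml> <value>
set_phi_convict_threshold() {
    yaml_file=$1
    value=$2

    # Assumption: only whole numbers are accepted by this sketch.
    case $value in
        ''|*[!0-9]*)
            echo "value must be a whole number" >&2
            return 1
            ;;
    esac

    # Enforce the recommended range of 5 to 12.
    if [ "$value" -lt 5 ] || [ "$value" -gt 12 ]; then
        echo "value must be between 5 and 12" >&2
        return 1
    fi

    # Keep a backup copy before editing.
    cp "$yaml_file" "$yaml_file.bak" || return 1

    if grep -Eq '^[[:space:]]*#?[[:space:]]*phi_convict_threshold:' "$yaml_file"; then
        # Replace the existing entry, uncommenting it if necessary.
        sed -E "s/^[[:space:]]*#?[[:space:]]*phi_convict_threshold:.*/phi_convict_threshold: $value/" \
            "$yaml_file.bak" > "$yaml_file"
    else
        # No entry found: append one at the end of the file.
        echo "phi_convict_threshold: $value" >> "$yaml_file"
    fi
}
```

With the node stopped as described in step 2, call the function with the path to cassandra.yaml in the conf directory from step 3 and the new value, for example 10. Then restart the node as described in step 6. If the result is not what you expect, restore the file from the .bak copy.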