Installing a Cassandra node

Cassandra can be installed only on a Linux® platform.

Before you install a Cassandra node, complete the following steps:
  • Complete your topology and installation planning and keep it ready for reference. For more information on topology planning, see Planning the topology and Planning for installation.
  • Install the appropriate JDK version. IBM JDK 8 is required for Global Mailbox nodes running Cassandra 3.11.
    Tip: If you are using IBM® JDK, you might see the following error in the Cassandra log. You can ignore the error because it does not affect Cassandra operations.

    /<Install_Dir>/apache-cassandra/bin/../conf/cassandra-env.sh: line 218: [: 1.8.0: integer expression expected

  • Download the Sterling B2B Integrator V6.0 media package from IBM Fix Central or IBM Passport Advantage.
  • Extract the package to a folder, go to the media directory, and locate the following files:
    • IM_Linux.zip in the InstallationManager folder
    • Common_Repo.zip
  • Extract the files to a common directory. After you extract the files, the directory must have the following subdirectories:
    • IM_Linux
    • b2birepo
    • gmrepo
  • Default port numbers are assigned to the ports that the Cassandra utilities use. If you do not intend to use the default port numbers, keep the custom port numbers handy. You must enter the port numbers when you define the Cassandra nodes for the Cassandra topology.
  • Install an appropriate windowing system to view the IBM Installation Manager user interface on the Linux system.
  • Set the shell LANG variable as appropriate and export the variable. For example, in sh, ksh, or bash:
    LANG=en_US; export LANG
    For example, in csh:
    setenv LANG en_US
    The LANG environment variable must be the same across all Cassandra and ZooKeeper nodes.
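
The extraction result described in the list above can be checked before you start Installation Manager. The following is a minimal sketch, not a product utility: check_repo_layout is a hypothetical helper name, and the directory that you pass to it is whatever common directory you extracted the files to.

```shell
# Hypothetical helper (not shipped with the product): report which of the
# expected subdirectories are missing from the common extraction directory.
check_repo_layout() {
    base_dir=$1
    missing=""
    for sub in IM_Linux b2birepo gmrepo; do
        # Collect the name of every expected subdirectory that is absent.
        if [ ! -d "$base_dir/$sub" ]; then
            missing="$missing $sub"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}
```

For example, check_repo_layout /opt/ibm/media (a hypothetical path) prints ok when all three subdirectories are present, or a line such as missing: gmrepo otherwise.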

When you install each Cassandra node, you must define all the Cassandra nodes that are required across all your data centers. However, you must install each node separately on its own computer.

Attention: You can install Cassandra and ZooKeeper on a single node at the same time by selecting both the Global Mailbox Cassandra Node and Global Mailbox Zookeeper Node options. However, you must not install multiple instances of Cassandra and ZooKeeper on the same server.

To install a single Cassandra node with the user interface on a Linux system, complete the following steps:

  1. Open a command prompt and complete one of the following tasks to start the Installation Manager:
    1. Go to the IM_Linux directory and type ./userinst for the following scenarios:
      • If you have not installed the Installation Manager.
      • If you have a 64-bit Installation Manager installed.
      • If you have the Installation Manager installed on a platform that has only one download available for Installation Manager.
    2. Go to <installation directory>/Installation Manager/eclipse and type ./IBMIM, if you have a 32-bit Installation Manager installed on a Linux system.
    Tip: Record a response file when installing the first Cassandra node, so that the response file can be used to install other nodes through silent installation mode. For more information, see Recording a response file and Installing or updating with a response file.
  2. On the Installation Manager home page, click Install.
    Important: If the IM_Linux, gmrepo, and b2birepo directories are not in the same directory, or if you already have Installation Manager installed, a message is displayed that there are no packages to install or that Installation Manager cannot connect to the repositories. You must add the Global Mailbox and Sterling B2B Integrator repository files to the Installation Manager repository. For more information about adding repository files, see Repository preferences.
  3. On the Install Packages screen, select the Global Mailbox check box. This action also selects the Version 6.0 check box. Click Next.
  4. Click Browse and specify the installation directory for the Global Mailbox. This directory is specific to the Global Mailbox components. You can choose to retain the default or change it. Click Next.
    Important: If you are running IBM Installation Manager for the first time, you must specify the shared resources directory for the Installation Manager. Click Browse, specify the shared resources directory, and click Next.
  5. Select the translation (language) to install and click Next.
  6. Select the Global Mailbox Prerequisite Installation check box and select Global Mailbox Cassandra Node. Click Next.
    Important: To install the Zookeeper node along with the Cassandra node, select Global Mailbox Zookeeper Node also.
  7. Specify the Java™ home path for the data store node. Click Browse, navigate to the location of the Java folder, select it, and click Next.
  8. Complete the following steps to define the topology for Cassandra nodes and install the local Cassandra node:
    1. Type the name of the data center where you are defining the Cassandra node in the Data Center Name field and click Add. Repeat the step to add other data centers.
      To remove a data center, select the data center from the list and click Remove.
    2. Click Continue to Step 2.
    3. Select the data center where you are defining the Cassandra node and type a name for the rack. Repeat the step to add other racks.
      To remove a rack, select the data center and rack combination from the list and click Remove.
    4. Click Continue to Step 3.
    5. Select the appropriate data center and rack from the list, and in the Cassandra Host Name field, type the IP address or host name of the machine where you need to install Cassandra. Repeat the step for other Cassandra nodes.
    6. Optional: If the specific node must be a seed node, select the Seed Node check box.
    7. Click Add.
      To remove the node, select the data center, rack, and host name combination from the list and click Remove.
    8. Click Continue to Step 4.
    9. From the defined nodes, select the Cassandra node that must be installed on the local system. Select the data center, rack, and host name combination and type the following details for the node:
      Important: Use the default ports where possible. If you do not use the default ports, you must start the Cassandra utilities that use those ports with a command-line argument that provides the correct port number. Whether you use default or custom values, each port type must have the same value on every node in the cluster. For example, if you use 9043 as the client port, use 9043 as the client port for all Cassandra nodes in the cluster.
      Inter-node Communication Port
      The port for inter-node communication. Default is 7000.
      Inter-node Communication SSL Port
      The SSL port for encrypted communication. The port is not used if SSL is not enabled in the cassandra.yaml configuration file (server_encryption_options). Default is 7001.
      Client Port
      The default transport port, which is used by Sterling B2B Integrator to connect to Cassandra. Default is 9042.
      RPC Port
      The port for the thrift RPC service, which is used for client connections. The cqlsh client uses the RPC port to connect to Cassandra. Default is 9160.
      Reaper HTTP Port
      The port for the Reaper service. Default is 7080. To add a cluster to the Reaper database when you use the default port, run the following command: ./spreaper add-cluster <IP address>. If you specify a different port number, run the following command to register the cluster: ./spreaper add-cluster <IP address> --reaper-port=<port number given at install time>.
      JMX Port
      The Java management port for the Cassandra instance. The nodetool command that is installed with Cassandra uses the JMX port. Default is 7199.
    10. Select a data directory for the Cassandra node.
      Ensure that the directories for the ZooKeeper node and the Cassandra node are not on the same physical I/O device. Co-locating the directories might lead to performance issues.
  9. Click Install.
  10. Review the installation summary. Notice the installation progress in the Repository Information section.
  11. Click Finish to complete the installation.
    Important: You can ignore the DataStax warning messages in the Installation Manager log files.
  12. Start the Cassandra node by running the ./startGM.sh command from <installDir>/MailboxUtilities/bin/.

    You must start the Cassandra seed node first so that Reaper is started successfully. When you run the ./startGM.sh command, all components are automatically started in the following order:

    1. Cassandra
    2. Reaper
    3. ZooKeeper
    4. Watchdog
    Tip: If you want to start Cassandra individually, you can run ./startGMData.sh. However, you must ensure that you start Reaper on the seed node by running the ./startGMDataReaper.sh command after Cassandra is started.
    Note: Before you start the other Reaper instances in the cluster, ensure that the Reaper UI of the first instance is up. If you start the other instance before the UI of the first instance is running, errors that are related to schema creation might occur.
  13. When you start a Cassandra node, wait several minutes for Cassandra to become operational. Verify that the process is complete before proceeding. Go to the /<Install_Dir>/apache-cassandra/bin/ directory and type the ./nodetool status command.
    When the status for the node is UN (Up and Normal), the node is up and additional nodes can be started. If the status for a node is DN (Down and Normal), that node is down. Run ./nodetool status again every few minutes until the node is in UN status.
    Tip: To run nodetool, JAVA_HOME must be set to the location of IBM JDK 8.

    For example, assume two data centers, vm1 and vm2, neither of which is running Cassandra initially. Log in to vm2 and start Cassandra. After waiting a minute or two for it to start, run ./nodetool status. The following output is returned:

    Data center: vm2
    ===============
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address     Load      Tokens  Owns   Host ID   Rack
    UN  X.XX.XXX.2  508.2 KB  256     47.5%  7bbb2dc4  r1.vm2

    Data center: vm1
    ===============
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address     Load      Tokens  Owns   Host ID   Rack
    DN  X.XX.XXX.1  ?         256     52.5%  ea769c93  r1.vm1

    In this example, the status for vm2 is UN, which means that Cassandra is fully started and ready on that system. It took about 1-2 minutes after startGM.sh was run for the status to become UN. The status for vm1 is DN because Cassandra is not yet started there. You can now log in to vm1 and start Cassandra on that system.

  14. Repeat the procedure to install additional Cassandra nodes in the current and other data centers.
    Important: When adding or starting multiple Cassandra nodes, always wait at least two minutes between nodes to prevent possible errors. If two or more nodes are added to a cluster too quickly, an attempt to add a node can occur before updates to the Cassandra cluster topology are complete. If this occurs, Cassandra cannot be initialized on the new node, which causes an error similar to the following message:
    Exception encountered during startup: Other bootstrapping/leaving/moving nodes detected,
    cannot bootstrap while cassandra.consistent.rangemovement is true. 
  15. Change the maximum map count setting for Cassandra.
    1. Log in to each Cassandra node as the root user.
    2. Edit the /etc/sysctl.conf file to add the following lines:
          # Recommended Production Setting for Cassandra
          vm.max_map_count = 131072
    3. Save the file.
    4. Run the following command:
      sysctl -p
    5. Restart each Cassandra node to apply the revised setting.
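
The wait in step 13 can be scripted rather than checked by hand. The sketch below is an illustration only, not a product utility: node_status and wait_for_un are hypothetical helper names, the NODETOOL variable and the retry defaults (10 attempts, 30 seconds apart) are assumptions, and the parsing assumes that the address is the second column of each node row, as in the sample output in step 13. As the tip in that step notes, JAVA_HOME must be set for nodetool to run.

```shell
# node_status: read `nodetool status` output on stdin and print the
# two-letter status code (for example UN or DN) for the given address.
node_status() {
    awk -v addr="$1" '$2 == addr { print $1; exit }'
}

# wait_for_un: poll until the node reports UN or the attempts run out.
# Arguments: address, number of attempts (assumed default 10),
# delay in seconds between attempts (assumed default 30).
# NODETOOL is an assumed variable that names the nodetool command to run.
wait_for_un() {
    addr=$1
    attempts=${2:-10}
    delay=${3:-30}
    nodetool_cmd=${NODETOOL:-./nodetool}
    i=1
    while :; do
        status=$("$nodetool_cmd" status 2>/dev/null | node_status "$addr")
        if [ "$status" = "UN" ]; then
            echo "UN"
            return 0
        fi
        # Stop after the last attempt without sleeping again.
        if [ "$i" -ge "$attempts" ]; then
            break
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    # Print the last status seen, or "unknown" if the node was not listed.
    echo "${status:-unknown}"
    return 1
}
```

Run from the /<Install_Dir>/apache-cassandra/bin/ directory, wait_for_un <IP address> returns 0 once the node reports UN, so a wrapper script can start the next node only after the previous one is up.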

Cassandra relies on time stamps to synchronize operations across nodes, so the clocks on all systems where Cassandra nodes are installed must be kept tightly synchronized. Keeping the Cassandra node systems synchronized with each other is more important than matching the absolute time. For more information, see Synchronizing time across Cassandra nodes.
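
One rough way to spot drift between nodes is to compare the epoch time that each system reports. The following is a minimal sketch, assuming that you have already collected one epoch-seconds value per node (for example, by running date +%s on each system); max_skew is a hypothetical helper name, and this check does not replace the procedure in Synchronizing time across Cassandra nodes.

```shell
# max_skew: print the difference, in seconds, between the largest and the
# smallest of the epoch-second values given as arguments.
max_skew() {
    min=$1
    max=$1
    for t in "$@"; do
        if [ "$t" -lt "$min" ]; then
            min=$t
        fi
        if [ "$t" -gt "$max" ]; then
            max=$t
        fi
    done
    echo $((max - min))
}
```

For example, max_skew 1700000000 1700000003 1699999999 prints 4. Values that are collected one after another over the network include the collection delay, so treat a difference of a second or two as inconclusive.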