Safe Cluster Restart Automation Guidelines

This document provides guidelines for automating the cluster restart procedure. This is useful when a simple restart of the cluster is needed.

Note that these guidelines apply only to restarting a healthy cluster, that is, one where all servers in the cluster are up and each stripe has exactly one active server.

Cluster restart automation

Cluster restart involves the following steps:
  • STEP 1. Shut down the clients.
  • STEP 2. Restart the passive servers in safe mode.
  • STEP 3. Restart the active servers in safe mode.
  • STEP 4. Make the previous active servers exit from safe mode.
  • STEP 5. Make the previous passive servers exit from safe mode.

The details for these steps are as follows:

  • STEP 1. Shut down the clients

    The Terracotta client will shut down when you shut down your application.

  • STEP 2. Restart the passive servers in safe mode

    Use the stop-tc-server script with the options --stop-if-passive and --restart-in-safe-mode to ensure that a server only restarts if it is in passive mode.

    Use a procedure like the following pseudocode to restart all passive servers in safe mode:

    for each <server> in <running servers> { 
       stop-tc-server --stop-if-passive --restart-in-safe-mode <server> <args> 
    }

    Wait for the passive servers to reach SAFE_MODE_STATE. The server state can be determined by using the server-stat script.

    See the section Server Status (server-stat) in the Administration Guide for related information.
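
    The restart loop above could be scripted along the following lines. This is a minimal sketch, not a reference implementation: it assumes the stop-tc-server script is on the PATH, and the names build_restart_command, restart_in_safe_mode and run_command are hypothetical helpers introduced for illustration. Exit codes are recorded but not interpreted, because their meaning is not described here.

```python
import subprocess
from typing import Callable, Dict, Iterable, List, Sequence

# Assumption: the script is reachable on PATH; point this at the real location if not.
STOP_SCRIPT = "stop-tc-server"

Runner = Callable[[Sequence[str]], int]


def build_restart_command(server: str, role: str,
                          extra_args: Sequence[str] = ()) -> List[str]:
    """Build the conditional restart command line for one server.

    role is "passive" or "active" and selects --stop-if-passive or
    --stop-if-active, so the server only restarts if it has that role.
    """
    if role not in ("passive", "active"):
        raise ValueError(f"role must be 'passive' or 'active', got {role!r}")
    return [STOP_SCRIPT, f"--stop-if-{role}", "--restart-in-safe-mode",
            server, *extra_args]


def run_command(command: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(command, check=False).returncode


def restart_in_safe_mode(servers: Iterable[str], role: str,
                         extra_args: Sequence[str] = (),
                         runner: Runner = run_command) -> Dict[str, int]:
    """Send the conditional restart to every server; map server -> exit code."""
    return {server: runner(build_restart_command(server, role, extra_args))
            for server in servers}
```

    Calling restart_in_safe_mode(servers, "passive") covers this step. The runner parameter exists so that the loop can be exercised with a stand-in function before it is pointed at a real cluster.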

  • STEP 3. Restart the active servers in safe mode

    Use the stop-tc-server script with the options --stop-if-active and --restart-in-safe-mode. This restarts a server only if it is in active mode.

    Use a procedure like the following pseudocode to restart all active servers in safe mode:

    for each <server> in <running servers> { 
       stop-tc-server --stop-if-active --restart-in-safe-mode <server> <args>
    }

    Wait for all servers to reach SAFE_MODE_STATE. The server state can be determined by using the server-stat script.
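
    The wait in STEP 2 and STEP 3 could be implemented as a polling loop such as the sketch below. The output format of server-stat is not described here, so the sketch takes a hypothetical get_state callable that is assumed to run server-stat for one server and return its current state as a string. The timeout and polling interval values are illustrative defaults, not recommendations.

```python
import time
from typing import Callable, Iterable

SAFE_MODE_STATE = "SAFE_MODE_STATE"


def wait_for_state(servers: Iterable[str],
                   get_state: Callable[[str], str],
                   target: str = SAFE_MODE_STATE,
                   timeout_s: float = 300.0,
                   poll_interval_s: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.monotonic) -> None:
    """Block until every server reports the target state.

    get_state is a hypothetical wrapper around server-stat supplied by the
    caller. Raises TimeoutError if some servers have not reached the target
    state once timeout_s seconds have elapsed.
    """
    pending = set(servers)
    deadline = clock() + timeout_s
    while True:
        # Keep only the servers that have not reached the target state yet.
        pending = {server for server in pending if get_state(server) != target}
        if not pending:
            return
        if clock() >= deadline:
            raise TimeoutError(
                f"servers not in {target}: {', '.join(sorted(pending))}")
        sleep(poll_interval_s)
```

    Servers that have already reached the target state are not queried again, which keeps the number of server-stat invocations low on larger clusters.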

  • STEP 4. Make the previous active servers exit from safe mode

    Use the exit-safe-mode script to make a server exit from safe mode.

    See the section Exit Safe Mode (exit-safe-mode) in the Administration Guide for related information.

    Use the server-stat script to identify the servers that were active before the restart. Its initialState field reports the state a server was in prior to shutdown.

    Use a procedure like the following pseudocode to make the previous active servers exit from safe mode:

    <previous active servers> = []
    for each <server> in <servers> {
       server-stat -s <server>:<management-port>
       add <server> to <previous active servers> if its initialState is the active state
    }
    
    for each <server> in <previous active servers> {
       exit-safe-mode -s <server>:<management-port>
    }

  • STEP 5. Make the previous passive servers exit from safe mode

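    The previous passive servers are the remaining servers, that is, those that were not added to <previous active servers> in STEP 4.

    Use a procedure like the following pseudocode to make the previous passive servers exit from safe mode: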

    for each <server> in <previous passive servers> {
       exit-safe-mode -s <server>:<management-port>
    }
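
    STEP 4 and STEP 5 together amount to releasing the servers from safe mode in a fixed order: previous active servers first, then previous passive servers. The sketch below shows one way to express that ordering. It assumes the exit-safe-mode script is on the PATH and that each server is given as a "<server>:<management-port>" string. The was_active callable is a hypothetical helper that is assumed to run server-stat and check the initialState field, whose exact format is not described here.

```python
import subprocess
from typing import Callable, Iterable, List, Sequence

# Assumption: the script is reachable on PATH; point this at the real location if not.
EXIT_SCRIPT = "exit-safe-mode"


def build_exit_command(address: str) -> List[str]:
    """Build the exit-safe-mode command for one "<server>:<management-port>"."""
    return [EXIT_SCRIPT, "-s", address]


def run_command(command: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(command, check=False).returncode


def exit_safe_mode_in_order(addresses: Iterable[str],
                            was_active: Callable[[str], bool],
                            runner: Callable[[Sequence[str]], int] = run_command
                            ) -> List[str]:
    """Release previous active servers first, then previous passive servers.

    was_active is a hypothetical predicate based on the initialState field
    reported by server-stat. Returns the addresses in the order processed.
    """
    previous_active: List[str] = []
    previous_passive: List[str] = []
    for address in addresses:
        if was_active(address):
            previous_active.append(address)
        else:
            previous_passive.append(address)

    order = previous_active + previous_passive
    for address in order:
        runner(build_exit_command(address))
    return order
```

    Classifying every server before issuing any exit-safe-mode command means the ordering is decided from a single pass over the initialState values, rather than being interleaved with state changes in the cluster.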