Backing up and restoring a Liberty collective

Before you change a collective controller, member, or replica set, store a copy of the collective server files so that you can restore the collective later if needed. You might restore a backed-up copy to fix corrupted files or to revert the collective to a previous configuration. When you restore a replica set, start the restored replicas one at a time so that they synchronize data.

Procedure

  • Back up a collective.

    To back up a collective controller, member or replica set, copy the server files that you want to preserve to a safe location. The server files are in the $WLP_USER_DIR/servers/server_name directory and its subdirectories. To avoid locking problems when copying the files, stop the server before backing up its files.
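
As an illustration of the copy step, the following is a minimal sketch rather than a supported tool. The `backup_server` function name, its arguments, and the timestamped target naming are assumptions made for this example; only the `server stop` command and the server directory layout come from Liberty itself.

```shell
#!/bin/sh
# Sketch: copy one Liberty server directory to a backup location.
# backup_server is a hypothetical helper, not a Liberty command.
# Usage: backup_server <server_dir> <backup_root>
backup_server() {
    server_dir=$1
    backup_root=$2

    if [ ! -d "$server_dir" ]; then
        echo "No such server directory: $server_dir" >&2
        return 1
    fi

    # Stop the server before calling this function to avoid locking
    # problems, for example:
    #   server stop server_name

    # Timestamped target so that repeated backups do not overwrite each other.
    target="$backup_root/$(basename "$server_dir")-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$target" || return 1

    # Copy the directory contents and all subdirectories,
    # preserving permissions and timestamps.
    cp -Rp "$server_dir/." "$target/" || return 1

    # Print the backup location for the caller.
    echo "$target"
}
```

A call might look like `backup_server "$WLP_USER_DIR/servers/server_name" /path/to/backups`, where the backup root is whatever safe location you choose.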

  • Restore a collective controller or member.

    To restore a backed-up collective controller or member server, you can use files in the backed-up directory to configure a new server or, if the Liberty version is the same, copy the backed-up files into the Liberty installation. You do not need to copy the logs and workarea directories into the target installation. Ensure the server.xml file of a restored server sets the correct host value and has unique port values in the target installation.

    Optionally, start the restored server with the --clean parameter to clean cached server information:

    server start server_name --clean

    Use the --clean option only once; subsequent server starts do not require it.
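
The copy-back case for the same Liberty version can be sketched as follows. This is only an illustration: `restore_server` is a hypothetical helper name, and the script assumes that the backup directory has the same layout as a server directory. It skips the logs and workarea directories, which do not need to be copied.

```shell
#!/bin/sh
# Sketch: copy a backed-up server directory into a target installation,
# leaving out the logs and workarea directories.
# restore_server is a hypothetical helper, not a Liberty command.
# Usage: restore_server <backup_dir> <target_server_dir>
restore_server() {
    backup_dir=$1
    target_dir=$2

    if [ ! -d "$backup_dir" ]; then
        echo "No such backup directory: $backup_dir" >&2
        return 1
    fi
    mkdir -p "$target_dir" || return 1

    # Walk the top-level entries, including hidden ones.
    for entry in "$backup_dir"/* "$backup_dir"/.[!.]*; do
        [ -e "$entry" ] || continue      # skip glob patterns that matched nothing
        case "$(basename "$entry")" in
            logs|workarea) continue ;;   # not needed in the target installation
        esac
        cp -Rp "$entry" "$target_dir/" || return 1
    done
}
```

After the copy, you would still check the host and port values in the restored server.xml file by hand before starting the server.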

  • Restore or downgrade a replica set.
    1. Stop all replicas in the replica set.
    2. Restore the frappe database (fdb) directory on one replica from a backup.

      Replace the frappe database of the replica that you want to restore with the frappe database of the backed-up replica. The frappe database is the $WLP_USER_DIR/servers/collective_controller_name/resources/collective/repository/fdb directory of the replica.

    3. Delete the fdb directory contents of all other replicas in the replica set.
    4. If you want to downgrade to a previous version, replace the Java archive (JAR) files on all replicas with the JAR files of that previous version.
    5. Start the replica with the restored fdb directory.
    6. Look for the CWWKX6012I message in the message log of the restored replica.

      To see the replica messages, open the $WLP_USER_DIR/servers/collective_controller_name/logs/messages.log file in an editor. The message resembles:

      CWWKX6012I: The collective controller is temporarily unavailable, probably due to a change in the replica set. It should become available within a few seconds. Current active replica set is [active_replicas]. The configured replica set is [configured_replicas].
    7. Start only a minimal majority of the replicas.

      Count the restored replica toward the minimal majority. For example, if a replica set has 5 replicas, a minimal majority is 3 replicas. Because the restored replica is already running, you start 2 more replicas in step 7a.

      a. Start a minimal majority of the replicas.
      b. After these replicas are running, look for the CWWKX6011I message in the messages of each replica that you started. The message resembles:
        CWWKX6011I: The collective controller is ready, and can accept requests. The leader is replicaHost:replicaPort. Current active replica set is [active_replicas]. The configured replica set is [configured_replicas].

        The CWWKX6011I message indicates that the replicas synchronized correctly, with the restored fdb directory fully replicated in the majority replicas.

        Verify that the [active_replicas] section in the message lists all the majority replicas that you have started. Also verify that the [configured_replicas] section lists all the replicas in the replica set, including replicas that you have not started yet.

    8. Start all remaining replicas.
      a. Start the remaining replicas.
      b. After the replicas are running, look for the CWWKX6011I message in the messages of each replica.

        Verify that the [active_replicas] and [configured_replicas] sections in the message list all the replicas in the replica set.

    The replicas are now running on the restored version.
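
The majority arithmetic and the log checks in the replica set steps can be sketched with a few small shell helpers. The function names `minimal_majority`, `remaining_for_majority`, and `has_message` are hypothetical and exist only for this example. The sketch assumes that a minimal majority is half the replica count, rounded down, plus one, which agrees with the 5-replica example in step 7.

```shell
#!/bin/sh
# Sketch: helpers for the replica set restore steps.
# All function names here are hypothetical, not Liberty commands.

# Smallest number of replicas that is more than half of the set.
# Assumes the definition floor(n / 2) + 1.
minimal_majority() {
    echo $(( $1 / 2 + 1 ))
}

# Replicas still to start in step 7a, given that the restored
# replica is already running and counts toward the majority.
remaining_for_majority() {
    echo $(( $1 / 2 ))
}

# Succeeds if the given message ID appears anywhere in the log file.
# It does not distinguish new entries from entries of earlier runs.
# Usage: has_message <message_id> <messages_log>
has_message() {
    grep -q "$1" "$2"
}
```

For example, `minimal_majority 5` prints 3 and `remaining_for_majority 5` prints 2. A check such as `has_message CWWKX6011I "$WLP_USER_DIR/servers/collective_controller_name/logs/messages.log"` only confirms that the message is present; you still need to read the [active_replicas] and [configured_replicas] lists in the message yourself.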