Overview of SOAR disaster recovery

The Disaster Recovery (DR) system provides the ability to re-establish an operational SOAR system if a disaster occurs. The DR system consists of two SOAR appliances, each of which is running the same version of the SOAR Platform. It is not a high availability (HA) system.

The terms master and receiver are used throughout this guide for the states of the appliance systems. The master system is the appliance on which the active SOAR Platform instance is running. The receiver system is the appliance that is ready to take over running the SOAR Platform in a disaster scenario. In addition, machine_a and machine_b are used as system names in the examples. The states of these machines can change. For example, machine_a can be the master appliance and subsequently become the receiver, but the machine names remain constant.
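
The distinction between a fixed machine name and a changeable state can be pictured with a small model. This is only an illustrative sketch: the Appliance class and the swap_states function are hypothetical names invented for this example and are not part of the DR tooling.

```python
from dataclasses import dataclass

MASTER = "master"
RECEIVER = "receiver"


@dataclass
class Appliance:
    """Hypothetical model of one appliance in the DR pair."""
    name: str   # fixed machine name, for example "machine_a"
    state: str  # current DR state: "master" or "receiver"


def swap_states(first: Appliance, second: Appliance) -> None:
    """Exchange the DR states of two appliances; the names are untouched."""
    first.state, second.state = second.state, first.state


machine_a = Appliance("machine_a", MASTER)
machine_b = Appliance("machine_b", RECEIVER)

# After the states change, machine_a is still called machine_a,
# but it now acts as the receiver.
swap_states(machine_a, machine_b)
print(machine_a.name, machine_a.state)
print(machine_b.name, machine_b.state)
```

In the examples in this guide, commands always refer to machine_a and machine_b by name, regardless of which one is currently the master.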

The DR system uses the following technologies:
  • Postgres base backup is used to make the first copy of the database.
  • Postgres streaming replication is used to send ongoing changes.
  • lsyncd (in rsync+ssh mode) is used to send copies of files. The files are held in a shadow location on the second appliance (receiver) and are moved to their normal location when the receiver is promoted to master.
  • Ansible® is used to bring the appliances to the required state.
Note: The OpenSearch index is not copied at any time. The index is built on the second appliance when the receiver is promoted to master. Building the index at promotion time ensures that the data in the index is consistent with the database.
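
The shadow-location approach for replicated files can be sketched as follows. This is a minimal illustration of the idea only, not the DR tooling's implementation: the function name promote_shadow_files, the directory names, and the sample file are all assumptions made for this example.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def promote_shadow_files(shadow_dir: Path, live_dir: Path) -> List[str]:
    """Move every file from the shadow tree into the live tree.

    Relative paths are preserved. Returns the relative paths that were moved.
    """
    moved = []
    # Collect the file list first so that moving files does not disturb the walk.
    sources = sorted(p for p in shadow_dir.rglob("*") if p.is_file())
    for src in sources:
        rel = src.relative_to(shadow_dir)
        dest = live_dir / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dest))
        moved.append(rel.as_posix())
    return moved


# Demonstration in a temporary directory (hypothetical layout).
with tempfile.TemporaryDirectory() as tmp:
    shadow = Path(tmp) / "shadow"   # where replicated copies wait on the receiver
    live = Path(tmp) / "live"       # the normal location used by a master
    (shadow / "attachments").mkdir(parents=True)
    (shadow / "attachments" / "report.txt").write_text("replicated copy")
    live.mkdir()

    moved = promote_shadow_files(shadow, live)
    promoted_text = (live / "attachments" / "report.txt").read_text()
    print(moved)
```

Keeping the copies in a separate location until promotion means that the receiver's normal file locations are not modified while it is only receiving data.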

Deployment overview

The process to set up, deploy, and use the DR system is as follows:
  1. Verify the prerequisites, described in SOAR disaster recovery prerequisites.
  2. Install the DR content on each of the appliances, described in Step 1: Installing DR.
  3. Set up the DR content on each of the appliances, described in Step 2: Setting up the appliance systems.
  4. Configure Postgres for SSL using either the supply or manual method, described in Step 3: Configuring Postgres for SSL.
  5. Create Ansible inventory files on each of the appliances, described in Step 4: Creating Ansible inventory files.
  6. Create Ansible vault files on each of the appliances, described in Step 5: Creating Ansible vault files.
  7. Run the DR actions to enable and use DR, described in Running Disaster Recovery actions.
  8. Optionally, configure the health monitor, described in Using the health monitor.

Audience

This guide is intended for system administrators who are responsible for maintaining the SOAR Platform. You need knowledge of UNIX and IT administration to install and set up the Disaster Recovery system.

Security considerations

It is important to encrypt the SSH key vault file and SSL certificates and to follow the security recommendations described in this guide.