Enabling SOAR disaster recovery

To enable disaster recovery (DR) on your IBM Security® QRadar® SOAR Platform system, you run an action on the appliance that you want to use as the primary system.

Before you begin

Before you enable DR for the first time, or after you upgrade, complete a system backup on the receiver host. After you enable DR, you cannot do a database restore on the receiver host to return it to its previous state. For more information about the different backup methods that are available, see Backup and restore the SOAR Platform.

Ensure that you specify the correct inventory file when you enable DR. After the action runs, the SOAR Platform uses the primary and secondary appliances as they are specified in the inventory file.
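
The inventory file is passed to the DR scripts with the -i option. As a rough sketch only, assuming a standard Ansible YAML inventory layout, such a file might look like the following. The group names and host names here are hypothetical placeholders; the actual structure that the DR playbooks expect is defined in the inventory template that is provided with the SOAR Platform.

all:
  children:
    primary:
      hosts:
        soar-a.example.com:   # placeholder primary host name
    receiver:
      hosts:
        soar-b.example.com:   # placeholder receiver host name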

About this task

You can enable DR in either of two ways.
Enable DR in a single step
This method involves a single step that runs the entire enable DR process, but it requires more system downtime.
Enable DR in two steps
This method involves two steps, the first of which requires a short system downtime. You can run the second step while the SOAR Platform is online.

Complete one of the following steps to enable DR.

Procedure

  1. Option 1: to enable DR in a single step, run:
    /usr/share/resilient-dr/ansible/scripts/run_dr.sh -a enable_dr -i <resilient_hosts_primary_machine_a.yml>
    This completes the entire enable DR process, but requires more downtime than option 2.
  2. Option 2: to enable DR in two steps:
    1. Run the following command to configure the master system, which requires a short downtime:
      /usr/share/resilient-dr/ansible/scripts/run_dr.sh -a configure_master -i <resilient_hosts_primary_machine_a.yml>
    2. Run the following command as soon as the SOAR Platform is up and running to make sure that the receiver data is up to date:
      /usr/share/resilient-dr/ansible/scripts/run_dr.sh -a backup_data_to_receiver -i <resilient_hosts_primary_machine_a.yml>

Results

After DR is enabled, you can test the file synchronization and database streaming. On the primary system, log in as the resadmin user and check the resilient-filesync service status by running the following command:
sudo systemctl status resilient-filesync

If the service is running correctly, its status is active and it is running the following process:

/usr/bin/lsyncd -nodaemon /usr/share/co3/conf/lsyncd/lsyncd.conf.lua
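
As an additional check, you can confirm that the lsyncd process is present by using a standard Linux process lookup; this is a generic command, not specific to the SOAR Platform:

pgrep -af lsyncd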

There are two log files for the resilient-filesync service, both located in the /var/log/resilient-filesync/ directory. The first is a general log file called resilient-filesync.log. The second, resilient-filesync.status.log, provides a live status of the actions that the service is taking, such as files that are queued to be pushed, monitored locations, and any blocked tasks. Both files are created when the resilient-filesync service starts.
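
To watch the synchronization activity as it happens, you can follow the status log with a standard tail command (root privileges might be required to read the file):

sudo tail -f /var/log/resilient-filesync/resilient-filesync.status.log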

On the receiver system, log in as resadmin and switch to the resfilesync user. Each of the synchronized directories should be present under /crypt/replication/, along with its contents from the primary system. The PostgreSQL database on the receiver system should now be connected to the primary system, and changes replicate from the primary system to the receiver system. Check the status of the PostgreSQL service on the receiver by running the following command:
systemctl status postgresql-14
You should see a receiver process that is streaming successfully. Errors are logged in the current-date PostgreSQL log file in the receiver system's /var/lib/pgsql/14/data/log/ directory. You must change to either the root or postgres user to view the log files. The Resilient® service should be running on the primary appliance but disabled on the receiver appliance. You can check its status by running the following command:
sudo systemctl status resilient
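
To confirm the database streaming described earlier, you can also query the standard PostgreSQL statistics views; these views are part of PostgreSQL itself rather than the SOAR Platform, and the commands assume the default peer authentication for the postgres operating system user and that the receiver database accepts read-only connections. On the receiver system, check that the WAL receiver is streaming from the primary:

sudo -u postgres psql -c "SELECT status, sender_host FROM pg_stat_wal_receiver;"

On the primary system, the corresponding view shows the connected receiver:

sudo -u postgres psql -c "SELECT client_addr, state, sync_state FROM pg_stat_replication;"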
You can add an organization, users, incidents, and attachments on the primary instance through your browser. You should also see attachments being sent to the receiver system in the /crypt/replication/crypt/attachments directory.
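
For example, as the resfilesync user on the receiver system, you can list the replicated attachments directory with a standard directory listing to confirm that the files are arriving:

ls -l /crypt/replication/crypt/attachments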