Get the Most Out of IBM Cloudant with Cross-Region Replication


How to achieve full two-way replication between two data centre regions with IBM Cloudant.

If you run applications with customers in multiple regions of the world or run apps that are required to be resilient to the failure of whole data centre regions, you should consider IBM Cloudant for your data store.

IBM Cloudant has unique cross-region replication capabilities that let you maintain identical datasets, kept continuously in sync, in different parts of the world. That way, your users can be served faster by reading from the dataset closest to them, and you can seamlessly fail over between regions in the event of a disaster or loss of connectivity.

Additionally, replicated datasets allow you to handle more traffic. If one region is configured to handle 500 queries per second, replicating to an identical second Cloudant service adds another 500 queries per second. This applies to read traffic: every write is replicated to both regions, so writes consume capacity in each.

The ease of use and reliability of IBM Cloudant replication set it apart from most other managed database services. And because of Cloudant's conflict handling, no data is silently lost, even if the same document is updated simultaneously in two different regions: both versions are kept, every region picks the same version as the winner, and the other is stored as a conflict that your application can inspect and resolve.
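Cloudant inherits CouchDB's multi-version model, and the deterministic choice of a winning revision is what lets two regions agree without coordinating. The following is a simplified JavaScript model of that rule (a non-deleted revision beats a deleted one, then the longer revision history wins, then the higher revision hash). It is an illustration written for this article, not Cloudant's source code, and the `_rev` values used with it are made up.

```javascript
// Simplified model of how CouchDB-compatible databases such as Cloudant pick
// a deterministic "winning" revision among conflicting leaf revisions.
// Each leaf is { rev: "<generation>-<hash>", deleted: boolean }.

function parseRev(rev) {
  const dash = rev.indexOf("-");
  return { generation: Number(rev.slice(0, dash)), hash: rev.slice(dash + 1) };
}

function pickWinner(leaves) {
  const sorted = [...leaves].sort((a, b) => {
    // 1. A non-deleted revision beats a deleted one.
    if (!!a.deleted !== !!b.deleted) return a.deleted ? 1 : -1;
    const ra = parseRev(a.rev);
    const rb = parseRev(b.rev);
    // 2. The revision with the longer edit history (higher generation) wins.
    if (ra.generation !== rb.generation) return rb.generation - ra.generation;
    // 3. Ties are broken by comparing the hash strings; the higher one wins.
    if (ra.hash === rb.hash) return 0;
    return ra.hash > rb.hash ? -1 : 1;
  });
  const [winner, ...losers] = sorted;
  // Losing non-deleted revisions are what a database reports as conflicts.
  return {
    winner: winner.rev,
    conflicts: losers.filter(l => !l.deleted).map(l => l.rev),
  };
}
```

For two simultaneous edits with revisions `2-aaa` and `2-bbb`, `pickWinner` returns `2-bbb` as the winner and `2-aaa` as a conflict; because the rule depends only on the revisions themselves, both regions reach the same answer. In a real database you can see the losing revisions by fetching a document with `?conflicts=true`, which adds a `_conflicts` array to the response.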

A replicated service will, of course, increase your infrastructure costs (because you are provisioning double the capacity), but if these benefits matter to your application, they are likely to outweigh the extra cost.

Two-way replication between two data centre regions

In this article, we will take you through the simple steps required to achieve full two-way replication between two data centre regions. These principles can be extended to replicate between three or more regions as well (see more details about complex replication topologies in this article).

We will also introduce a basic script to monitor replication and check that things are running smoothly. This script will be hosted on the IBM Code Engine service and run on a regular (cron) basis.

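The following is what you will build: two IBM Cloudant instances, one in Dallas and one in London, each with a users database; continuous replication in both directions between those databases; and a monitoring script on Code Engine that checks the replication every minute.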


This tutorial should take you less than an hour to complete. It will not be entirely cost-free because you can only set up one Cloudant service on the free tier and you will need two of them. If you deprovision the services after completing the tutorial, however, you should not have to pay more than a few dollars.

What you will need

  1. An IBM Cloud pay-as-you-go account.
  2. The IBM Cloud CLI (you also need to make sure that it is logged into your account).
  3. Git.
  4. Node.js and npm.
  5. Terraform: We will be using Terraform to deploy all the required infrastructure.
  6. Docker: We will be using Docker to create the images that will run your code in Code Engine — make sure you are logged into your Docker account.
  7. jq: This is a command-line utility to manipulate JSON data files.
  8. ccurl (CouchDB curl): A command-line utility for accessing CouchDB-compatible services.
  9. Access to a Mac or Linux terminal.

To get the most out of this tutorial, you should be familiar with the basics of NodeJS, Terraform and Docker, but no deep expertise in any of them is required.

Tutorial steps

  1. Create two instances of IBM Cloudant in separate geographical regions.
  2. Create secure replication access between them using an IAM ServiceID.
  3. Create one database in each Cloudant instance.
  4. Create a simple NodeJS script that sets up replication between these databases and then monitors the replication. This will be deployed to Code Engine, where it will run every minute.
  5. Change the data in the databases and watch it replicate.

Step 1: Obtain an API key to deploy infrastructure to your account

You will need some credentials to deploy infrastructure programmatically using Terraform. Follow the steps in this document to create an API key, and make a note of it for Step 2.

Step 2: Clone the repo and cd into the Terraform directory

Now you are ready to set up the parameters needed to create the infrastructure from your machine. In a terminal, type the following:

git clone https://github.com/danmermel/cloudant-replication-in-a-box
cd cloudant-replication-in-a-box/terraform

This copies all the project files onto your local machine, into a directory called cloudant-replication-in-a-box, and moves you into its terraform subdirectory.

Now create a file called terraform.tfvars with the following fields:

ibmcloud_api_key = "<your_api_key_from_step_1>"
region = "eu-gb"

The terraform.tfvars file contains values you will want to keep secret (such as your API key), so it is ignored by the Git repository and will not be committed.

Step 3: Create the infrastructure

In this step, you will create the required infrastructure inside your IBM Cloud account.

TL;DR — Run the Terraform script:

terraform init 
terraform apply --auto-approve

In a bit more detail: The Terraform folder contains a number of simple configuration files:

  • main.tf tells Terraform to use the IBM Cloud provider.
  • variables.tf contains the variable definitions whose values will be populated from terraform.tfvars.
  • cloudant.tf creates the Cloudant DB instances in two different regions, along with some credentials that we will use later to access them.
  • registry.tf creates the Container Registry namespace that will hold your container images for running in Code Engine.
  • iam.tf creates the access key that is needed to interact with the Container Registry and the key that will be used to read and write between the Cloudant databases.

It will take several minutes for the databases and other resources to be ready, but you should now have two Cloudant database instances, a Container Registry namespace for your container images and some Identity and Access Management (IAM) credentials. You can check by visiting the Resources section of your IBM Cloud account.

Step 4: Create Cloudant databases and deploy monitoring scripts to Code Engine

Another thing the Terraform script does is output a bunch of configuration variables that we will now use.

We will run a bash script (build.sh) that takes some of that output and uses it to create a database called users in both of your Cloudant instances. It will also deploy a replication monitoring script (monitor.js) to Code Engine that will run every minute and make sure that replication is working correctly.

Before you run the build script, make sure you are logged into both the IBM Cloud CLI and Docker.

Go into the root of the project and type the following:

./build.sh
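build.sh itself is a bash script. To make the HTTP calls behind its database-creation step visible, here is a JavaScript sketch of the same idea; it is not the script's actual contents. It assumes hypothetical Terraform output names (run `terraform output` to see the real ones) and leaves out how the IAM bearer token is obtained. `terraform output -json` prints an object that maps each output name to `{ sensitive, type, value }`.

```javascript
// Hypothetical JavaScript equivalent of the database-creation step in build.sh.
// Output names such as "cloudant_dallas_url" are illustrative assumptions.

// Turn parsed `terraform output -json` data into database URLs.
function databaseUrls(terraformOutput, outputNames, dbName = "users") {
  return outputNames.map(name => {
    const entry = terraformOutput[name];
    if (!entry) throw new Error(`missing Terraform output: ${name}`);
    // Strip trailing slashes before appending the database name.
    return `${entry.value.replace(/\/+$/, "")}/${encodeURIComponent(dbName)}`;
  });
}

// Creating a database is a PUT to /{db}. 201 or 202 means it was created and
// 412 means it already exists, so the step is safe to re-run.
// Requires Node.js 18+ for the global fetch.
async function createDatabase(dbUrl, bearerToken) {
  const res = await fetch(dbUrl, {
    method: "PUT",
    headers: { Authorization: `Bearer ${bearerToken}` },
  });
  if ([201, 202, 412].includes(res.status)) return res.status;
  throw new Error(`PUT ${dbUrl} failed with HTTP ${res.status}`);
}
```

Treating 412 as success is the design choice that matters here: the build step can be run again without failing on databases that already exist.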

How replication works in Cloudant

Replication happens between databases (in our case between the users databases in the Dallas and London regions). Every Cloudant instance has a special database in it called _replicator that contains documents with replication instructions for each database you want to replicate. Each of these documents has a source database (where you are replicating from) and a target database (where you are replicating to). It also contains any necessary credentials that allow replication to occur between these databases. Here's an example of one such document:

{
  "_id": "abc1234",
  "continuous": true,
  "source": {
    "url": "someurl/sourceDB",
    "auth": {
      "iam": {
        "api_key": "xyz678"
      }
    }
  },
  "target": {
    "url": "someurl/targetDB",
    "auth": {
      "iam": {
        "api_key": "xyz678"
      }
    }
  }
}
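monitor.js builds a document like this from environment variables. The following sketch shows one way to do that; the variable names SOURCE_URL, TARGET_URL, API_KEY and REPLICATION_ID are hypothetical and not necessarily the ones the real script uses.

```javascript
// Sketch: assemble a replication document from environment variables.
// The variable names are illustrative assumptions.
function buildReplicationDoc(env) {
  for (const key of ["SOURCE_URL", "TARGET_URL", "API_KEY", "REPLICATION_ID"]) {
    if (!env[key]) throw new Error(`missing environment variable ${key}`);
  }
  // The same IAM API key authenticates against both source and target here.
  const auth = { iam: { api_key: env.API_KEY } };
  return {
    _id: env.REPLICATION_ID,
    continuous: true, // keep replicating new changes instead of stopping once caught up
    source: { url: env.SOURCE_URL, auth },
    target: { url: env.TARGET_URL, auth },
  };
}
```

In a script you would call it as `buildReplicationDoc(process.env)`. Failing fast on a missing variable avoids uploading a half-built document to _replicator.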

The monitoring script

monitor.js is a simple NodeJS script. It builds up a replication document like the one above with data passed in as environment variables.

Then it checks whether the _replicator database already contains this document, by looking it up by its _id. If it does not, the script uploads the document to the _replicator database. So the first time your script runs on Code Engine, the document will not exist and will be uploaded, which kicks off the replication process.
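The check-then-upload step can be sketched as follows. To keep the logic testable without a network, the sketch takes a hypothetical injected `request(method, url, body)` helper that resolves to `{ status, body }`; in a real script it would wrap `fetch` and add IAM authentication. This is an illustration, not the code of monitor.js.

```javascript
// Sketch: create the replication document only if it does not exist yet.
// `request` is a hypothetical injected HTTP helper returning { status, body }.
async function ensureReplicationDoc(request, replicatorDbUrl, doc) {
  const docUrl = `${replicatorDbUrl}/${encodeURIComponent(doc._id)}`;
  const existing = await request("GET", docUrl);
  if (existing.status === 404) {
    // First run: writing the document to _replicator starts the replication.
    const created = await request("PUT", docUrl, doc);
    if (created.status !== 201 && created.status !== 202) {
      throw new Error(`could not create ${doc._id}: HTTP ${created.status}`);
    }
    return "created";
  }
  if (existing.status !== 200) {
    throw new Error(`GET ${docUrl} failed with HTTP ${existing.status}`);
  }
  return "exists";
}
```

Injecting the HTTP helper keeps the decision logic separate from authentication and transport, so it can be exercised with an in-memory fake.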

After that, the document will exist, and every time the script retrieves it, it checks what state the replication is in. If it is in any kind of error state, the script uploads the document again to try to restart the replication. Error states can be caused by things like temporary losses of connectivity, expired credentials or other factors. This script is very simple, but it could be made smarter, for example by generating alerts when it finds error states.
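The restart logic can be sketched with two small functions. Assumptions: the state is read from the document's `_replication_state` field, and the set of states treated as errors here is illustrative (the `_scheduler/docs` endpoint reports the fuller set of states). Re-uploading an existing document requires its current `_rev`; the sketch assumes that writing this new revision is enough for the replicator to pick the job up again, and deleting and recreating the document is an alternative if it is not.

```javascript
// Sketch: decide whether a replication needs restarting, and build the body
// to PUT back to /_replicator/{id}. The error-state names are assumptions.
const ERROR_STATES = new Set(["error", "failed", "crashing"]);

function needsRestart(existingDoc) {
  return ERROR_STATES.has(existingDoc._replication_state);
}

// Start from the desired document (which drops server-written fields such as
// _replication_state) and carry over the current _rev so the PUT is accepted.
function restartBody(existingDoc, desiredDoc) {
  return { ...desiredDoc, _id: existingDoc._id, _rev: existingDoc._rev };
}
```

A monitor would call `needsRestart` on the fetched document and, when it returns true, PUT `restartBody(existing, desired)` to the same document URL.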

The script runs twice: once with London as the source and Dallas as the target, and once with Dallas as the source and London as the target. Replication therefore happens in both directions.
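Two-way replication is simply two one-way replication documents with source and target swapped. The following sketch generates both; the instance names, URLs, IDs and API key are placeholders, and the `<from>-to-<to>` ID scheme is a hypothetical choice.

```javascript
// Sketch: build the pair of documents for two-way replication.
// Each instance is { name, url }; all values are placeholders.
function bidirectionalDocs(a, b, apiKey) {
  const auth = { iam: { api_key: apiKey } };
  const oneWay = (from, to) => ({
    _id: `${from.name}-to-${to.name}`,
    continuous: true,
    source: { url: from.url, auth },
    target: { url: to.url, auth },
  });
  return [oneWay(a, b), oneWay(b, a)];
}
```

This does not cause an endless loop: the replicator only copies revisions that the target does not already have, so a change that arrived from the other region is not sent back to it.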

Note that both documents could be uploaded to the same Cloudant instance: replication works whether that instance "pushes" its changes to the other database or "pulls" changes from it. The recommended best practice is to keep your replication documents in the less active instance. So if, for example, your London Cloudant instance is taking most of the application traffic, put your replication documents in the Dallas instance. In this tutorial, for simplicity, we have placed one document in each instance.

Step 5: Watch your data replicate

The easiest way to watch your data replicate is in the Cloudant user interface.

From your resources list, click on the Launch Dashboard link of your cloudantDallas and cloudantLondon instances (open them in separate tabs so you can move between them).


From one of them (it doesn't matter which), click on the users database and then on the Create Document button. In the editor, add a few fields and save the document. For example:

{
  "_id": "abc1234",
  "name": "Taylor",
  "surname": "Swift"
}

Now go to the other Cloudant instance, click into the users database, and you should see the above document in there already. It's that quick.
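If you prefer to check replication from code rather than the dashboard, you can poll the other region until the document appears. In this sketch, `getDoc(id)` is a hypothetical helper that resolves to the document, or to `null` if it does not exist yet; the default timeout and polling interval are arbitrary choices.

```javascript
// Sketch: poll the other region for a document until it appears or a
// timeout is reached. `getDoc` is a hypothetical injected lookup helper.
async function waitForDoc(getDoc, id, { timeoutMs = 30000, intervalMs = 1000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const doc = await getDoc(id);
    if (doc) return doc;
    if (Date.now() >= deadline) {
      throw new Error(`${id} not replicated within ${timeoutMs} ms`);
    }
    // Wait before the next attempt instead of hammering the API.
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}
```

With `getDoc` wired to a GET on the users database in the other region, `waitForDoc(getDoc, "abc1234")` resolves with the replicated document or rejects after the timeout.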

Summary

In this tutorial we have demonstrated how easy it is to set up two-way replication between IBM Cloudant instances in two regions of the world. We have also implemented a basic monitoring script that ensures replication is working.

If high availability, whole-region disaster recovery and customer satisfaction are important factors in your application design, then you should be considering IBM Cloudant as your database.

Remember to decommission your resources so that you don't get charged additional fees. In the Terraform directory, type the following:

terraform destroy

To remove the IBM Cloud Code Engine project, type the following:

ibmcloud ce project delete --name replicationmonitor --hard

