October 18, 2021 By Daniel Mermelstein
Glynn Bird
5 min read

How to move data from multiple IBM Cloudant tables to a PostgreSQL “data warehouse” so that you can query the data in more creative ways.

In previous posts, we have looked at combining various IBM services to create applications that address real-life requirements faced by enterprises. For example, combining IBM Cloudant and IBM Code Engine or IBM Cloudant and IBM Databases for Redis.

One important requirement in most organisations is the need to make your data work hard for different purposes. On the one hand, you may need a highly available, multi-region, lightning-fast store of data so that your e-commerce application can quickly sell your items and process the orders. Meanwhile, your business intelligence team may want to interrogate that sales data and join it with other sources to produce reports, dashboards or machine learning models that drive business and logistical decisions. Some of that data will likely not even be in your operational data store, but originate somewhere else — inside or outside your organisation — like clickstream data, historical weather reports or application performance monitoring. So how do we start to deal with this deluge of data to drive insights? 

For the first requirement, something like IBM Cloudant would do a great job. IBM Cloudant is a fully managed, distributed database optimized for heavy workloads and fast-growing web and mobile apps.

But Cloudant would not be optimal for the second requirement because “joining” data from different documents is not something that document stores do well. For that, you want a more relational type of database, like IBM Cloud Databases for PostgreSQL. A relational database like PostgreSQL is great for operational or analytical use cases, and it has robust JDBC support for all your BI tooling. 

Joining forces

In this tutorial, we will go through a simple NodeJS example of how you can move data from multiple IBM Cloudant tables to a PostgreSQL “data warehouse” so that you can query the data in more creative ways.

This will not be cost-free because PostgreSQL does not offer a free tier. But if you set it up and de-provision the infrastructure straight after you are done, it should not cost more than a few dollars. Cloudant offers a generous free tier that you can use for this tutorial.

The tutorial itself should take between an hour and two hours to complete. It is a bit harder than the ones mentioned above because it requires more terminal work and command-line applications to move data around, but hopefully, it will be worth the extra effort. If you like the command line, this one is for you.


  • Access to a terminal on your computer/laptop (Linux/macOS)
  • An IBM Cloud pay-go account 
  • Git 
  • Terraform: We will be using Terraform to simplify the creation of the infrastructure
  • Couchwarehouse: A command-line facility that lets you export data from Couch-like document databases like Cloudant into relational databases like PostgreSQL
  • jq: Used to manipulate JSON structures
  • base64 (this is normally part of your laptop distribution)
  • psql
  • couchimport: A command-line facility to import datasets into Cloudant


Your Cloudant database stores all the orders for your company. Every order has the following data points:

        "_id": "R96Z8DFQ6JQ6IEMX",
        "type": "order",
        "customerId": "1791",
        "customerName": "Keely Greenberg",
        "email": "keely_greenberg@gmail.com",
        "orderId": "R96Z8DFQ6JQ6IEMX",
        "date": "2021-03-22T08:54:52.421Z",
        "total": "178.21",
        "tax": "8.74",
        "address": {
                "street": "1249 Tanners Circle",
                "state": "Arizona",
                "zip": "55942"
        "basket": ["DORINA", "BHAKTI", "PANCA"]

In a separate Cloudant database, you have generic data about all 50 US states:

        "name": "Wyoming",
        "abbreviation": "WY",
        "capital": "Cheyenne",
        "city": "Cheyenne",
        "population": 582658,
        "area": 253335,
        "waterarea": 1865,
        "landarea": 251470,
        "houseseats": 1,
        "statehood": "1890-07-10"

You could easily figure out the sales per state using only Cloudant (because every order document has the state in the delivery address). But what happens when your marketing team asks for state sales per head of population to try to figure out which are their more lucrative locations? You have the data, but in separate documents they are going to be hard to combine.

Step-by-step instructions

For this tutorial we will do the following:

  • Install the cloud databases
  • Import some pre-made orders and geo data into Cloudant using couchimport
  • Export it from Cloudant to PostgreSQL using couchwarehouse
  • Query the PostgreSQL data joining different tables

Step 1: Obtain an API key to deploy infrastructure to your account

Follow the steps in this document to create an API key and make a note of it for Step 2.

Step 2: Clone the repo and cd into the Terraform directory

git clone https://github.com/danmermel/cloudant-postgres-warehouse.git
cd cloudant-postgres-warehouse/terraform

Create a document called terraform.tfvars with the following fields: 

ibmcloud_api_key = "<your_api_key>"
region = "eu-gb"
postgres_password = "<make_up_a_password>"

The terraform.tfvars document contains variables that you may want to keep secret, so it is ignored by the GitHub repository.

Step 3: Create the databases

TL;DR – Run the Terraform script:

terraform init 
terraform apply --auto-approve

A bit more info: The Terraform folder contains a small number of simple scripts: 

  • main.tf tells Terraform to use the IBM Cloud 
  • variables.tf contains the variable definitions whose values will be populated from terraform.tfvars
  • cloudant.tf creates a free tier Cloudant DB and some credentials that we will use later to access it
  • postgres.tf creates the PostgreSQL DB 

It will take several minutes for the databases to be ready, but you should now you have a PostgreSQL DB and a Cloudant DB in your account.

Step 4: Get access to PostgreSQL

First, you need Postgres to be accessible to you and to couchwarehouse — we will use the account credentials we got from a previous step by copying them to a file:

terraform output -json > ../creds.json
cd ..

Then, manipulate the output to create environment variables that couchwarehouse and Postgres can use:

cat creds.json | jq '.postgres_credentials.value[].certbase64' | sed 's/"//g' | base64 --decode > psql.cert
export PGSSLROOTCERT=psql.cert
export PGHOST=`cat creds.json | jq '.postgres_credentials.value[].hosts[].hostname' | sed 's/"//g'`
export PGPORT=`cat creds.json | jq '.postgres_credentials.value[].hosts[].port' | sed 's/"//g'`
export PGDATABASE=ibmclouddb
export PGUSER=admin
export PGPASSWORD="<the password you created in step 2>"

Step 5: Upload data to Cloudant

You will need some more environment variables to access the Cloudant database and put some data into it:

export COUCH_URL=`cat creds.json | jq '.cloudant_credentials.value.url' | sed 's/"//g'`
export COUCH_DATABASE=orders

First, import the order data:

//Create the database

//import the data into Cloudant
cat orderdata.json | couchimport --type jsonl

//run couchwarehouse to copy the data to postgres
couchwarehouse --databaseType postgresql

Now, repeat the process for the geo data. You’ll first have to CTRL+C on the terminal because couchwarehouse continues to listen for changes to import:

export COUCH_DATABASE=geodata

//Create the database

// Import the data into Cloudant
cat geodata.json | couchimport --type jsonl

//run couchwarehouse to copy the data to postgres

couchwarehouse --databaseType postgresql

Now you should be able to see all the data in PostgreSQL by running the psql client (which uses the credentials from Step 4). In the terminal, type the following:


You should be in the psql client:

psql (13.4, server 12.7)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.

You should now be able to see some order data. For example, to see all the orders per state:

select orders.address_state, count(*)  from orders group by 1 order by 2 desc;

More importantly, you can now join the tables to see dollar sales per head of population:

select name, (sum(cast(orders.total as float))/geodata.population)  from orders join geodata on orders.address_state = geodata.name group by 1,geodata.population order by 2 asc;

The results in this example are predictable because it is a randomly generated set of orders so the more populous states will end up with the smaller orders per head. It illustrates the power of combining different technologies to do the jobs at which they are best.

Remember to de-provision your resources so you don’t incur extra costs. To do that, go into your project’s Terraform directory and type the following:

terraform destroy

You will be prompted to confirm the action and Terraform will de-provision the resources. 

In a real-life scenario, a lot of this shifting of data would be done by automated background processes, and you would be using each database for what it does best.

Learn more

Remember that if you are new to these services you can get free usage credits to get you started.

Was this article helpful?

More from Cloud

IBM Tech Now: April 8, 2024

< 1 min read - ​Welcome IBM Tech Now, our video web series featuring the latest and greatest news and announcements in the world of technology. Make sure you subscribe to our YouTube channel to be notified every time a new IBM Tech Now video is published. IBM Tech Now: Episode 96 On this episode, we're covering the following topics: IBM Cloud Logs A collaboration with IBM watsonx.ai and Anaconda IBM offerings in the G2 Spring Reports Stay plugged in You can check out the…

The advantages and disadvantages of private cloud 

6 min read - The popularity of private cloud is growing, primarily driven by the need for greater data security. Across industries like education, retail and government, organizations are choosing private cloud settings to conduct business use cases involving workloads with sensitive information and to comply with data privacy and compliance needs. In a report from Technavio (link resides outside ibm.com), the private cloud services market size is estimated to grow at a CAGR of 26.71% between 2023 and 2028, and it is forecast to increase by…

Optimize observability with IBM Cloud Logs to help improve infrastructure and app performance

5 min read - There is a dilemma facing infrastructure and app performance—as workloads generate an expanding amount of observability data, it puts increased pressure on collection tool abilities to process it all. The resulting data stress becomes expensive to manage and makes it harder to obtain actionable insights from the data itself, making it harder to have fast, effective, and cost-efficient performance management. A recent IDC study found that 57% of large enterprises are either collecting too much or too little observability data.…

IBM Newsletters

Get our newsletters and topic updates that deliver the latest thought leadership and insights on emerging trends.
Subscribe now More newsletters