Required datasets, accounts, and privileges

Before you create a CDC Replication instance, ensure that you have the required datasets, accounts, and directory access.

Configuring a connection to the BigQuery data warehouse

When you configure the CDC Replication Engine for BigQuery, you are prompted for a service account private key .json file. The CDC Replication Engine for BigQuery uses this private key .json file to connect to the BigQuery project’s datasets. Before installing CDC Replication, ensure that this service account exists and that it has a WRITER or OWNER role for the datasets to which the CDC Replication Engine for BigQuery will write.
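As a pre-installation sanity check, the key file can be inspected before it is handed to the configuration tool. The following is a minimal sketch, not part of CDC Replication: the function name check_service_account_key is hypothetical, and the list of required fields is an assumption based on the fields that Google service account key files normally contain. It only checks the file's shape and does not contact BigQuery.

```python
import json
from pathlib import Path

# Assumption: fields that a Google service account key file normally carries.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(path):
    """Return a list of problems found in the key file (empty list: looks usable)."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:
        # OSError: file missing or unreadable; ValueError: not valid JSON/UTF-8.
        return [f"cannot read {path}: {exc}"]
    if not isinstance(data, dict):
        return ["key file is not a JSON object"]

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)]
    kind = data.get("type")
    if kind and kind != "service_account":
        problems.append(f"unexpected type: {kind!r}")
    return problems
```

An empty result means only that the file looks like a service account key. Whether the account actually holds the WRITER or OWNER role on the target datasets still has to be confirmed in BigQuery.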

Setting up a Linux user account

When you install CDC Replication on a Linux® machine, you must set up a new Linux account, or choose an existing one, to use for installing, configuring, and upgrading CDC Replication. You can install CDC Replication in the directory of your choice; however, that directory must be owned by this Linux account.
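The ownership requirement can be verified with a few lines of Python before running the installer. This is a minimal sketch that uses only the standard library; the function names directory_owner and is_owned_by are hypothetical, and the directory and account names in the usage note are placeholders.

```python
import os
import pwd


def directory_owner(directory):
    """Return the name of the account that owns `directory`."""
    uid = os.stat(directory).st_uid
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # The uid has no passwd entry; fall back to the numeric id as text.
        return str(uid)


def is_owned_by(directory, username):
    """Return True if `directory` exists and is owned by the Linux account `username`."""
    return os.path.isdir(directory) and directory_owner(directory) == username
```

For example, is_owned_by("/opt/cdc", "cdcuser") would be expected to return True once the installation directory has been created and assigned to the account; both names here are illustrative only.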

Configuring a BigQuery data warehouse

When you configure the CDC Replication Engine for BigQuery, you are prompted for the name of the BigQuery dataset that you want CDC Replication to connect to and store its metadata in. Before installing the software, ensure that this BigQuery dataset exists and that you have created and set up a Google service account that has access to it.
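One way to confirm that the service account has access is to inspect the dataset's access list, for example the "access" array in the JSON that `bq show --format=prettyjson` prints for a dataset. The sketch below is an illustration under stated assumptions: the function name has_write_access is hypothetical, it assumes that storing metadata requires the WRITER or OWNER role, and it only sees grants made directly to the account by email. Access inherited through a group or through project-level IAM does not appear in this list.

```python
# Assumption: dataset-level roles that allow writing to the dataset.
WRITE_ROLES = {"WRITER", "OWNER"}


def has_write_access(access_entries, service_account_email):
    """Return True if the account is directly granted WRITER or OWNER.

    `access_entries` is the dataset's "access" list: dicts with a "role"
    key and an identity key such as "userByEmail".
    """
    wanted = service_account_email.lower()
    for entry in access_entries:
        email = entry.get("userByEmail", "").lower()
        if email == wanted and entry.get("role") in WRITE_ROLES:
            return True
    return False
```

A result of False does not prove that access is missing; it means that no direct WRITER or OWNER grant for that email was found in the list that was supplied.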

Creating directories for BigQuery refresh loader files

Create or choose a directory on the CDC Replication Engine for BigQuery host where the refresh loader utility writes its CSV files. The CDC Replication Engine for BigQuery then loads these files into BigQuery datasets. Your CDC Replication user account must have read and write permissions for this directory. Use a different directory for each instance of CDC Replication.
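The directory requirements above (the directory exists, is readable and writable, and is not shared between instances) can be checked with a short script. This is a minimal sketch; the function name check_loader_dirs and the mapping of instance names to directories are hypothetical. Because os.access tests the permissions of the account that runs the script, it is assumed that the script is run as the CDC Replication user account.

```python
import os


def check_loader_dirs(instance_dirs):
    """Check refresh loader directories; return a list of problems (empty: all good).

    `instance_dirs` maps an instance name to its loader directory.
    """
    problems = []
    seen = {}  # resolved directory path -> first instance that uses it
    for instance, directory in instance_dirs.items():
        real = os.path.realpath(directory)  # resolve symlinks so aliases are caught
        if not os.path.isdir(real):
            problems.append(f"{instance}: {directory} is not a directory")
        elif not os.access(real, os.R_OK | os.W_OK | os.X_OK):
            # Execute permission is needed to create and open files in a directory.
            problems.append(f"{instance}: no read/write access to {directory}")
        if real in seen:
            problems.append(f"{instance}: shares {directory} with {seen[real]}")
        else:
            seen[real] = instance
    return problems
```

Resolving each path with os.path.realpath means that two instances pointing at the same directory through different symbolic links are still reported as sharing it.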