Required datasets, accounts, and privileges
Before you create a CDC Replication instance, ensure that you have the required datasets, accounts, and directory access.
Configuring a connection to the BigQuery data warehouse
When you configure the CDC Replication Engine for BigQuery, you are prompted for a service account private key .json file. The CDC Replication Engine for BigQuery uses this key file to connect to the BigQuery project’s datasets. Before installing CDC Replication, ensure that this service account exists and that it has the WRITER or OWNER role on every dataset to which the CDC Replication Engine for BigQuery will write.
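As a rough pre-installation check, the following Python sketch reads a key file and tests whether a dataset's access list grants the key's service account a write role. The checked field list, the function names `load_service_account_key` and `grants_write`, and the restriction to direct `userByEmail` grants are assumptions made for illustration; the engine's own validation may differ. The role set also includes `roles/bigquery.dataEditor` and `roles/bigquery.dataOwner`, which the BigQuery API documents as the IAM equivalents of WRITER and OWNER.

```python
import json

# Fields this sketch expects in a Google service account key file.
# A real key file contains more fields; only these are checked here.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")

# Dataset roles that allow writing tables: the basic roles and the IAM
# roles the BigQuery API maps them to.
WRITE_ROLES = {
    "WRITER",
    "OWNER",
    "roles/bigquery.dataEditor",
    "roles/bigquery.dataOwner",
}


def load_service_account_key(path):
    """Read a service account key .json file and check its basic shape."""
    with open(path, encoding="utf-8") as f:
        key = json.load(f)
    missing = [name for name in REQUIRED_KEY_FIELDS if not key.get(name)]
    if missing:
        raise ValueError("key file is missing: " + ", ".join(missing))
    if key["type"] != "service_account":
        raise ValueError(f"expected type 'service_account', got {key['type']!r}")
    return key


def grants_write(access_entries, client_email):
    """Return True if a direct dataset grant gives client_email a write role.

    access_entries is any iterable of objects with role, entity_type and
    entity_id attributes. Access inherited through groups or project-level
    IAM roles is NOT detected by this check.
    """
    return any(
        entry.entity_type == "userByEmail"
        and entry.entity_id == client_email
        and entry.role in WRITE_ROLES
        for entry in access_entries
    )
```

If the google-cloud-bigquery package is installed, the access list could come from `bigquery.Client.from_service_account_json(key_path).get_dataset("project.dataset").access_entries`, with the service account's email taken from `key["client_email"]`.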
Setting up a Linux user account
When you install CDC Replication on a Linux® machine, you must either set up a new Linux account or choose an existing one to use when you install, configure, or upgrade CDC Replication. You can install CDC Replication in any directory; however, that directory must be owned by this Linux account.
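A minimal Python sketch of the ownership requirement follows, assuming it runs on the Linux host that will hold the installation. The helper names and the returned messages are hypothetical; the sketch only inspects ownership and does not create the account or change any permissions.

```python
import os
import pwd


def owned_by(path, username):
    """Return True if path is owned by the Linux account `username`.

    Raises KeyError if the account does not exist and FileNotFoundError
    if the path does not exist.
    """
    expected_uid = pwd.getpwnam(username).pw_uid
    return os.stat(path).st_uid == expected_uid


def check_install_dir(path, username):
    """Describe whether path is ready to hold a CDC Replication installation."""
    if not os.path.isdir(path):
        return f"{path} does not exist; create it as {username}"
    if not owned_by(path, username):
        return f"{path} is not owned by {username}; change its owner with chown"
    return "ok"
```

Comparing numeric user IDs rather than names avoids surprises when an account is renamed, since the directory's ownership is recorded by UID.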
Configuring a BigQuery data warehouse
When you configure the CDC Replication Engine for BigQuery, you are prompted for the name of the BigQuery dataset that CDC Replication connects to and uses to store its metadata. Before installing the software, ensure that this dataset exists and that you have created and set up a Google service account that has access to it.
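The following sketch checks a dataset name against BigQuery's naming rules (letters, digits, and underscores, up to 1,024 characters) and then asks BigQuery whether the dataset exists. It assumes the google-cloud-bigquery package is available for the existence check; the function names `dataset_reference` and `dataset_exists` are hypothetical, and `dataset_exists` can only answer for datasets the service account is allowed to read.

```python
import re

# BigQuery dataset names may contain letters, digits and underscores,
# up to 1,024 characters.
_DATASET_NAME = re.compile(r"[A-Za-z0-9_]{1,1024}")


def dataset_reference(project_id, dataset_name):
    """Build a 'project.dataset' ID after checking the dataset name."""
    if not _DATASET_NAME.fullmatch(dataset_name):
        raise ValueError(f"invalid BigQuery dataset name: {dataset_name!r}")
    return f"{project_id}.{dataset_name}"


def dataset_exists(key_path, dataset_id):
    """Return True if the dataset exists, False if BigQuery reports NotFound.

    Needs the google-cloud-bigquery package; it is imported here so that
    dataset_reference works without it. A 403 Forbidden error, meaning the
    dataset may exist but the account cannot read it, is left to propagate.
    """
    from google.api_core.exceptions import NotFound
    from google.cloud import bigquery

    client = bigquery.Client.from_service_account_json(key_path)
    try:
        client.get_dataset(dataset_id)
    except NotFound:
        return False
    return True
```

For example, `dataset_exists("/path/to/key.json", dataset_reference("my-project", "cdc_metadata"))` would check a hypothetical `cdc_metadata` dataset in a project named `my-project`.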
Creating directories for BigQuery refresh loader files
Create or choose a directory on the CDC Replication Engine for BigQuery host where the refresh loader utility will write CSV files. The CDC Replication Engine for BigQuery loads these files into BigQuery datasets. Your CDC Replication user account must have read and write permissions for this directory. Use a different directory for each CDC Replication instance.
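Both requirements can be checked with a short Python sketch, assuming it is run as the CDC Replication user account so that `os.access` reports that account's permissions (the root account generally passes these checks regardless of mode bits). The function name, the instance names, and the mapping format are hypothetical.

```python
import os


def check_refresh_loader_dirs(dirs_by_instance):
    """Check refresh loader directories, one per CDC Replication instance.

    dirs_by_instance maps an instance name to its directory. Returns a list
    of problem descriptions; an empty list means every check passed.
    """
    problems = []
    owner_of = {}  # resolved path -> first instance that uses it
    for instance, path in dirs_by_instance.items():
        resolved = os.path.realpath(path)
        if not os.path.isdir(resolved):
            problems.append(f"{instance}: {path} is not a directory")
            continue
        # Creating files in a directory on Linux also needs execute
        # (search) permission, so X_OK is checked alongside read and write.
        if not os.access(resolved, os.R_OK | os.W_OK | os.X_OK):
            problems.append(f"{instance}: missing read/write access to {path}")
        if resolved in owner_of:
            problems.append(
                f"{instance}: {path} is already used by {owner_of[resolved]}"
            )
        else:
            owner_of[resolved] = instance
    return problems
```

Paths are resolved with `os.path.realpath` so that two instances pointing at the same directory through a symbolic link are still reported as sharing it.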