BigQuery Resource Configuration

Before you configure your scanner, make sure you meet the prerequisites. Read our guide on BigQuery integration requirements to double-check.

Source System Properties

This configuration can be set up by creating a new connection in the Admin UI > Connections tab or by editing an existing connection in Admin UI > Connections > Databases > BigQuery > specific connection. A new connection can also be created by following General Walkthrough Example (Create a Connection and Execute It).

The granularity of the IBM Automatic Data Lineage connection for BigQuery is one BigQuery server. Use filters on projects and datasets to limit the scope of analysis as needed. Using multiple connections against a single BigQuery server may prevent lineage within the system from being connected properly.

Property name

Description

Example

bigquery.dictionary.id

Name of the resource representing this BigQuery server, known as the dictionary ID. It is used as the output subdirectory name for extracted DDL files and the database dictionary.

bigquery

bigquery.credentials.privateKey

Copy the value of the private_key field from the downloaded service account credentials JSON file, replace each "\n" sequence with a line break, and paste it into the Private Key field.

See BigQuery Service Account Credentials for details on how to create a service account and credentials.

bigquery.credentials.clientEmail

Copy the value of the client_email field from the downloaded service account credentials JSON file and paste it into the Client Email field.

See BigQuery Service Account Credentials for details on how to create a service account and credentials.

bigquery.extractedDbsSchemas

Comma-separated list of projects and datasets to extract, each given in the format project/dataset; each part is evaluated as a regular expression

project1/dataset1,project2/dataset2,project3

bigquery.extraction.method

Extraction method. Set to Agent:default to use the default Manta Extractor Agent, to Agent:{remote_agent_name} to use a remote Agent, or to Git:{git.dictionary.id} to use the Git ingest method. For details on setting up a remote extractor Agent, see the Manta Flow Agent Configuration for Extraction documentation. For details on configuring the Git ingest method, see the Manta Flow Agent Configuration for Extraction:Git Source documentation.

default

Git

agent

bigquery.excludedDbsSchemas

List of projects and datasets to exclude from extraction, separated by commas; each part is evaluated as a regular expression

project3/dataset3,project3/dataset4

bigquery.ddl.encoding

Encoding of automatically extracted DDL scripts. See Encodings for applicable values.

utf8

bigquery.script.encoding

Encoding of manually provided SQL scripts performed on this database server. See Encodings for applicable values.

utf8

bigquery.oauth2.endpoint.uri

URI of the OAuth2 endpoint used for authentication. Set this property if the default OAuth2 URI (https://oauth2.googleapis.com/token) is not accessible or a different endpoint is preferred.

https://oauth2-dev.p.googleapis.com/token
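
The way bigquery.extractedDbsSchemas and bigquery.excludedDbsSchemas might combine can be sketched in Python. This is only an illustration under stated assumptions, not the scanner's implementation. The helper names parse_filter, matches, and is_extracted are hypothetical. The sketch assumes that each part must match the whole project or dataset name, that an entry without a dataset part (such as project3) matches every dataset in that project, and that exclusions take precedence over inclusions.

```python
import re


def parse_filter(value):
    """Split a comma-separated filter into (project_regex, dataset_regex) pairs."""
    pairs = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        project, _, dataset = entry.partition("/")
        # Assumption: an entry without a dataset part matches every dataset.
        pairs.append((project, dataset or ".*"))
    return pairs


def matches(pairs, project, dataset):
    """Return True if any pair matches both the project and the dataset."""
    return any(
        re.fullmatch(project_regex, project) and re.fullmatch(dataset_regex, dataset)
        for project_regex, dataset_regex in pairs
    )


def is_extracted(project, dataset, included, excluded):
    """Assumption: a dataset is extracted if it is included and not excluded."""
    return matches(parse_filter(included), project, dataset) and not matches(
        parse_filter(excluded), project, dataset
    )


# Example values taken from the property table above.
included = "project1/dataset1,project2/dataset2,project3"
excluded = "project3/dataset3,project3/dataset4"

print(is_extracted("project1", "dataset1", included, excluded))  # True
print(is_extracted("project3", "dataset3", included, excluded))  # False
print(is_extracted("project3", "dataset5", included, excluded))  # True
```

Because each part is a regular expression, characters such as "." carry their regex meaning; escape them if a literal match is intended.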

BigQuery Service Account Credentials

The scanner requires credentials for a BigQuery service account whose role has the privileges needed to extract metadata.

The service account can be created on the Service Accounts page in the Cloud Console.

After the service account is created, add a new key pair and download a private key file in JSON format.

See Getting Started with Authentication for details.
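
As a convenience, the two values needed for bigquery.credentials.clientEmail and bigquery.credentials.privateKey can be read from the downloaded key file with a short Python script. This is a minimal sketch, not part of the product. The function name read_credentials is hypothetical, and the sample key file it creates contains placeholder values only. Parsing the file as JSON decodes the "\n" escape sequences, so the printed private key already contains real line breaks.

```python
import json
import tempfile
from pathlib import Path


def read_credentials(path):
    """Return (client_email, private_key) from a service account key file."""
    with open(path, encoding="utf-8") as handle:
        key_file = json.load(handle)
    # json.load turns the "\n" escapes into real line breaks, so the
    # private key can be pasted into the Private Key field as printed.
    return key_file["client_email"], key_file["private_key"]


# Demonstration with a made-up key file; a real file contains more fields.
sample = {
    "client_email": "scanner@example-project.iam.gserviceaccount.com",
    "private_key": "-----BEGIN PRIVATE KEY-----\nPLACEHOLDER\n-----END PRIVATE KEY-----\n",
}

with tempfile.TemporaryDirectory() as tmp:
    key_path = Path(tmp) / "service-account.json"
    key_path.write_text(json.dumps(sample), encoding="utf-8")
    client_email, private_key = read_credentials(key_path)

print(client_email)
print(private_key)
```

To use it with a real key file, call read_credentials with the path to the downloaded JSON file instead of the temporary sample.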

Common Scanner Properties

This configuration is common to all BigQuery source systems and all BigQuery scenarios, and is configured in Admin UI > Configuration > CLI > BigQuery > BigQuery Common. It can be overridden at the individual connection level.

Property name

Description

Example

bigquery.dictionary.dir

Directory with data dictionaries extracted from BigQuery

${manta.dir.temp}/bigquery

bigquery.url

A URL for the target BigQuery API service

https://www.googleapis.com/bigquery/v2

filepath.lowercase

Whether paths to files should be lowercase (false for case-sensitive file systems, true otherwise)

false

true

bigquery.dll.output

Directory for automatically extracted BigQuery DDL scripts (for the extraction phase)

${manta.dir.temp}/bigquery/${bigquery.dictionary.id}/ddl

bigquery.ddl.input

Directory with automatically extracted BigQuery DDL scripts (for the analysis phase)

${bigquery.dll.output}

bigquery.script.input

Directory with manually provided SQL scripts which are performed on a given database server (for the analysis phase)

${manta.dir.input}/bigquery/${bigquery.dictionary.id}/sql

bigquery.job.script.input

Directory with manually provided job scripts which are performed on a given database server (for the analysis phase)

${manta.dir.input}/bigquery/${bigquery.dictionary.id}/jobs

bigquery.job.script.encoding

Encoding of manually provided job scripts performed on this BigQuery database instance. See Encodings for applicable values.

utf8

bigquery.script.replace

Path to the CSV file with the replacements to be applied to the provided SQL scripts; see Placeholder Replacement in Input Scripts for details about the replacement file format

${manta.dir.input}/bigquery/${bigquery.dictionary.id}/replace.csv

bigquery.script.replace.regex

Flag specifying whether the replacements in the CSV file specified by bigquery.script.replace are interpreted as regular expressions (true) or as plain text (false)

false

true

bigquery.analyze.parallelCount

Number of parallel threads used to analyze DDL and SQL scripts

4

bigquery.dictionary.mappingFile

Path to automatically generated mappings for BigQuery servers

${manta.dir.temp}/bigquery/bigqueryDictionaryMantaMapping.csv

bigquery.dictionary.mappingManualFile

Path to mappings provided manually for BigQuery servers

${manta.dir.scenario}/conf/bigqueryDictionaryMantaMappingManual.csv

bigquery.connections.file

Connection definitions file with database connection resource definitions used in federated queries.

For more information about this file and its format, see the Connection Definition Settings section in Informatica PowerCenter Resource Configuration. Further details are available in (Manual) Connection Mappings Explained.

connectionsConfiguration.prm

bigquery.connections.path

Path to the connection definitions base directory. The path to the connection definitions file is built in the following format: ${bigquery.connections.path}/${bigquery.dictionary.id}/${bigquery.connections.file}
${bigquery.dictionary.id} is the ID of the dictionary currently being processed; it is filled in automatically and does not need to be listed anywhere.
If the connection-specific file is not found, the path ${bigquery.connections.path}/${bigquery.connections.file} is used instead.

${manta.dir.input}/bigquery

bigquery.analyze.retainUnusedResultSetColumns

Flag specifying whether the data lineage should include sub-query result set columns that do not have any downstream lineage

Defaults to false

false
true
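
The lookup order described for bigquery.connections.path and bigquery.connections.file can be sketched in Python. This is an illustration only, not the scanner's code. The function name resolve_connections_file is hypothetical, and the sketch assumes that "not found" means no regular file exists at the connection-specific path.

```python
import tempfile
from pathlib import Path


def resolve_connections_file(connections_path, dictionary_id, connections_file):
    """Prefer the connection-specific file; otherwise use the shared one."""
    base = Path(connections_path)
    specific = base / dictionary_id / connections_file
    if specific.is_file():
        return specific
    # Fallback when no connection-specific file exists.
    return base / connections_file


# Demonstration in a temporary directory standing in for ${manta.dir.input}.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "bigquery"
    (base / "bigquery").mkdir(parents=True)
    (base / "connectionsConfiguration.prm").write_text("", encoding="utf-8")

    # Only the shared file exists, so the fallback path is returned.
    fallback = resolve_connections_file(base, "bigquery", "connectionsConfiguration.prm")
    fallback_rel = fallback.relative_to(base).as_posix()

    # Once a connection-specific file exists, it takes priority.
    (base / "bigquery" / "connectionsConfiguration.prm").write_text("", encoding="utf-8")
    preferred = resolve_connections_file(base, "bigquery", "connectionsConfiguration.prm")
    preferred_rel = preferred.relative_to(base).as_posix()

print(fallback_rel)   # connectionsConfiguration.prm
print(preferred_rel)  # bigquery/connectionsConfiguration.prm
```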