BigQuery Resource Configuration
Source System Properties
This configuration can be set up by creating a new connection on the Admin UI > Connections tab or by editing an existing connection in Admin UI > Connections > Databases > BigQuery > specific connection. A new connection can also be created via the General Walkthrough Example (Create a Connection and Execute It).
The granularity of an IBM Automatic Data Lineage connection for BigQuery is one BigQuery server. Use the filters on projects and datasets to limit the scope of analysis as needed. Using multiple connections against a single BigQuery server may prevent lineage within that system from being connected properly.
Property name | Description | Example |
---|---|---|
bigquery.dictionary.id | Name of the resource representing this BigQuery server, known as the dictionary ID; used as the output subdirectory name for extracted DDL files and the database dictionary | bigquery |
bigquery.credentials.privateKey | Copy the value of the private_key field from the downloaded service account credentials JSON file, replace the "\n" sequences with newlines, and paste it into the Private Key field. See BigQuery Service Account Credentials for details on how to create a service account and credentials. | |
bigquery.credentials.clientEmail | Copy the value of the client_email field from the downloaded service account credentials JSON file and paste it into the Client Email field. See BigQuery Service Account Credentials for details on how to create a service account and credentials. | |
bigquery.extractedDbsSchemas | Comma-separated list of projects and datasets to extract, each given in the format project/dataset; each part is evaluated as a regular expression | project1/dataset1,project2/dataset2,project3 |
bigquery.extraction.method | Set to Agent:default to use the default Manta Extractor Agent, to Agent:{remote_agent_name} to use a remote Agent, or to Git:{git.dictionary.id} to use the Git ingest method. For more information on setting up a remote extractor Agent, refer to the Manta Flow Agent Configuration for Extraction documentation. For additional details on configuring a Git ingest method, refer to the Manta Flow Agent Configuration for Extraction:Git Source documentation. | default Git agent |
bigquery.excludedDbsSchemas | Comma-separated list of projects and datasets to exclude from extraction; each part is evaluated as a regular expression | project3/dataset3,project3/dataset4 |
bigquery.ddl.encoding | Encoding of automatically extracted DDL scripts. See Encodings for applicable values. | utf8 |
bigquery.script.encoding | Encoding of manually provided SQL scripts performed on this database server. See Encodings for applicable values. | utf8 |
bigquery.oauth2.endpoint.uri | Specifies the URI for the OAuth2 endpoint for authentication. Necessary if the default OAuth2 URI ( | |
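The bigquery.extractedDbsSchemas and bigquery.excludedDbsSchemas properties take comma-separated project/dataset entries whose parts are regular expressions. The following Python sketch illustrates how such filters could be evaluated. The function names, the full-match semantics, and the rule that an entry without a dataset part covers every dataset in the project are assumptions made for illustration, not the scanner's documented behavior.

```python
import re


def parse_filter(value):
    """Split a comma-separated filter into (project_regex, dataset_regex) pairs."""
    entries = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        project, _, dataset = item.partition("/")
        # Assumption: a missing dataset part means "any dataset in the project".
        entries.append((project, dataset or ".*"))
    return entries


def matches(entries, project, dataset):
    """Return True if project and dataset both fully match some filter entry."""
    return any(
        re.fullmatch(p, project) and re.fullmatch(d, dataset)
        for p, d in entries
    )


def is_extracted(project, dataset, included, excluded=""):
    """Assumed semantics: matched by the include list and not by the exclude list."""
    return (
        matches(parse_filter(included), project, dataset)
        and not matches(parse_filter(excluded), project, dataset)
    )


included = "project1/dataset1,project2/dataset2,project3"
excluded = "project3/dataset3,project3/dataset4"
print(is_extracted("project3", "dataset5", included, excluded))
print(is_extracted("project3", "dataset3", included, excluded))
```

Under these assumptions, the example values from the table select project1/dataset1, project2/dataset2, and every dataset of project3 except dataset3 and dataset4.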
BigQuery Service Account Credentials
The connection requires credentials for a BigQuery service account whose role grants the privileges needed to extract metadata.
The service account can be created on the Service Accounts page in the Cloud Console.
After the service account is created, add a new key pair and download a private key file in JSON format.
See Getting Started with Authentication for details.
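The values for bigquery.credentials.privateKey and bigquery.credentials.clientEmail can be read from the downloaded key file with a short script instead of being edited by hand. The sketch below assumes the file is a standard service account key in JSON format containing the private_key and client_email fields; the function name is hypothetical. Parsing the JSON already converts the escaped "\n" sequences into real newlines, so the printed key can be pasted directly into the Private Key field.

```python
import json


def read_connection_credentials(key_file_path):
    """Return (private_key, client_email) from a service account JSON key file."""
    with open(key_file_path, encoding="utf-8") as f:
        key_data = json.load(f)
    # json.load decodes the escaped "\n" sequences into real newline characters.
    private_key = key_data["private_key"]
    # Defensive step: convert any literal backslash-n pairs that survived,
    # for example after the value was copied manually between files.
    private_key = private_key.replace("\\n", "\n")
    return private_key, key_data["client_email"]


# Example usage (the path is a placeholder):
# key, email = read_connection_credentials("service-account.json")
# print(email)
# print(key)
```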
Common Scanner Properties
This configuration is common to all BigQuery source systems and all BigQuery scenarios, and is configured in Admin UI > Configuration > CLI > BigQuery > BigQuery Common. It can be overridden at the individual connection level.
Property name | Description | Example |
---|---|---|
bigquery.dictionary.dir | Directory with data dictionaries extracted from BigQuery | ${manta.dir.temp}/bigquery |
bigquery.url | A URL for the target BigQuery API service | |
filepath.lowercase | Whether paths to files should be lowercase (false for case-sensitive file systems, true otherwise) | false true |
bigquery.dll.output | Directory for automatically extracted BigQuery DDL scripts (for the extraction phase) | ${manta.dir.temp}/bigquery/${bigquery.dictionary.id}/ddl |
bigquery.ddl.input | Directory with automatically extracted BigQuery DDL scripts (for the analysis phase) | ${bigquery.dll.output} |
bigquery.script.input | Directory with manually provided SQL scripts that are performed on a given database server (for the analysis phase) | ${manta.dir.input}/bigquery/${bigquery.dictionary.id}/sql |
bigquery.job.script.input | Directory with manually provided job scripts that are performed on a given database server (for the analysis phase) | ${manta.dir.input}/bigquery/${bigquery.dictionary.id}/jobs |
bigquery.job.script.encoding | Encoding of manually provided job scripts performed on this BigQuery database instance. See Encodings for applicable values. | utf8 |
bigquery.script.replace | Path to the CSV file with the replacements to be applied to the provided SQL scripts; see Placeholder Replacement in Input Scripts for details about the replacement file format | ${manta.dir.input}/bigquery/${bigquery.dictionary.id}/replace.csv |
bigquery.script.replace.regex | Flag specifying whether replacements for SQL scripts in the provided CSV file specified in | false true |
bigquery.analyze.parallelCount | Number of parallel threads used to analyze DDL and SQL scripts | 4 |
bigquery.dictionary.mappingFile | Path to automatically generated mappings for BigQuery servers | ${manta.dir.temp}/bigquery/bigqueryDictionaryMantaMapping.csv |
bigquery.dictionary.mappingManualFile | Path to mappings provided manually for BigQuery servers | ${manta.dir.scenario}/conf/bigqueryDictionaryMantaMappingManual.csv |
bigquery.connections.file | Connection definitions file with database connection resource definitions used in federated queries. For more information about this file and its format, see the Connection Definition Settings section in Informatica PowerCenter Resource Configuration; (Manual) Connection Mappings Explained provides further details. | connectionsConfiguration.prm |
bigquery.connections.path | Path to the connection definitions base directory; a path to the connection definitions file will be built using the following format: | ${manta.dir.input}/bigquery |
bigquery.analyze.retainUnusedResultSetColumns | Flag specifying whether the data lineage should include sub-query result set columns that do not have any downstream lineage. Set to false by default. | false |
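Several example values in this table refer to other properties through ${...} placeholders; for instance, bigquery.ddl.input points at ${bigquery.dll.output}, which in turn builds on ${manta.dir.temp} and ${bigquery.dictionary.id}. The Python sketch below shows how such nested references expand into a concrete path. The resolver is illustrative only and is not the product's implementation, and the /opt/manta/temp base directory is a made-up value.

```python
import re

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve(value, props, _seen=()):
    """Expand ${name} placeholders in value using props, recursively."""
    def substitute(match):
        name = match.group(1)
        if name in _seen:
            raise ValueError(f"circular reference: {name}")
        if name not in props:
            return match.group(0)  # leave unknown placeholders untouched
        return resolve(props[name], props, _seen + (name,))
    return PLACEHOLDER.sub(substitute, value)


props = {
    "manta.dir.temp": "/opt/manta/temp",  # hypothetical installation path
    "bigquery.dictionary.id": "bigquery",
    "bigquery.dll.output": "${manta.dir.temp}/bigquery/${bigquery.dictionary.id}/ddl",
    "bigquery.ddl.input": "${bigquery.dll.output}",
}

print(resolve(props["bigquery.ddl.input"], props))
# prints /opt/manta/temp/bigquery/bigquery/ddl
```

The dictionary ID appears as a path segment, so each connection's extracted DDL files land in their own subdirectory.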