Google BigQuery lineage configuration
To import lineage metadata from Google BigQuery, create a data source definition, a connection, and a metadata import job.
To import lineage metadata for Google BigQuery, complete these steps:
- Create a data source definition.
- Create a connection to the data source in a project.
- Create a metadata import.
Creating a data source definition
Create a data source definition. Select Google BigQuery as the data source type.
Creating a connection to Google BigQuery
Create a connection to the data source in a project. For connection details, see Google BigQuery connection.
To run a Google BigQuery metadata import job, select one of the following authentication methods for the connection:
- Account key (full JSON snippet)
- Workload Identity Federation with access token: This authentication method is available in IBM watsonx.data intelligence 2.2.1 and later. For more information, see Authentication with workload identity federation and Workload Identity Federation with access token connection details.
- Workload Identity Federation with token URL: This authentication method is available in IBM watsonx.data intelligence 2.2.1 and later. For more information, see Authentication with workload identity federation and Workload Identity Federation with token URL connection details.
If you select the Client ID, Client secret, Access token, and Refresh token authentication method, the metadata import job fails.
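The account key method expects the full JSON snippet, and a partially copied key is easy to miss. The following Python sketch checks a snippet before you paste it into the connection. It assumes that "full JSON snippet" means the complete service account key file that Google Cloud generates. The function name check_account_key and the choice of fields to check are illustrative assumptions, not product requirements.

```python
import json

# Fields of a Google Cloud service account key file that this sketch checks for.
# Treating them as required is an assumption, not a documented product rule.
REQUIRED_KEY_FIELDS = {
    "type",
    "project_id",
    "private_key_id",
    "private_key",
    "client_email",
    "client_id",
    "token_uri",
}


def check_account_key(snippet: str) -> list[str]:
    """Return a list of problems found in an account key snippet (empty if none)."""
    try:
        key = json.loads(snippet)
    except json.JSONDecodeError:
        # For example, a truncated copy that is missing its closing brace.
        return ["not valid JSON; copy the complete key file"]
    if not isinstance(key, dict):
        return ["the key must be a JSON object"]
    problems = [
        f"missing field: {name}" for name in sorted(REQUIRED_KEY_FIELDS - set(key))
    ]
    # Flag the type only when it is present but has an unexpected value.
    if "type" in key and key["type"] != "service_account":
        problems.append("type is not 'service_account'")
    return problems
```

To use the sketch, read the downloaded key file into a string and pass it to check_account_key. An empty list means that every checked field is present.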
Creating a metadata import
Create a metadata import. The following options are specific to the Google BigQuery data source:
Include and exclude lists
You can include or exclude assets up to the dataset level. Specify projects and datasets in the format project/dataset. Each part is evaluated as a regular expression. Assets that are added to the data source later are also included or excluded if they match the conditions in the lists. Example values:
- myProject/: all datasets in myProject
- myProject2/.*: all datasets in myProject2
- myProject3/myDataset1: myDataset1 from myProject3
- myProject4/myDataset[1-5]: any dataset in myProject4 with a name that starts with myDataset and ends with a digit between 1 and 5
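The following Python sketch shows how a project/dataset entry can be evaluated, one regular expression per part. It rests on assumptions that the text does not confirm: each part must match the whole name (re.fullmatch), an empty dataset part such as in myProject/ matches every dataset, an empty include list selects everything, and an exclude entry wins over an include entry. The function names matches and is_selected are hypothetical.

```python
import re


def matches(pattern: str, project: str, dataset: str) -> bool:
    """Check whether a project/dataset pair matches one list entry.

    Assumptions: each part is matched against the whole name with
    re.fullmatch, and an empty dataset part ("myProject/") matches
    every dataset in the project.
    """
    project_pattern, _, dataset_pattern = pattern.partition("/")
    if re.fullmatch(project_pattern, project) is None:
        return False
    return dataset_pattern == "" or re.fullmatch(dataset_pattern, dataset) is not None


def is_selected(project: str, dataset: str, include: list[str], exclude: list[str]) -> bool:
    """Apply include and exclude lists (assumed precedence: exclude wins)."""
    included = not include or any(matches(p, project, dataset) for p in include)
    return included and not any(matches(p, project, dataset) for p in exclude)
```

With these assumptions, myProject4/myDataset[1-5] selects myDataset3 but not myDataset7. Under full matching, myProject/ does not match a project named myProject2. If the product uses partial (search) matching instead, that second result would differ.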
External inputs
If you use external Google BigQuery SQL or job scripts, you can add them as an external input in a .zip file. You can organize the .zip file into subfolders that represent projects and datasets. After the scripts are scanned, they are added under the respective projects and datasets in the selected catalog or project. The .zip file can have the following structure:
|-- <project_id>
|   |-- <dataset_name>
|       |-- <script_name.sql>
|-- <project_id>
|   |-- <script_name.sql>
|-- jobs
|   |-- <job_name.json>
|-- <script_name.sql>
|-- replace.csv
|-- connectionsConfiguration.prm
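The following Python sketch packages scripts into a .zip file with the folder layout described above, using only the standard zipfile module. The project, dataset, script, and job names are hypothetical. The job file content is an empty placeholder, because the expected job JSON format is not described here. replace.csv and connectionsConfiguration.prm, if you use them, go at the archive root in the same way.

```python
import io
import zipfile

# Hypothetical archive paths and contents, for illustration only.
FILES = {
    # <project_id>/<dataset_name>/<script_name.sql>
    "my-project/sales/load_orders.sql": (
        "INSERT INTO `my-project.sales.orders`\n"
        "SELECT * FROM `my-project.staging.orders`;\n"
    ),
    # <project_id>/<script_name.sql>
    "my-other-project/refresh_summary.sql": "SELECT 1;\n",
    # jobs/<job_name.json> (placeholder content; the real format is not shown here)
    "jobs/nightly_load.json": "{}\n",
    # <script_name.sql> at the archive root
    "adhoc_query.sql": "SELECT 1;\n",
}


def build_external_input(files: dict[str, str]) -> bytes:
    """Return the bytes of a .zip archive that contains the given files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path, content in files.items():
            # Forward slashes in the path create the project and dataset folders.
            archive.writestr(path, content)
    return buffer.getvalue()
```

To create the file that you upload as the external input, write the returned bytes to disk, for example with pathlib.Path("external_input.zip").write_bytes(build_external_input(FILES)).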
The replace.csv file contains placeholder replacements for the scripts that are added in the .zip file. For more information about the format, see Placeholder replacements.
The connectionsConfiguration.prm file contains definitions of the database connection resources that are used in federated queries. The file can have the following structure:
[{Shortcut_Name}]
Type={connection_type}
Connection_String={connection_string}
Server_Name={server_name}
Database_Name={database_name}
Schema_Name={schema_name}
User_Name={user_name}
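The .prm structure above is INI-like: one bracketed section per shortcut name, followed by key=value lines. The following Python sketch uses the standard configparser module to read such a file and report template keys that an entry leaves out. Whether every key is mandatory is not stated, so the report is informational only. The sample entry, its values, and the function names load_connections and missing_keys are hypothetical, and the accepted Type values are not listed here.

```python
import configparser

# Keys from the documented template.
TEMPLATE_KEYS = (
    "Type",
    "Connection_String",
    "Server_Name",
    "Database_Name",
    "Schema_Name",
    "User_Name",
)

# Hypothetical example entry; every value is a placeholder.
SAMPLE_PRM = """\
[sales_db]
Type=example_type
Connection_String=example-connection-string
Server_Name=db.example.com
Database_Name=sales
Schema_Name=public
User_Name=lineage_reader
"""


def load_connections(text: str) -> dict[str, dict[str, str]]:
    """Parse .prm text into {shortcut_name: {key: value}}."""
    # Split only on "=" so values such as "jdbc:..." keep their colons, and
    # disable interpolation so "%" in connection strings is read literally.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep key names such as Server_Name as written
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


def missing_keys(connections: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Return, per shortcut name, the template keys that the entry leaves out."""
    report = {}
    for name, values in connections.items():
        gaps = [key for key in TEMPLATE_KEYS if key not in values]
        if gaps:
            report[name] = gaps
    return report
```

For the sample, load_connections(SAMPLE_PRM)["sales_db"]["Server_Name"] is "db.example.com", and missing_keys reports no gaps.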