Google BigQuery connection
To access your data in Google BigQuery, create a connection asset for it.
Google BigQuery is a fully managed, serverless data warehouse that enables scalable analysis over petabytes of data.
Create a connection to Google BigQuery
To create the connection asset, choose an authentication method. You can authenticate with or without workload identity federation.
Without workload identity federation
- Account key (full JSON snippet): The contents of the Google service account key JSON file
- Client ID, Client secret, Access token, and Refresh token
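Before pasting an account key into the Account key field, it can help to confirm that the snippet is complete JSON. The following is a minimal sketch, not part of the product: the function name `check_account_key` is hypothetical, and the field list is an assumption based on the fields that Google service account key files normally contain.

```python
import json

# Assumption: fields that a Google service account key file normally contains.
EXPECTED_FIELDS = {"type", "project_id", "private_key_id", "private_key", "client_email"}


def check_account_key(snippet: str) -> list:
    """Return a list of problems found in a pasted account key snippet."""
    try:
        key = json.loads(snippet)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err.msg}"]
    if not isinstance(key, dict):
        return ["top-level value is not a JSON object"]
    # Report every expected field that is absent, in a stable order.
    problems = [f"missing field: {name}" for name in sorted(EXPECTED_FIELDS - key.keys())]
    if key.get("type") != "service_account":
        problems.append('field "type" is not "service_account"')
    return problems
```

An empty list means that the snippet parsed and contains the expected fields. It does not prove that the key is valid or active.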
With workload identity federation
You use an external identity provider (IdP) for authentication. An external identity provider uses Identity and Access Management (IAM) instead of service account keys. IAM provides increased security and centralized management. You can use workload identity federation authentication with an access token or with a token URL.
You can configure a Google BigQuery connection for workload identity federation with any identity provider that complies with the OpenID Connect (OIDC) specification and that satisfies the Google Cloud requirements that are described in Prepare your external IdP. The requirements include:
- The identity provider must support OpenID Connect 1.0.
- The identity provider's OIDC metadata and JWKS endpoints must be publicly accessible over the internet. Google Cloud uses these endpoints to download your identity provider's key set and uses that key set to validate tokens.
- The identity provider is configured so that your workload can obtain ID tokens that meet these criteria:
  - Tokens are signed with the RS256 or ES256 algorithm.
  - Tokens contain an aud claim.
For examples of the workload identity federation configuration steps for Amazon Web Services (AWS) and Microsoft Azure, see Workload identity federation examples.
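The two token criteria listed above (signing algorithm and aud claim) can be checked locally on a sample ID token. The sketch below only decodes the token header and payload; it does not verify the signature, and the function name `check_id_token` is hypothetical.

```python
import base64
import json

# Signing algorithms named in the requirements above.
ALLOWED_ALGS = {"RS256", "ES256"}


def _decode_segment(segment: str) -> dict:
    """Decode one base64url-encoded JWT segment into a dictionary."""
    padded = segment + "=" * (-len(segment) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


def check_id_token(token: str) -> list:
    """Return a list of problems; the signature is NOT verified."""
    parts = token.split(".")
    if len(parts) != 3:
        return ["token does not have three dot-separated segments"]
    header = _decode_segment(parts[0])
    claims = _decode_segment(parts[1])
    problems = []
    if header.get("alg") not in ALLOWED_ALGS:
        problems.append(f"unsupported alg: {header.get('alg')}")
    if "aud" not in claims:
        problems.append("missing aud claim")
    return problems
```

This is useful only as a quick diagnostic when a connection test fails. Google Cloud performs the real validation against the key set of the identity provider.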
Workload Identity Federation with access token connection details
- Access token: An access token from the identity provider to connect to BigQuery.
- Security Token Service audience: The security token service audience that contains the project number, pool ID, and provider ID. Use this format:
  //iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/POOL_ID/providers/PROVIDER_ID
  For more information, see Authenticate a workload by using the REST API.
- Service account email: The email address of the Google service account to be impersonated. For more information, see Create a service account for the external workload.
- Service account token lifetime (optional): The lifetime in seconds of the service account access token. The default lifetime of a service account access token is one hour. For more information, see URL-sourced credentials.
- Token format: Text or JSON. For JSON, use Token field name to specify the field in the JSON response that contains the token.
- Token field name: The name of the field in the JSON response that contains the token. This field appears only when the Token format is JSON.
- Token type: AWS Signature Version 4 request, Google OAuth 2.0 access token, ID token, JSON Web Token (JWT), or SAML 2.0.
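The Security Token Service audience value can be assembled from its three parts, as in this small sketch. The function name `sts_audience` and the sample values are hypothetical placeholders; substitute the values from your own Google Cloud project.

```python
def sts_audience(project_number: str, pool_id: str, provider_id: str) -> str:
    """Build the audience string in the format shown in the connection details."""
    return (
        f"//iam.googleapis.com/projects/{project_number}"
        f"/locations/global/workloadIdentityPools/{pool_id}"
        f"/providers/{provider_id}"
    )


# Hypothetical example values, for illustration only.
print(sts_audience("123456789012", "my-pool", "my-provider"))
```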
Workload Identity Federation with token URL connection details
- Security Token Service audience: The security token service audience that contains the project number, pool ID, and provider ID. Use this format:
  //iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/POOL_ID/providers/PROVIDER_ID
  For more information, see Authenticate a workload using the REST API.
- Service account email: The email address of the Google service account to be impersonated. For more information, see Create a service account for the external workload.
- Service account token lifetime (optional): The lifetime in seconds of the service account access token. The default lifetime of a service account access token is one hour. For more information, see URL-sourced credentials.
- Token URL: The URL to retrieve a token.
- HTTP method: HTTP method to use for the token URL request: GET, POST, or PUT.
- Request body (for POST or PUT methods): The body of the HTTP request to retrieve a token.
- HTTP headers: HTTP headers for the token URL request in JSON or as a JSON body. Use this format:
  "Key1"="Value1","Key2"="Value2"
- Token format: Text or JSON. For JSON, use Token field name to specify the field in the JSON response that contains the token.
- Token field name: The name of the field in the JSON response that contains the token. This field appears only when the Token format is JSON.
- Token type: AWS Signature Version 4 request, Google OAuth 2.0 access token, ID token, JSON Web Token (JWT), or SAML 2.0.
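The relationship between Token format and Token field name can be illustrated with a short sketch of how a token might be read from a token URL response. This is an assumption about the general behavior, not the connector's actual implementation; the function name `extract_token` is hypothetical, and trimming whitespace from a text response is a choice made for this example.

```python
import json
from typing import Optional


def extract_token(response_body: str, token_format: str,
                  token_field_name: Optional[str] = None) -> str:
    """Pull the token out of a token URL response body."""
    if token_format == "Text":
        # The whole response body is the token (whitespace trimmed here by choice).
        return response_body.strip()
    if token_format == "JSON":
        if not token_field_name:
            raise ValueError("Token field name is required when Token format is JSON")
        # The token is the value of the named field in the JSON response.
        return json.loads(response_body)[token_field_name]
    raise ValueError(f"unknown token format: {token_format}")
```

For example, if the identity provider returns a JSON response in which the token is stored under a field named access_token, set Token format to JSON and Token field name to access_token.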
Server proxy (optional)
Select Server proxy to access the Google BigQuery data source through an HTTPS proxy server. Depending on its setup, a proxy server can provide load balancing, increased security, and privacy. The proxy server settings are independent of the authentication credentials and the personal or shared credentials selection. The proxy server settings cannot be stored in a vault.
- Proxy host: The hostname or IP address of the HTTPS proxy server. For example, proxy.example.com or 192.0.2.0.
- Proxy port: The port number to connect to the HTTPS proxy server. For example, 8080 or 8443.
- Proxy username and Proxy password.
Other properties
Project ID (optional): The ID of the Google project.
Output JSON string format: JSON string format for output values that are complex data types (for example, nested or repeated).
- Pretty: Values are formatted before sending them to output. Use this option to visually read a few rows.
- Raw: (Default) No formatting. Use this option for the best performance.
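The difference between the two options can be pictured with a generic JSON example. This sketch uses the Python standard library only to show formatted and unformatted output for a nested value; the exact layout that the connection produces is not specified here, and the sample record is hypothetical.

```python
import json

# Hypothetical nested and repeated value, such as a RECORD column with an array.
value = {"name": "Ada", "orders": [{"id": 1, "items": ["a", "b"]}]}

# Raw: a compact single-line string with no extra whitespace.
raw = json.dumps(value, separators=(",", ":"))

# Pretty: indented across multiple lines, easier to read but larger.
pretty = json.dumps(value, indent=2)

print(raw)
print(pretty)
```

Both strings carry the same data. The formatted form costs extra processing and bytes for every row, which is why the unformatted option is the better choice for large reads.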
Metadata discovery: This setting determines whether comments on columns (remarks) and aliases for schema objects such as tables or views (synonyms) are retrieved when assets are added by using this connection.
Permissions
The connection to Google BigQuery requires the following BigQuery permissions:
bigquery.jobs.create
bigquery.tables.get
bigquery.tables.getData
Use one of three ways to gain these permissions:
- Use the predefined BigQuery Cloud IAM role bigquery.admin, which includes these permissions.
- Use a combination of two roles, one from each column in the following table.
- Create a custom role. See Create and manage custom roles.
First role | Second role |
---|---|
bigquery.dataEditor | bigquery.jobUser |
bigquery.dataOwner | bigquery.user |
bigquery.dataViewer | |
For more information about permissions and roles in Google BigQuery, see Predefined roles and permissions.
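Reading the role table as three first-column roles and two second-column roles, the two-role option allows any pairing across the columns. The sketch below simply enumerates those pairings; the variable names are hypothetical.

```python
from itertools import product

# Roles as listed in the two columns of the table above.
FIRST_ROLES = ["bigquery.dataEditor", "bigquery.dataOwner", "bigquery.dataViewer"]
SECOND_ROLES = ["bigquery.jobUser", "bigquery.user"]

# Every combination of one role from each column.
role_pairs = list(product(FIRST_ROLES, SECOND_ROLES))

for first, second in role_pairs:
    print(f"{first} + {second}")
```

For read-only access, the pairing of bigquery.dataViewer with bigquery.jobUser is usually the narrowest of these combinations.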
Choose the method for creating a connection based on where you are in the platform
- In a project
- Click Assets > New asset > Prepare data > Connect to a data source. See Adding a connection to a project.
- In a catalog
- Click Add to catalog > Connection. See Adding a connection asset to a catalog.
- In a deployment space
- Click Import assets > Data access > Connection. See Adding data assets to a deployment space.
- In the Platform assets catalog
- Click New connection. See Adding platform connections.
Next step: Add data assets from the connection
Where you can use this connection
You can use Google BigQuery connections in the following workspaces and tools:
Projects
- AutoAI (Watson Machine Learning)
- Data quality rules (IBM Knowledge Catalog, IBM Knowledge Catalog Premium). See Supported data sources for curation and data quality.
- Data Refinery (Watson Studio, IBM Knowledge Catalog any edition)
- DataStage (DataStage service). See Connecting to a data source in DataStage.
- Metadata enrichment (IBM Knowledge Catalog any edition). See Supported data sources for curation and data quality.
- Metadata import (IBM Knowledge Catalog any edition). See Supported data sources for curation and data quality. For information about the supported product versions and other prerequisites when connections are based on MANTA Automated Data Lineage for IBM Cloud Pak for Data scanners, see the Lineage Scanner Configuration section in the MANTA Automated Data Lineage on IBM Cloud Pak for Data Installation and Usage Manual. This documentation is available at https://www.ibm.com/support/pages/node/6597457.
  For metadata import (lineage), MANTA Automated Data Lineage for IBM Cloud Pak for Data and a corresponding license key must be installed. See Installing MANTA Automated Data Lineage and Enabling lineage import.
- SPSS Modeler (SPSS Modeler service)
- Synthetic Data Generator (Synthetic Data Generator service)
Catalogs
- Platform assets catalog
- Other catalogs (IBM Knowledge Catalog)
- Data Product Hub
- You can connect to this data source from Data Product Hub. For instructions, see Connectors for Data Product Hub.
- Data Virtualization service
- You can connect to this data source from Data Virtualization. This connection requires special consideration in Data Virtualization. See Connecting to Google BigQuery in Data Virtualization.
Federal Information Processing Standards (FIPS) compliance
This connection can be used on a FIPS-enabled cluster (FIPS tolerant); however, it is not FIPS-compliant.