Google Cloud Storage connection

To access your data in Google Cloud Storage, create a connection asset for it.

Google Cloud Storage is an online file storage web service for storing and accessing data on Google Cloud Platform infrastructure.

Create a connection to Google Cloud Storage

To create the connection asset, choose an authentication method. You can authenticate with or without workload identity federation.

Without workload identity federation

  • Account key (full JSON snippet): The contents of the Google service account key JSON file
  • Client ID, Client secret, Access token, and Refresh token
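
Before you paste the Account key, a quick local check can catch a truncated or mismatched JSON snippet. The following is a minimal sketch: the function name `check_account_key` is hypothetical, and the list of required fields is an assumption based on the usual layout of a Google service account key file, not on how the connection itself validates the key.

```python
import json
from typing import List

# Fields commonly present in a Google service account key file.
# Assumption for illustration; the connection may validate differently.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email", "token_uri")


def check_account_key(snippet: str) -> List[str]:
    """Return a list of problems found in the JSON snippet (empty if none)."""
    try:
        key = json.loads(snippet)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value must be an object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in key]
    # A missing "type" is already reported above; flag only a wrong value here.
    if key.get("type") not in (None, "service_account"):
        problems.append("type is not 'service_account'")
    return problems
```

An empty list means that the snippet has the expected shape. It does not prove that the key is still active in Google Cloud.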

With workload identity federation
You use an external identity provider (IdP) for authentication. An external identity provider uses Identity and Access Management (IAM) instead of service account keys. IAM provides increased security and centralized management. You can use workload identity federation authentication with an access token or with a token URL.

You can configure a Google Cloud Storage connection for workload identity federation with any identity provider that complies with the OpenID Connect (OIDC) specification and that satisfies the Google Cloud requirements that are described in Prepare your external IdP. The requirements include:

  • The identity provider must support OpenID Connect 1.0.
  • The identity provider's OIDC metadata and JWKS endpoints must be publicly accessible over the internet. Google Cloud uses these endpoints to download your identity provider's key set and uses that key set to validate tokens.
  • The identity provider is configured so that your workload can obtain ID tokens that meet these criteria:
    • Tokens are signed with the RS256 or ES256 algorithm.
    • Tokens contain an aud claim.
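
The two token criteria above can be checked locally by decoding an ID token without verifying it. This is a minimal sketch for troubleshooting only: the function names are hypothetical, the signature is not validated, and Google Cloud performs its own validation against your identity provider's key set.

```python
import base64
import json
from typing import List

ALLOWED_ALGORITHMS = {"RS256", "ES256"}


def _decode_segment(segment: str) -> dict:
    """Decode one base64url-encoded JWT segment into a dictionary."""
    padded = segment + "=" * (-len(segment) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


def check_id_token(token: str) -> List[str]:
    """Return the criteria that the token fails. The signature is NOT verified."""
    parts = token.split(".")
    if len(parts) != 3:
        return ["token is not a three-part JWT"]
    header = _decode_segment(parts[0])
    claims = _decode_segment(parts[1])
    problems = []
    if header.get("alg") not in ALLOWED_ALGORITHMS:
        problems.append("alg must be RS256 or ES256")
    if "aud" not in claims:
        problems.append("missing aud claim")
    return problems
```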

For examples of the workload identity federation configuration steps for Amazon Web Services (AWS) and Microsoft Azure, see .

Workload Identity Federation with access token connection details

  • Access token: An access token from the identity provider to connect to Google Cloud Storage.

  • Security Token Service audience: The security token service audience that contains the project number, pool ID, and provider ID. Use this format:

    //iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/POOL_ID/providers/PROVIDER_ID
    

    For more information, see Authenticate a workload by using the REST API.

  • Service account email: The email address of the Google service account to be impersonated. For more information, see Create a service account for the external workload.

  • Service account token lifetime (optional): The lifetime in seconds of the service account access token. The default lifetime of a service account access token is one hour. For more information, see URL-sourced credentials.

  • Token format: Text or JSON. If you select JSON, also specify the Token field name.

  • Token field name: The name of the field in the JSON response that contains the token. This field appears only when the Token format is JSON.

  • Token type: AWS Signature Version 4 request, Google OAuth 2.0 access token, ID token, JSON Web Token (JWT), or SAML 2.0.
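
To show how these fields relate to each other, the following sketch builds the audience string and the two request bodies that Google's REST-based flow uses: a token exchange with the Security Token Service, followed by service account impersonation. It is a sketch under stated assumptions, not a description of what the connection does internally. The function names, the `cloud-platform` scope, and the mapping from Token type labels to `subject_token_type` identifiers are illustrative choices. Nothing is sent over the network.

```python
from typing import Dict, Tuple


def sts_audience(project_number: str, pool_id: str, provider_id: str) -> str:
    """Build the Security Token Service audience in the documented format."""
    return (
        f"//iam.googleapis.com/projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool_id}/providers/{provider_id}"
    )


# Assumed mapping from the Token type choices to STS token type identifiers.
SUBJECT_TOKEN_TYPES: Dict[str, str] = {
    "AWS Signature Version 4 request": "urn:ietf:params:aws:token-type:aws4_request",
    "Google OAuth 2.0 access token": "urn:ietf:params:oauth:token-type:access_token",
    "ID token": "urn:ietf:params:oauth:token-type:id_token",
    "JSON Web Token (JWT)": "urn:ietf:params:oauth:token-type:jwt",
    "SAML 2.0": "urn:ietf:params:oauth:token-type:saml2",
}


def sts_exchange_payload(audience: str, subject_token: str, token_type: str) -> dict:
    """Body for a POST to https://sts.googleapis.com/v1/token (not sent here)."""
    return {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "audience": audience,
        "scope": "https://www.googleapis.com/auth/cloud-platform",
        "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "subject_token": subject_token,
        "subject_token_type": SUBJECT_TOKEN_TYPES[token_type],
    }


def impersonation_request(service_account_email: str,
                          lifetime_seconds: int = 3600) -> Tuple[str, dict]:
    """URL and body for the generateAccessToken call (default lifetime: one hour)."""
    url = (
        "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/"
        f"{service_account_email}:generateAccessToken"
    )
    body = {
        "scope": ["https://www.googleapis.com/auth/cloud-platform"],
        "lifetime": f"{lifetime_seconds}s",
    }
    return url, body
```

For example, `sts_audience("123456", "my-pool", "my-provider")` returns the value to enter in the Security Token Service audience field for a pool named `my-pool` (placeholder values).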

Workload Identity Federation with token URL connection details

  • Security Token Service audience: The security token service audience that contains the project number, pool ID, and provider ID. Use this format:

    //iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/POOL_ID/providers/PROVIDER_ID
    

    For more information, see Authenticate a workload by using the REST API.

  • Service account email: The email address of the Google service account to be impersonated. For more information, see Create a service account for the external workload.

  • Service account token lifetime (optional): The lifetime in seconds of the service account access token. The default lifetime of a service account access token is one hour. For more information, see URL-sourced credentials.

  • Token URL: The URL to retrieve a token.

  • HTTP method: HTTP method to use for the token URL request: GET, POST, or PUT.

  • Request body (for POST or PUT methods): The body of the HTTP request to retrieve a token.

  • HTTP headers: HTTP headers for the token URL request, specified as key-value pairs. Use this format: "Key1"="Value1","Key2"="Value2".

  • Token format: Text or JSON. If you select JSON, also specify the Token field name.

  • Token field name: The name of the field in the JSON response that contains the token. This field appears only when the Token format is JSON.

  • Token type: AWS Signature Version 4 request, Google OAuth 2.0 access token, ID token, JSON Web Token (JWT), or SAML 2.0.
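
The HTTP headers format and the Token format setting can be illustrated with a short sketch. It parses a header string in the "Key1"="Value1" form and extracts the token from a response body. The helper names `parse_headers` and `extract_token` are hypothetical, and the parsing rules are an assumption about how values in this format could be read, not the connection's actual implementation.

```python
import json
import re
from typing import Dict

# Matches one "Key"="Value" pair; assumes keys and values contain no double quotes.
_HEADER_PAIR = re.compile(r'"([^"]*)"="([^"]*)"')


def parse_headers(text: str) -> Dict[str, str]:
    """Parse '"Key1"="Value1","Key2"="Value2"' into a dictionary."""
    return dict(_HEADER_PAIR.findall(text))


def extract_token(response_body: str, token_format: str,
                  token_field_name: str = "") -> str:
    """Return the token from a token URL response body."""
    if token_format == "Text":
        # The whole body is the token; drop surrounding whitespace.
        return response_body.strip()
    if token_format == "JSON":
        # The token is the value of the configured field.
        return json.loads(response_body)[token_field_name]
    raise ValueError(f"unsupported token format: {token_format}")
```

With Token format set to JSON and Token field name set to `access_token`, a response of `{"access_token": "xyz"}` yields `xyz`.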

Server proxy (optional)

Select Server proxy to access the Google Cloud Storage data source through an HTTPS proxy server. Depending on its setup, a proxy server can provide load balancing, increased security, and privacy. The proxy server settings are independent of the authentication credentials and the personal or shared credentials selection. You can provide an SSL certificate for added security.

  • Proxy host: The hostname or IP address of the HTTPS proxy server. For example, proxy.example.com or 192.0.2.0.
  • Proxy port: The port number to connect to the HTTPS proxy server. For example, 8080 or 8443.
  • Proxy username and Proxy password.
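
To test the same proxy settings from your own client code, you can combine them into a single proxy URL. The following sketch is an illustration only: the `proxy_url` helper is hypothetical, and the `https` scheme default is an assumption that you might need to change for your proxy.

```python
from typing import Optional
from urllib.parse import quote


def proxy_url(host: str, port: int, username: Optional[str] = None,
              password: Optional[str] = None, scheme: str = "https") -> str:
    """Combine proxy host, port, and optional credentials into one URL."""
    credentials = ""
    if username:
        # Percent-encode credentials so characters such as '@' or ':' stay unambiguous.
        credentials = quote(username, safe="")
        if password:
            credentials += ":" + quote(password, safe="")
        credentials += "@"
    return f"{scheme}://{credentials}{host}:{port}"
```

The result can be passed to an HTTP client that accepts proxy URLs, for example as the `https` entry of the mapping given to `urllib.request.ProxyHandler`.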

Other properties

Project ID (optional): The ID of the Google project.

Choose the method for creating a connection based on where you are in the platform

In a project
Click Assets > New asset > Connect to a data source. See Adding a connection to a project.
In a catalog
Click Add to catalog > Connection. See Adding a connection asset to a catalog.
In a deployment space
Click Import assets > Data access > Connection. See Adding data assets to a deployment space.
In the Platform assets catalog
Click New connection. See Adding platform connections.

Next step: Add data assets from the connection

Where you can use this connection

You can use Google Cloud Storage connections in the following workspaces and tools:

Projects

  • Data Refinery (watsonx.ai Studio or IBM Knowledge Catalog)
  • DataStage (DataStage service). See Connecting to a data source in DataStage.
  • Decision Optimization (watsonx.ai Studio and watsonx.ai Runtime)
  • Metadata import (IBM Knowledge Catalog)
  • SPSS Modeler (watsonx.ai Studio)

Catalogs

  • Platform assets catalog

  • Other catalogs (IBM Knowledge Catalog)

Supported file types

The Google Cloud Storage connection supports these file types: Avro, CSV, Delimited text, Excel, JSON, ORC, Parquet, SAS, SAV, SHP, and XML.

Table formats

The Google Cloud Storage connection supports these Data Lake table formats: Delta Lake and Iceberg.
