Amazon S3 connection

To access your data in Amazon S3, create a connection asset for it.

Amazon S3 (Amazon Simple Storage Service) is an object storage service from Amazon Web Services (AWS) that you access through a web service interface.

For other types of S3-compliant connections, you can use the Generic S3 connection.

Create a connection to Amazon S3

To create the connection asset, you need these connection details:

  • Bucket: Name of the bucket that contains the files. If your AWS credentials have permission to list buckets and to access all buckets, you only need to supply the credentials. If your credentials cannot list buckets and can access only a particular bucket, you must specify that bucket.
  • Endpoint URL: Use for an AWS GovCloud instance. Include the region code. For example, https://s3.<region-code>.amazonaws.com. For the list of region codes, see AWS service endpoints.
  • Region: Amazon Web Services (AWS) region. If you specify an Endpoint URL that is not for the AWS default region (us-west-2), then you should also enter a value for Region.
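As an illustration of the Endpoint URL and Region rules above, the following minimal Python sketch builds an endpoint URL from a region code and reports whether the Region field should also be filled in. The function names are hypothetical and are not part of the platform; the example region code us-gov-west-1 is one of the AWS GovCloud regions.

```python
DEFAULT_REGION = "us-west-2"  # default region named in the connection details


def s3_endpoint_url(region_code: str) -> str:
    """Build an endpoint URL of the form https://s3.<region-code>.amazonaws.com."""
    region_code = region_code.strip()
    if not region_code:
        raise ValueError("region_code must be a non-empty string")
    return f"https://s3.{region_code}.amazonaws.com"


def region_field_needed(region_code: str) -> bool:
    """Return True when the endpoint is not for the default region,
    in which case the Region field should also be set."""
    return region_code.strip() != DEFAULT_REGION


# Example with an AWS GovCloud region code.
print(s3_endpoint_url("us-gov-west-1"))
print(region_field_needed("us-gov-west-1"))
```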

Select Server proxy to access the Amazon S3 data source through a proxy server. Depending on its setup, a proxy server can provide load balancing, increased security, and privacy. The proxy server settings are independent of the authentication credentials and the personal or shared credentials selection. The proxy server settings cannot be stored in a vault.

  • Proxy host: The proxy URL. For example, https://proxy.example.com.
  • Proxy port number: The port number to connect to the proxy server. For example, 8080 or 8443.
  • The Proxy username and Proxy password fields are optional.
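To show how the proxy fields relate to each other, here is a small Python sketch that combines a proxy host, a port number, and the optional username and password into one proxy URL of the kind that many HTTP clients accept. This is an assumption for illustration only: the function name is hypothetical, and the platform itself stores these values as separate fields.

```python
from typing import Optional
from urllib.parse import quote, urlsplit, urlunsplit


def proxy_url(host: str, port: int,
              username: Optional[str] = None,
              password: Optional[str] = None) -> str:
    """Combine the proxy fields into a single URL, for example
    https://proxy.example.com:8080."""
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    parts = urlsplit(host)
    if not parts.scheme or not parts.hostname:
        raise ValueError("host must be a URL such as https://proxy.example.com")
    netloc = f"{parts.hostname}:{port}"
    if username:
        # Percent-encode credentials so characters like '@' or ':' stay unambiguous.
        userinfo = quote(username, safe="")
        if password:
            userinfo += ":" + quote(password, safe="")
        netloc = f"{userinfo}@{netloc}"
    return urlunsplit((parts.scheme, netloc, "", "", ""))


print(proxy_url("https://proxy.example.com", 8080))
```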

Credentials

At a minimum, you must provide an Access key and a Secret key.

For Credentials, you can use secrets if a vault is configured for the platform and the service supports vaults. For information, see Using secrets from vaults in connections.

If the Amazon S3 account owner has set up temporary credentials or a Role ARN (Amazon Resource Name), enter the values provided by the Amazon S3 account owner for the applicable authentication combination:

  • Access key, Secret key, and Session token
  • Access key, Secret key, Role ARN, Role session name, and optional Duration seconds
  • Access key, Secret key, Role ARN, Role session name, External ID, and optional Duration seconds

For setup instructions for the Amazon S3 account owner, see Setting up temporary credentials or a Role ARN for Amazon S3.
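The authentication combinations listed above can be sketched as a simple check. The following Python example is an illustration under stated assumptions: the field names (access_key, role_arn, and so on) and the function name are hypothetical, and the actual validation performed by the platform might differ.

```python
from typing import Mapping


def authentication_combination(fields: Mapping[str, str]) -> str:
    """Return which documented credential combination the supplied fields match.

    Raises ValueError when the fields do not match any listed combination.
    """
    supplied = {name for name, value in fields.items() if value}

    # Access key and Secret key are required in every combination.
    missing = {"access_key", "secret_key"} - supplied
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")

    role_fields = {"role_arn", "role_session_name"}
    role_extras = {"external_id", "duration_seconds"}

    if "session_token" in supplied:
        # A session token is not combined with Role ARN fields.
        if supplied & (role_fields | role_extras):
            raise ValueError("session token cannot be combined with Role ARN fields")
        return "access key, secret key, session token"

    if supplied & role_fields:
        # Role ARN and Role session name must be given together.
        if not role_fields <= supplied:
            raise ValueError("Role ARN and Role session name are both required")
        if "external_id" in supplied:
            return "role ARN with external ID"
        return "role ARN"

    if supplied & role_extras:
        raise ValueError("External ID and Duration seconds require a Role ARN")

    return "access key and secret key"
```

Duration seconds is treated as optional in both Role ARN combinations, which matches the list above.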

Choose the method for creating a connection based on where you are in the platform

  • In a project: Click Assets > New asset > Prepare data > Connect to a data source. See Adding a connection to a project.
  • In a catalog: Click Add to catalog > Connection. See Adding a connection asset to a catalog.
  • In a deployment space: Click Import assets > Data access > Connection. See Adding data assets to a deployment space.
  • In the Platform assets catalog: Click New connection. See Adding platform connections.

Next step: Add data assets from the connection

Where you can use this connection

You can use Amazon S3 connections in the following workspaces and tools:

Projects

  • AutoAI (Watson Machine Learning)
  • Data quality rules (IBM Knowledge Catalog, IBM Knowledge Catalog Premium). In addition to flat files, the Delta Lake table format is supported. See Supported data sources for curation and data quality.
  • Data Refinery (Watson Studio, IBM Knowledge Catalog any edition)
  • DataStage (DataStage service). See Connecting to a data source in DataStage.
  • Decision Optimization (Watson Studio and Watson Machine Learning)
  • Metadata enrichment (IBM Knowledge Catalog any edition). In addition to flat files, the Delta Lake table format is supported. See Supported data sources for curation and data quality.
  • Metadata import (IBM Knowledge Catalog any edition). In addition to flat files, the Delta Lake table format is supported. See Supported data sources for curation and data quality.
  • Notebooks (Watson Studio). Click Read data on the Code snippets pane to get the connection credentials and load the data into a data structure. See Load data from data source connections.
  • SPSS Modeler (SPSS Modeler service)
  • Synthetic Data Generator (Synthetic Data Generator service)
  • Watson Machine Learning Accelerator (Watson Machine Learning Accelerator service)

Catalogs

  • Platform assets catalog
  • Other catalogs (IBM Knowledge Catalog)

Data Product Hub

You can connect to this data source from Data Product Hub. For instructions, see Connectors for Data Product Hub.

Data Virtualization service

You can connect to this data source from Data Virtualization. This connection requires special consideration in Data Virtualization. See Connecting to Amazon S3 in Data Virtualization.

Federal Information Processing Standards (FIPS) compliance

This connection can be used on a FIPS-enabled cluster (FIPS tolerant); however, it is not FIPS-compliant.

Amazon S3 setup

See the Amazon Simple Storage Service User Guide for the setup steps.

Restriction

Folder names cannot contain the slash symbol (/) because the slash is the delimiter for the file structure.
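The reason for this restriction can be seen in a short Python sketch that splits an object key on the slash delimiter. The function names and the example keys are hypothetical and serve only as an illustration.

```python
from typing import List, Tuple


def split_object_key(key: str) -> Tuple[List[str], str]:
    """Split an object key into its folder names and its file name,
    treating the slash as the delimiter."""
    *folders, file_name = key.split("/")
    return folders, file_name


def is_valid_folder_name(name: str) -> bool:
    """A folder name must be non-empty and must not contain a slash."""
    return bool(name) and "/" not in name


# A folder intended to be named "a/b" is read as two nested folders.
print(split_object_key("reports/a/b/data.csv"))
print(is_valid_folder_name("a/b"))
```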

Supported file types

The Amazon S3 connection supports these file types: Avro, CSV, Delimited text, Excel, JSON, ORC, Parquet, SAS, SAV, SHP, and XML.
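As a rough way to check whether a file is likely to be supported before you add it, the following Python sketch maps common file extensions to the file types listed above. The extension mapping is an assumption based on conventional extensions; the connection itself might identify file types differently.

```python
import os
from typing import Optional

# Assumed mapping from conventional file extensions to the listed file types.
EXTENSION_TO_FILE_TYPE = {
    ".avro": "Avro",
    ".csv": "CSV",
    ".txt": "Delimited text",
    ".xls": "Excel",
    ".xlsx": "Excel",
    ".json": "JSON",
    ".orc": "ORC",
    ".parquet": "Parquet",
    ".sas7bdat": "SAS",
    ".sav": "SAV",
    ".shp": "SHP",
    ".xml": "XML",
}


def supported_file_type(object_key: str) -> Optional[str]:
    """Return the listed file type for an object key, or None if the
    extension is not in the assumed mapping."""
    _, extension = os.path.splitext(object_key)
    return EXTENSION_TO_FILE_TYPE.get(extension.lower())


print(supported_file_type("sales/2024/orders.parquet"))
```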

Table formats

In addition to Flat file, the Amazon S3 connection supports these Data Lake table formats: Delta Lake and Iceberg.

Learn more

Amazon S3 documentation

Related connection: Generic S3 connection
