Connections

A connection defines the information required to connect to an external system.

StreamSets flows communicate with external systems to read and write data. Most of these external systems require sensitive information, such as user names or passwords, to access the data. You can specify those details in Data Collector stages, or you can use connections to provide the information.

Using connections provides the following benefits:
Increased security
With connections, you can limit the number of users that need to know the security credentials for external systems.
For example, you can have a DevOps engineer create connections for all external systems so that only that engineer needs to know the security credentials for those systems. When data engineers configure a flow, they can select the appropriate connection for a stage, but cannot view the details defined in the connection.
Enhanced reusability
After you create a connection, you can use it in multiple flows. Reusing connections reduces the possibility of user errors and simplifies updates when connection information changes. When you update a connection, every flow that uses the connection receives the updated details the next time that you run the flow.
For example, you might create a connection named Db2-Sales to access test data from the sales team stored in an IBM Db2 database. You develop multiple flows to process the data. Each time you use an IBM Db2 source, you select the Db2-Sales connection instead of specifying connectivity details in the stage. When all flows are ready to move into production, you update the Db2-Sales connection to point to the production database.
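
The reuse behavior described above can be pictured with a small model. This is only an illustrative sketch, not StreamSets code: the `Connection` and `Flow` classes, the `registry` dictionary, and the `db2://` URLs are all hypothetical names invented for the example. It assumes that a flow stores only the name of its connection and resolves the details each time it runs.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    """Hypothetical stand-in for a connection asset."""
    name: str
    url: str  # placeholder for the real connectivity details


@dataclass
class Flow:
    """Hypothetical flow that references a connection by name only."""
    name: str
    connection_name: str

    def run(self, registry: dict) -> str:
        # Details are looked up at run time, so an updated connection
        # is picked up the next time the flow runs.
        conn = registry[self.connection_name]
        return f"{self.name} reads from {conn.url}"


# One shared connection, used by several flows (made-up test host).
registry = {"Db2-Sales": Connection("Db2-Sales", "db2://test-host/sales")}
flows = [Flow("orders", "Db2-Sales"), Flow("refunds", "Db2-Sales")]

before = [flow.run(registry) for flow in flows]

# Moving to production: update the connection once, leave the flows untouched.
registry["Db2-Sales"].url = "db2://prod-host/sales"

after = [flow.run(registry) for flow in flows]
```

In this model, a single change to `Db2-Sales` redirects both flows on their next run, which mirrors the single-update benefit described for the Db2-Sales example.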

Connection types and support

After you configure connections, you can use them in Data Collector stages and in related flow error handling properties. The engine version that is used with the flow determines the connections that are available.

For example, suppose you create a Google Cloud Storage connection asset. You can then use it in any Google Cloud Storage stage when the flow uses Data Collector engine version 7.2 or later. You can also use the connection when you configure flow error handling and staging information for the Google BigQuery target.

You can use the following connection types in the specified locations when the flow is associated with a supported engine version:
Each entry gives the connection type and the supported Data Collector versions, followed by the supported locations.
Amazon S3 7.2 and later
  • Amazon S3 source
  • Amazon S3 target
  • Amazon S3 executor
  • Amazon S3 error record handling in flow properties
  • Staging location for the following stages:
    • Databricks target
    • Databricks Query executor
    • Snowflake Bulk source
    • Snowflake target
    • Teradata target
Apache Kafka 7.2 and later
  • Kafka Multitopic Consumer source
  • Kafka Producer target
  • Kafka error record handling in flow properties
Databricks Delta Lake 7.2 and later
  • Databricks target
  • Databricks Query executor
FTP 7.3 and later
  • SFTP/FTP/FTPS Client source
  • SFTP/FTP/FTPS Client target
  • SFTP/FTP/FTPS Client executor
Google BigQuery 7.2 and later
  • Google BigQuery source
  • Google BigQuery target
  • Google BigQuery executor
Google Cloud Pub/Sub 7.2 and later
  • Google Pub/Sub Subscriber source
  • Google Pub/Sub Publisher target
  • Google Pub/Sub error record handling in flow properties
Google Cloud Storage 7.2 and later
  • Google Cloud Storage source
  • Google Cloud Storage target
  • Google Cloud Storage executor
  • Google Cloud Storage error record handling in flow properties
  • Staging location for the following stages:
    • Databricks target
    • Databricks Query executor
    • Google BigQuery target
    • Snowflake Bulk source
    • Snowflake target
    • Teradata target
IBM Db2 7.2 and later
  • IBM Db2 source
  • IBM Db2 target
JDBC 7.2 and later
  • JDBC Multitable Consumer source
  • JDBC Query Consumer source
  • Oracle Bulkload source
  • JDBC Lookup processor
  • JDBC Tee processor
  • JDBC Producer target
  • SingleStore target
  • JDBC Query executor
JMS 7.2 and later
  • JMS Consumer source
  • JMS Producer target
Microsoft Azure Blob Storage 7.2 and later
  • Azure Blob Storage source
  • Azure Blob Storage target
  • Staging location for the following stages:
    • Snowflake Bulk source
    • Snowflake target
    • Teradata target
Microsoft Azure Data Lake Storage 7.2 and later
  • Azure Data Lake Storage Gen2 source
  • Azure Data Lake Storage Gen2 target
  • ADLS Gen2 File Metadata executor
  • Staging location for the following stages:
    • Databricks target
    • Databricks Query executor
    • Teradata target
Microsoft SQL Server 7.2 and later
  • SQL Server CDC Client source
  • SQL Server Change Tracking source
MongoDB 7.2 and later
  • MongoDB Atlas source
  • MongoDB Atlas CDC source
  • MongoDB Atlas Lookup processor
  • MongoDB Atlas target
MySQL 7.2 and later
  • MySQL Binary Log source
Oracle 7.2 and later
  • Oracle CDC source
  • Oracle CDC Client source
  • Oracle Multitable Consumer source
  • Oracle XStream source (7.5 and later)
  • SQL Parser processor
  • Oracle target
PostgreSQL 7.2 and later
  • Aurora PostgreSQL CDC Client source
  • PostgreSQL CDC Client source
Pulsar 7.4 and later
  • Pulsar Consumer source
  • Pulsar Consumer (legacy) source
  • Pulsar Producer target
RabbitMQ 7.4 and later
  • RabbitMQ Consumer source
  • RabbitMQ Producer target
Snowflake 7.2 and later
  • Snowflake Bulk source
  • Snowflake target
  • Snowflake File Uploader target
  • Snowflake executor
Snowpipe 7.5 and later
  • Snowflake target
Splunk 7.2 and later
  • Splunk target
watsonx.data Presto 7.2 and later
  • IBM watsonx.data target
Web Client 7.2 and later
  • Web Client source
  • Web Client processor
  • Web Client target
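
As a rough illustration of how the version requirements in this list apply, the following sketch checks whether a connection type is available for a given engine version. It is not part of any StreamSets API: `MIN_ENGINE_VERSION`, `parse_version`, and `is_available` are hypothetical names, the mapping covers only a few entries transcribed from the list above, and the dotted version string format is an assumption.

```python
# Minimum Data Collector versions for a subset of connection types,
# transcribed from the list above. Extend as needed.
MIN_ENGINE_VERSION = {
    "Amazon S3": (7, 2),
    "FTP": (7, 3),
    "Pulsar": (7, 4),
    "Snowpipe": (7, 5),
}


def parse_version(text):
    """Turn a dotted version such as '7.2' or '7.2.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_available(connection_type, engine_version):
    """Return True if the engine version meets the connection type's minimum."""
    minimum = MIN_ENGINE_VERSION[connection_type]
    # Tuples compare element by element, so 7.10 correctly ranks above 7.3.
    return parse_version(engine_version) >= minimum
```

Comparing integer tuples rather than raw strings avoids ranking a version such as 7.10 below 7.3.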

Configuring a connection

To use a connection in a StreamSets flow, you must add a connection asset to your project.

When you create a connection asset, you can configure the connection details directly in the asset. Alternatively, when available, you can select an existing platform connection to use the details that it specifies.

For example, say you want to use a reusable connection in IBM Db2 sources rather than define connectivity details in each stage. To do this, you need an IBM Db2 connection asset in your project. When you create the asset, you can either define all required asset properties or select an IBM Db2 platform connection to use.

For information about creating a connection asset, see Adding connections to data sources in a project.

For information about configuring connection assets for specific connection types, see Connectors.

For information about creating platform connections, see Adding platform connections.