Connections
A connection defines the information required to connect to an external system.
StreamSets flows communicate with external systems to read and write data. Most of these external systems require sensitive information, such as user names or passwords, to access the data. You can specify those details in Data Collector stages, or you can use connections to provide the information.
Connections provide the following benefits:
- Increased security: With connections, you can limit the number of users who need to know the security credentials for external systems.
- Enhanced reusability: After you create a connection, you can use it in multiple flows. Reusing connections reduces the possibility of user errors and simplifies updates when connection information changes. When you update a connection, every flow that uses the connection receives the updated details the next time that you run the flow.
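The reuse behavior can be pictured with a small model. The following is a minimal sketch, not StreamSets code: the `Connection` and `Flow` classes, their fields, and the host names are all hypothetical. It assumes only what is described above, namely that a flow refers to a shared connection and reads its details when the flow runs.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    """Hypothetical stand-in for a connection: a name plus its details."""
    name: str
    details: dict


@dataclass
class Flow:
    """Hypothetical flow that holds a reference to a connection, not a copy."""
    name: str
    connection: Connection

    def run(self) -> dict:
        # Details are read at run time, so any update made to the
        # connection before this call is picked up here.
        return dict(self.connection.details)


# One connection shared by two flows (illustrative values only).
shared = Connection("warehouse", {"host": "db-old.example.com", "user": "etl"})
flows = [Flow("ingest", shared), Flow("export", shared)]

# Update the connection once.
shared.details["host"] = "db-new.example.com"

# Each flow sees the new host the next time it runs.
hosts = [flow.run()["host"] for flow in flows]
```

Because the flows hold a reference rather than their own copy of the credentials, a single update reaches every flow, which is the property that reduces user errors when connection information changes.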
Connection types and support
After you configure connections, you can use them in Data Collector stages and in related flow error handling properties. The engine version that is used with the flow determines the connections that are available.
For example, suppose that you create a Google Cloud Storage connection asset. You can then use it in any Google Cloud Storage stage when the flow uses Data Collector engine version 7.2 or later. You can also use the connection when you configure flow error handling and staging information for the Google BigQuery target.
| Connection type | Supported Data Collector versions | Supported locations |
|---|---|---|
| Amazon S3 | 7.2 and later | |
| Apache Kafka | 7.2 and later | |
| Databricks Delta Lake | 7.2 and later | |
| FTP | 7.3 and later | |
| Google BigQuery | 7.2 and later | |
| Google Cloud Pub/Sub | 7.2 and later | |
| Google Cloud Storage | 7.2 and later | |
| IBM Db2 | 7.2 and later | |
| JDBC | 7.2 and later | |
| JMS | 7.2 and later | |
| Microsoft Azure Blob Storage | 7.2 and later | |
| Microsoft Azure Data Lake Storage | 7.2 and later | |
| Microsoft SQL Server | 7.2 and later | |
| MongoDB | 7.2 and later | |
| MySQL | 7.2 and later | |
| Oracle | 7.2 and later | |
| PostgreSQL | 7.2 and later | |
| Pulsar | 7.4 and later | |
| Rabbit MQ | 7.4 and later | |
| Snowflake | 7.2 and later | |
| Snowpipe | 7.5 and later | |
| Splunk | 7.2 and later | |
| watsonx.data Presto | 7.2 and later | |
| Web Client | 7.2 and later | |
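The version rule in the table can be expressed as a simple lookup. The following is a minimal sketch that copies the minimum versions from the table. The names `MIN_ENGINE_VERSION`, `parse_version`, and `is_connection_available` are hypothetical helpers for illustration, not part of any StreamSets API, and the sketch assumes that engine versions are dot-separated integers.

```python
# Minimum Data Collector engine version per connection type,
# taken from the table above.
MIN_ENGINE_VERSION = {
    "Amazon S3": (7, 2),
    "Apache Kafka": (7, 2),
    "Databricks Delta Lake": (7, 2),
    "FTP": (7, 3),
    "Google BigQuery": (7, 2),
    "Google Cloud Pub/Sub": (7, 2),
    "Google Cloud Storage": (7, 2),
    "IBM Db2": (7, 2),
    "JDBC": (7, 2),
    "JMS": (7, 2),
    "Microsoft Azure Blob Storage": (7, 2),
    "Microsoft Azure Data Lake Storage": (7, 2),
    "Microsoft SQL Server": (7, 2),
    "MongoDB": (7, 2),
    "MySQL": (7, 2),
    "Oracle": (7, 2),
    "PostgreSQL": (7, 2),
    "Pulsar": (7, 4),
    "Rabbit MQ": (7, 4),
    "Snowflake": (7, 2),
    "Snowpipe": (7, 5),
    "Splunk": (7, 2),
    "watsonx.data Presto": (7, 2),
    "Web Client": (7, 2),
}


def parse_version(version: str) -> tuple:
    """Turn a string such as '7.2.1' into a comparable tuple (7, 2, 1)."""
    return tuple(int(part) for part in version.split("."))


def is_connection_available(connection_type: str, engine_version: str) -> bool:
    """Return True if the engine version meets the minimum for the type."""
    minimum = MIN_ENGINE_VERSION.get(connection_type)
    if minimum is None:
        # Connection types that are not in the table are treated as unavailable.
        return False
    return parse_version(engine_version) >= minimum
```

For example, `is_connection_available("Snowpipe", "7.4")` is false because Snowpipe connections require version 7.5 or later, while `is_connection_available("Google Cloud Storage", "7.2")` is true.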
Configuring a connection
To use a connection in a StreamSets flow, you must add a connection asset to your project.
When you create a connection asset, you can configure the connection details directly in the asset. Or, when available, you can select an existing platform connection and use the details that it specifies.
For example, say you want to use a reusable connection in IBM Db2 sources rather than define connectivity details in each stage. To do this, you need an IBM Db2 connection asset in your project. When you create the asset, you can either define all required asset properties or select an IBM Db2 platform connection to use.
For information about creating a connection asset, see Adding connections to data sources in a project.
For information about configuring connection assets for specific connection types, see Connectors.
For information about creating platform connections, see Adding platform connections.