Install additional stage libraries
Install additional stage libraries to use stages that are not included in the core or common installation of Data Collector. This is an optional step, but core installations typically require installing additional stage libraries.
For a complete list of the stages installed with each stage library, see Available stage libraries.
You can install additional RPM stage libraries using the Data Collector command line program.
You can install additional tarball stage libraries using the Package Manager within Data Collector, using the stage library panel in the pipeline canvas, or using the Data Collector command line program.
An installation with Cloudera Manager is a full installation that includes all available stage libraries. As a result, you cannot install or uninstall additional stage libraries in a Cloudera Manager installation.
Installing for RPM
Use the following commands to install additional stage libraries for a core RPM installation:
- To install one or more stage libraries, use the following command to install the stage libraries downloaded to the current directory:
- To list the stage libraries installed on the current Data Collector, use the following command:
  yum list installed | grep streamsets
- To uninstall libraries when necessary, use the following command:
  yum remove <libraryID> <libraryID> ...
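The list and uninstall commands above can be wrapped in small shell functions. The following is a minimal sketch, not part of Data Collector. The function names and the DRY_RUN variable are hypothetical. It also assumes that `yum list installed` prints each package on a single line as `<name>.<arch> <version> <repo>`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the yum commands for RPM stage libraries.
# Assumption: `yum list installed` prints each package on one line as
# "<name>.<arch>  <version>  <repo>".

# Read `yum list installed` output on stdin and print the names of the
# installed StreamSets packages with the ".<arch>" suffix removed.
# Matching only the first field skips header and wrapped continuation lines.
installed_stage_libs() {
  awk '$1 ~ /^streamsets/ { sub(/\.[^.]*$/, "", $1); print $1 }'
}

# Uninstall the given library IDs with `yum remove`.
# DRY_RUN=1 (a hypothetical convenience variable) only prints the command.
remove_stage_libs() {
  if [ "$#" -eq 0 ]; then
    echo "usage: remove_stage_libs <libraryID> [<libraryID> ...]" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "yum remove $*"
  else
    yum remove "$@"
  fi
}
```

For example, `yum list installed | installed_stage_libs` prints one library ID per line. `DRY_RUN=1 remove_stage_libs streamsets-datacollector-jdbc-lib` shows the uninstall command without running it.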
Installing for Tarball Using Package Manager
You can use Package Manager within Data Collector to install additional stage libraries for a core or common tarball installation.
Complete one of the following steps to display Package Manager:
- Click the Package Manager icon.
- Click Add/Remove Stages in the Stage Library panel when viewing a pipeline in the pipeline canvas.
Package Manager lists all available stage libraries and the stages within each stage library. Origins display in blue, processors in orange, destinations in light green, and executors in dark green. Installed stage libraries display a check mark in the Installed column. You can filter the stage libraries by type or you can search for a stage library in the list.
To install an additional stage library, click the More icon for the library, and then click Install. Or, to install multiple stage libraries, select the libraries in the list and then click the Install icon. Confirm that you want to install the libraries, and then restart Data Collector for the changes to take effect.
For information about the stages installed with each stage library, see Available stage libraries.
Installing for Tarball Using the Stage Library Panel
You can use the stage library panel in the pipeline canvas to install additional stage libraries for a core or common tarball installation.
By default, the stage library panel in the pipeline canvas displays all Data Collector stages, instead of only the installed stages. Stages that are not installed appear disabled, or greyed out. For example, the stage library panel shown below indicates that the Azure origins are not installed:

To install an additional stage library, click a disabled stage. Confirm that you want to install the library, and then restart Data Collector for the changes to take effect.
For information about the stages installed with each stage library, see Available stage libraries.
Installing for Tarball Using the Command Line
You can use the stagelibs command to install additional stage libraries for a core or common tarball installation.
The stagelibs command requires the curl (version 7.18.1 or later) and sha1sum utilities on the machine. Verify that these utilities are installed before running the command.
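You can check these prerequisites from a script before running stagelibs. The following is a minimal bash sketch. The function names are hypothetical, and it assumes that the first line of `curl --version` has the form `curl <version> ...`. Versions are compared numerically field by field, so 7.9.0 is correctly treated as older than 7.18.1.

```shell
#!/usr/bin/env bash
# Hypothetical prerequisite check for the stagelibs command.
# Assumes the first line of `curl --version` looks like "curl 7.68.0 (...)".

# Return 0 if dotted version $1 >= dotted version $2, comparing the first
# three numeric fields (so 7.9.0 is older than 7.18.1).
version_ge() {
  local -a a b
  local i x y
  IFS=. read -r -a a <<< "$1"
  IFS=. read -r -a b <<< "$2"
  for i in 0 1 2; do
    x=${a[i]:-0}; x=${x%%[!0-9]*}; x=${x:-0}   # drop suffixes such as "-DEV"
    y=${b[i]:-0}; y=${y%%[!0-9]*}; y=${y:-0}
    if (( 10#$x > 10#$y )); then return 0; fi
    if (( 10#$x < 10#$y )); then return 1; fi
  done
  return 0
}

# Print a message for each missing prerequisite; return 1 if any is missing.
check_stagelibs_prereqs() {
  local status=0 curl_version
  if ! command -v curl >/dev/null 2>&1; then
    echo "curl is not installed" >&2
    status=1
  else
    curl_version=$(curl --version | awk 'NR == 1 { print $2 }')
    if ! version_ge "$curl_version" 7.18.1; then
      echo "curl $curl_version is older than 7.18.1" >&2
      status=1
    fi
  fi
  if ! command -v sha1sum >/dev/null 2>&1; then
    echo "sha1sum is not installed" >&2
    status=1
  fi
  return "$status"
}
```

For example, `check_stagelibs_prereqs && bin/streamsets stagelibs -list` runs the list command only when both utilities are present.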
- To view the list of available libraries, run the following command from the $SDC_DIST directory:
  bin/streamsets stagelibs -list
- To install one or more stage libraries, run the following command from the $SDC_DIST directory:
  bin/streamsets stagelibs -install=<libraryID>,<libraryID>,...
- To generate the command required to perform the current installation (optional), you can use the stagelibs command to generate the command that installs the libraries installed on the current Data Collector. This allows you to easily replicate the installation elsewhere.
- To uninstall libraries when necessary, run the following command from the $SDC_DIST directory:
  bin/streamsets stagelibs -uninstall=<libraryID>,<libraryID>,...
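The -install and -uninstall options take a comma-separated list of library IDs. A small wrapper can build that list from separate arguments. The following is a minimal sketch, assuming SDC_DIST points at the tarball installation directory. The function names and the DRY_RUN variable are hypothetical; only the `bin/streamsets stagelibs` options shown above come from Data Collector.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around bin/streamsets stagelibs that accepts library
# IDs as separate arguments. Assumes SDC_DIST points at the tarball directory.
# DRY_RUN=1 (a hypothetical convenience variable) prints the command instead.

# Join the arguments with commas: "a b c" -> "a,b,c".
join_ids() {
  local IFS=,
  printf '%s\n' "$*"
}

# Usage: sdc_stagelibs list | install <id>... | uninstall <id>...
sdc_stagelibs() {
  local action=${1:-}
  [ "$#" -gt 0 ] && shift
  case "$action" in
    list)
      set -- -list ;;
    install|uninstall)
      if [ "$#" -eq 0 ]; then
        echo "usage: sdc_stagelibs $action <libraryID> [<libraryID> ...]" >&2
        return 1
      fi
      set -- "-$action=$(join_ids "$@")" ;;
    *)
      echo "unknown action: $action" >&2
      return 1 ;;
  esac
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "bin/streamsets stagelibs $*"
  else
    # Run from $SDC_DIST, where the documented commands are run.
    (cd "${SDC_DIST:?SDC_DIST is not set}" && bin/streamsets stagelibs "$@")
  fi
}
```

For example, `sdc_stagelibs install streamsets-datacollector-jdbc-lib streamsets-datacollector-aws-lib` runs `bin/streamsets stagelibs -install=streamsets-datacollector-jdbc-lib,streamsets-datacollector-aws-lib` from $SDC_DIST. Restart Data Collector afterward, as with the other installation methods.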
Available stage libraries
A full Data Collector installation includes all of the following stage libraries. A core installation includes only some of the following stage libraries and typically requires you to install additional stage libraries. A common installation includes commonly used stage libraries.
You can install additional stage libraries into either a core or common installation.
| Stage Library Name | Included Stages |
|---|---|
| streamsets-datacollector-apache-kafka-lib (6.1 and later) | For Kafka. Includes: |
| streamsets-datacollector-apache-pulsar-lib | For Apache Pulsar versions 2.x. Includes: |
| streamsets-datacollector-aws-lib | For Amazon Web Services 1.11.x. Includes: |
| streamsets-datacollector-aws-secrets-manager-credentialstore-lib | For the AWS Secrets Manager credential store. |
| streamsets-datacollector-azure-keyvault-credentialstore-lib | For the Microsoft Azure Key Vault credential store. |
| streamsets-datacollector-azure-lib | For Microsoft Azure. Includes: |
| streamsets-datacollector-basic-lib | Includes the following origins:<br>Includes the following processors:<br>Includes the following destinations:<br>Includes the following executors: |
| streamsets-datacollector-bigtable-lib | For Google Cloud Bigtable. Includes the Google Bigtable destination. |
| streamsets-datacollector-cassandra_3-lib | For Cassandra 1.2, 2.x, and 3.x. Includes the Cassandra destination. |
| streamsets-datacollector-cdp_7_3_1-lib | For Cloudera CDP 7.3.1. Includes:<br>Important: This stage library uses Cloudera Data Platform libraries that have known security vulnerabilities. Consult your security team before installing this library. |
| streamsets-datacollector-couchbase_3-lib | For Couchbase SDK 3.x. Includes: |
| streamsets-datacollector-crypto-lib | For cryptography stages. Includes the Encrypt and Decrypt Fields processor. |
| streamsets-datacollector-cyberark-credentialstore-lib | For the CyberArk credential store. |
| streamsets-datacollector-dataformats-lib | Contains parsers and generators for the data formats supported by Data Collector. |
| streamsets-datacollector-dev-lib | For developing and testing pipelines. Includes:<br>Note: Do not use these stages in production pipelines. |
| streamsets-datacollector-elasticsearch_9-lib | For Elasticsearch 9.x. Includes the Elasticsearch origin and destination. |
| streamsets-datacollector-file-transfer-lib | For SFTP/FTP/FTPS. Includes the SFTP/FTP/FTPS Client origin, destination, and executor. |
| streamsets-datacollector-google-cloud-lib | For Google Cloud. Includes: |
| streamsets-datacollector-google-secret-manager-credentialstore-lib | For the Google Secret Manager credential store. |
| streamsets-datacollector-groovy_2_4-lib | For Groovy version 2.4. Includes: |
| streamsets-datacollector-groovy_4_0-lib | For Groovy version 4.0. Includes: |
| streamsets-datacollector-hpe_edf_7_2-eep_9_2-lib | For HPE Ezmeral Data Fabric 7.2.x with EEP 9.2. Includes:<br>Important: This stage library uses HPE Ezmeral Data Fabric libraries that have known security vulnerabilities. Consult your security team before installing this library. |
| streamsets-datacollector-http-lib | For HTTP. Includes: |
| streamsets-datacollector-ibm-connectivity-service-lib | For IBM Connectivity Service. |
| streamsets-datacollector-influxdb_2_0-lib | For InfluxDB version 2.x. Includes the InfluxDB 2.x destination. |
| streamsets-datacollector-jdbc-branded-oracle-lib | For Oracle. Includes: |
| streamsets-datacollector-jdbc-lib | For JDBC access to databases. Includes: |
| streamsets-datacollector-jdbc-oracle-lib | For Oracle. Includes: |
| streamsets-datacollector-jdbc-sap-hana-lib | For JDBC access to SAP HANA databases. Includes the SAP HANA Query Consumer origin. |
| streamsets-datacollector-jks-credentialstore-lib | For the Java keystore credential store. |
| streamsets-datacollector-jms-lib | For Java Message Service (JMS). Includes the JMS Consumer origin and JMS Producer destination. |
| streamsets-datacollector-jython_2_7-lib | For Jython version 2.7.x. Includes: |
| streamsets-datacollector-kaitai-lib | For Kaitai Struct. Includes the Kaitai Struct Parser processor. |
| streamsets-datacollector-kinesis-lib | For Amazon Kinesis. Includes: |
| streamsets-datacollector-mleap-lib | For MLeap. Includes the MLeap Evaluator processor. |
| streamsets-datacollector-mongodb-atlas-lib | For MongoDB Atlas and MongoDB Enterprise Server. Includes: |
| streamsets-datacollector-mysql-binlog-lib | For MySQL binary logs. Includes the MySQL Binary Log origin. |
| streamsets-datacollector-orchestrator-lib | For the orchestration stages. Includes: |
| streamsets-datacollector-postgres-aurora-lib | For Amazon Aurora PostgreSQL versions 1 through 4. Includes the Aurora PostgreSQL CDC Client origin. |
| streamsets-datacollector-rabbitmq-lib | For RabbitMQ version 3.5.6. Includes the RabbitMQ Consumer origin and RabbitMQ Producer destination. |
| streamsets-datacollector-redis-lib | For Redis versions 2.8 and 3.0. Includes: |
| streamsets-datacollector-salesforce-lib | For Salesforce. Includes: |
| streamsets-datacollector-sdc-databricks-lib | For Databricks. Includes: |
| streamsets-datacollector-sdc-snowflake-lib | For Snowflake. Includes: |
| streamsets-datacollector-singlestore-lib | For SingleStore. Includes the SingleStore destination. |
| streamsets-datacollector-tensorflow-lib | For TensorFlow. Includes the TensorFlow Evaluator processor. |
| streamsets-datacollector-teradata-lib | For Teradata. Includes the Teradata destination. |
| streamsets-datacollector-thycotic-credentialstore-lib | For the Thycotic Secret Server credential store. |
| streamsets-datacollector-vault-credentialstore-lib | For the HashiCorp Vault credential store. |
| streamsets-datacollector-webclient-impl-okhttp | For OkHttp. Includes: |
| streamsets-datacollector-wholefile-transformer-lib | Includes the Whole File Transformer processor. |
| streamsets-datacollector-windows-lib | For Windows. Includes the Windows Event Log origin. |
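Because a core installation includes only some of these libraries, it can help to check which ones a pipeline still needs before you install anything. The following is a minimal bash sketch. The function name is hypothetical. It assumes that each installed stage library in a tarball installation appears as a directory named by its library ID under $SDC_DIST/streamsets-libs, so verify that layout for your version first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the requested stage library IDs that are not
# installed in a tarball installation. Assumption: each installed stage
# library is a directory named after its library ID under
# $SDC_DIST/streamsets-libs; verify this layout for your version.

missing_stage_libs() {
  local libs_dir=$1 id
  shift
  for id in "$@"; do
    # A missing directory means the library still needs to be installed.
    if [ ! -d "$libs_dir/$id" ]; then
      printf '%s\n' "$id"
    fi
  done
}
```

For example, `missing_stage_libs "$SDC_DIST/streamsets-libs" streamsets-datacollector-jdbc-lib streamsets-datacollector-aws-lib` prints only the IDs without a matching directory. You can then join those IDs with commas and pass them to `bin/streamsets stagelibs -install=`.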