Setting up external resources
StreamSets Data Collector engines can require access to external files and libraries, depending on how you design flows.
For example, JDBC stages require a JDBC driver to access the database. When you use a JDBC stage, you must make the driver available as an external resource.
To set up external resources, you generate an archive file in the TGZ or ZIP format that includes the external files and libraries. The archive file must use the required folder names and directory structure. You import the file as an asset in your project. Then, you select the imported file when you configure the StreamSets environment for the engine.
After you configure the external resource archive and restart the engine, the archive file contents are extracted and copied into the engine container.
When your flows require additional external resources, you download the archive file from your project, add the additional resources, and then upload the file again.
External resource types
Data Collector engines can require access to the following types of external resources:
| External Resource Type | Description |
|---|---|
| Runtime resource files | Files that define flow property values that are called from within a flow. For more information, see Runtime resources. |
| External libraries | External libraries required by flow stages. External libraries can include JDBC or JMS drivers or external Java libraries. For more information, see Install external libraries. |
| Custom stage libraries | Stage libraries for custom stages. For example, you might develop a custom processor to perform custom processing in a flow. For more information, see Custom stage libraries. |
Archive structure
An external resource archive file must use the required folder names and directory structure.
- resources
  - The resources directory must include text files created for runtime resources.
- streamsets-libs-extras
  - The streamsets-libs-extras directory must include a subdirectory for each set of required external libraries, based on the stage library name, as follows: `<stage library name>/lib/`
- user-libs
  - The user-libs directory must include a subdirectory for each custom stage.
If your flows do not use one of the external resource types, you can omit that directory. For example, if you have not developed custom stage libraries, you do not need to include the user-libs directory.
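The layout rules above can be expressed as a small Python check. For example, you could use it to verify an archive's entry paths before you import the archive as an asset. This is a minimal sketch, not an IBM or StreamSets tool. The helper names `library_entry` and `check_layout` are hypothetical. The sketch also assumes that all entries sit under a top-level externalResources folder, as in the sample archive in this topic.

```python
from pathlib import PurePosixPath

# Assumption: the archive root folder is named "externalResources", as in the sample.
ROOT = "externalResources"
# Directories allowed directly under the root; any of them may be omitted.
ALLOWED_DIRS = {"resources", "streamsets-libs-extras", "user-libs"}


def library_entry(stage_library: str, jar_name: str) -> str:
    """Return the archive path for an external library used by a stage library."""
    return str(PurePosixPath(ROOT, "streamsets-libs-extras", stage_library, "lib", jar_name))


def check_layout(entries: list[str]) -> list[str]:
    """Return a list of problems found in archive entry paths (empty if none)."""
    problems = []
    for entry in entries:
        parts = PurePosixPath(entry).parts
        if not parts or parts[0] != ROOT:
            problems.append(f"{entry}: not under {ROOT}/")
            continue
        if len(parts) == 1:
            continue  # the root folder itself
        if parts[1] not in ALLOWED_DIRS:
            problems.append(f"{entry}: unexpected directory {parts[1]!r}")
        elif parts[1] == "streamsets-libs-extras" and len(parts) >= 4 and parts[3] != "lib":
            # Libraries must be nested as <stage library name>/lib/<file>
            problems.append(f"{entry}: libraries must be in <stage library name>/lib/")
    return problems


entries = [
    "externalResources/resources/JDBC.txt",
    library_entry("streamsets-datacollector-jdbc-lib", "mysql-connector-java-8.0.12.jar"),
]
print(check_layout(entries))  # prints []
```

Because unused directories can be omitted, the check does not require any of the three directories to be present. It reports only entries that are placed outside them.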
Sample
This sample external resource archive file includes a runtime resource file named JDBC.txt, the MySQL JDBC driver for stages included in the JDBC stage library, and the Oracle JDBC driver for the Oracle Bulkload source included in the JDBC Oracle stage library. It does not include any custom stage libraries:
- externalResources
  - resources
    - JDBC.txt
  - streamsets-libs-extras
    - streamsets-datacollector-jdbc-lib
      - lib
        - mysql-connector-java-8.0.12.jar
    - streamsets-datacollector-jdbc-oracle-lib
      - lib
        - ojdbc8-19.3.0.0.jar
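One way to generate the sample archive is with Python's standard tarfile module, which writes the TGZ format. This is a sketch under stated assumptions, not a required method. `build_archive` is a hypothetical helper, and the JDBC.txt content is a placeholder. The JAR contents are stand-ins for the real driver files, which you would read from disk.

```python
import io
import tarfile


def build_archive(files: dict[str, bytes], fileobj) -> None:
    """Write files (archive path -> bytes) into fileobj as a gzip-compressed tar (TGZ)."""
    with tarfile.open(fileobj=fileobj, mode="w:gz") as tar:
        for path, data in sorted(files.items()):
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


JDBC_LIB = "externalResources/streamsets-libs-extras/streamsets-datacollector-jdbc-lib/lib/"
ORACLE_LIB = "externalResources/streamsets-libs-extras/streamsets-datacollector-jdbc-oracle-lib/lib/"

# Placeholder contents: in practice, read the real driver files, for example
# with pathlib.Path("ojdbc8-19.3.0.0.jar").read_bytes().
sample = {
    "externalResources/resources/JDBC.txt": b"<value used by the flow>\n",  # hypothetical value
    JDBC_LIB + "mysql-connector-java-8.0.12.jar": b"<jar bytes>",
    ORACLE_LIB + "ojdbc8-19.3.0.0.jar": b"<jar bytes>",
}

buffer = io.BytesIO()  # use open("externalResources.tgz", "wb") to write a file instead
build_archive(sample, buffer)

# List the entries to confirm the directory structure.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:gz") as tar:
    for name in tar.getnames():
        print(name)
```

For a ZIP archive, the same pattern works with `zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED)` and one `writestr(path, data)` call per entry.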
Setting up an archive
Set up an external resource archive after you have finalized the list of external resources that your flows require.
Procedure
Updating an archive
When an engine uses an external resource archive and your flows require additional resources, you download the archive file from your project, add the additional resources, and then upload the file again.
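You might script the update instead of editing the archive by hand. Python's tarfile module cannot append to a compressed archive, so the sketch below copies every entry of the downloaded TGZ file into a new archive and then adds the new resources. `update_archive` is a hypothetical helper. In this sketch, an existing entry whose path matches a new file is replaced rather than duplicated.

```python
import io
import tarfile


def update_archive(src, dest, new_files: dict[str, bytes]) -> None:
    """Copy the TGZ archive in src to dest, adding or replacing the entries in new_files."""
    with tarfile.open(fileobj=src, mode="r:gz") as old:
        with tarfile.open(fileobj=dest, mode="w:gz") as new:
            for member in old.getmembers():
                if member.name in new_files:
                    continue  # skip the old copy; the new content is written below
                # Only regular files carry data; directories and links are copied as headers.
                data = old.extractfile(member) if member.isfile() else None
                new.addfile(member, data)
            for path, content in sorted(new_files.items()):
                info = tarfile.TarInfo(name=path)
                info.size = len(content)
                new.addfile(info, io.BytesIO(content))
```

To use it, open the downloaded archive in "rb" mode as `src` and a new file in "wb" mode as `dest`. Pass a dictionary that maps each archive path, such as `externalResources/streamsets-libs-extras/streamsets-datacollector-jdbc-lib/lib/<driver>.jar`, to the bytes of the new file. Then upload the new archive to your project.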