Setting up external resources

StreamSets Data Collector engines can require access to external files and libraries, depending on how you design flows.

For example, JDBC stages require a JDBC driver to access the database. When you use a JDBC stage, you must make the driver available as an external resource.

To set up external resources, you generate an archive file in the TGZ or ZIP format that includes the external files and libraries. The archive file must use the required folder names and directory structure. You import the file as an asset in your project. Then, you select the imported file when you configure the StreamSets environment for the engine.

After you configure the external resource archive and restart the engine, the archive file contents are extracted and copied into the engine container.

When your flows require additional external resources, you download the archive file from your project, add the additional resources, and then upload the file again.

External resource types

Data Collector engines can require access to the following types of external resources:

Runtime resource files
  Files that define flow property values that are called from within a flow. For more information, see Runtime resources.
External libraries
  External libraries required by flow stages. External libraries can include JDBC or JMS drivers or external Java libraries. For more information, see Install external libraries.
Custom stage libraries
  Stage libraries for custom stages. For example, you might develop a custom processor to perform custom processing in a flow. For more information, see Custom stage libraries.

Archive structure

An external resource archive file must use the required folder names and directory structure.

The root folder must be named externalResources and include the following directories:
resources
  The resources directory must include text files created for runtime resources.
streamsets-libs-extras
  The streamsets-libs-extras directory must include a subdirectory for each set of required external libraries based on the stage library name, as follows: <stage library name>/lib/
  For example, external libraries used by stages included in the JMS stage library must be included in the following subdirectory: streamsets-datacollector-jms-lib/lib/
  For a list of stage library names and the stages included in each library, see Common stage libraries.
user-libs
  The user-libs directory must include a subdirectory for each custom stage.

If your flows do not use one of the external resource types, you can omit that directory. For example, if you have not developed custom stage libraries, you do not need to include the user-libs directory.
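
The layout rules above can be checked before you build an archive. The following is a minimal sketch, not part of the product: the function name check_layout is hypothetical, and it assumes you pass it the archive-relative paths of files only (not directory entries), written with forward slashes. It checks only the rules stated in this section.

```python
from pathlib import PurePosixPath

ROOT = "externalResources"
ALLOWED_DIRS = {"resources", "streamsets-libs-extras", "user-libs"}


def check_layout(file_paths):
    """Return a list of problems found in archive-relative file paths.

    Hypothetical helper: it covers only the folder rules described in
    this section and does not inspect file contents.
    """
    problems = []
    for raw in file_paths:
        parts = PurePosixPath(raw).parts
        # Every file must sit under the externalResources root folder.
        if not parts or parts[0] != ROOT:
            problems.append(f"{raw}: root folder must be {ROOT}")
            continue
        # The second level must be one of the three known directories.
        if len(parts) < 3 or parts[1] not in ALLOWED_DIRS:
            problems.append(f"{raw}: expected a file under one of {sorted(ALLOWED_DIRS)}")
            continue
        # External libraries must be in <stage library name>/lib/.
        if parts[1] == "streamsets-libs-extras" and (len(parts) < 5 or parts[3] != "lib"):
            problems.append(f"{raw}: expected <stage library name>/lib/<file>")
    return problems
```

An empty list means that no problems were found. A path such as externalResources/streamsets-libs-extras/driver.jar is reported because it is not inside a <stage library name>/lib/ subdirectory.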

Sample

This sample external resource archive file includes a runtime resource file named JDBC.txt, the MySQL JDBC driver for stages included in the JDBC stage library, and the Oracle JDBC driver for the Oracle Bulkload source included in the JDBC Oracle stage library. It does not include any custom stage libraries:

externalResources
  resources
    JDBC.txt
  streamsets-libs-extras
    streamsets-datacollector-jdbc-lib
      lib
        mysql-connector-java-8.0.12.jar
    streamsets-datacollector-jdbc-oracle-lib
      lib
        ojdbc8-19.3.0.0.jar
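
As an illustration, the sample layout can be reproduced and packed with the Python standard library. This is a sketch under stated assumptions: the function name build_sample_archive and the archive file name externalResources.tgz are hypothetical, and the files it creates are empty placeholders standing in for the real runtime resource file and drivers.

```python
import tarfile
import tempfile
from pathlib import Path

# File names are taken from the sample above. The files written here are
# empty placeholders, not real drivers or resource files.
SAMPLE_FILES = [
    "resources/JDBC.txt",
    "streamsets-libs-extras/streamsets-datacollector-jdbc-lib/lib/mysql-connector-java-8.0.12.jar",
    "streamsets-libs-extras/streamsets-datacollector-jdbc-oracle-lib/lib/ojdbc8-19.3.0.0.jar",
]


def build_sample_archive(work_dir, archive_name="externalResources.tgz"):
    """Create the sample layout under work_dir and pack it as a TGZ file.

    The archive name is an assumption; the source does not prescribe one.
    """
    root = Path(work_dir) / "externalResources"
    for rel in SAMPLE_FILES:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    archive_path = Path(work_dir) / archive_name
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps externalResources as the top-level folder in the archive.
        tar.add(root, arcname="externalResources")
    return archive_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive = build_sample_archive(tmp)
        with tarfile.open(archive) as tar:
            for name in sorted(tar.getnames()):
                print(name)
```

Adding the folder with an explicit arcname keeps externalResources as the root folder inside the archive, regardless of where the working directory is located.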

Setting up an archive

Set up an external resource archive after you have finalized the list of external resources that your flows require.

Procedure

  1. Generate an archive file in the TGZ or ZIP format that includes all external resources required by your flows.

    Ensure that the file uses the required folder names and directory structure.

  2. Import the archive file as an asset in your project.
    1. On the Assets tab of your project, click Import assets.
    2. Click Local file, and then click Data asset.
    3. Add the archive file and click Done.
  3. Configure the StreamSets environment to use the imported archive file.
    1. On the Manage tab of your project, click the Options icon for the environment, and then click Edit environment.
    2. For the External resources property, select the imported archive file.
    3. Click Save.
  4. If the engine is running, restart the engine container for the changes to take effect.

Updating an archive

When an engine uses an external resource archive and your flows require additional resources, you download the archive file from your project, add the additional resources, and then upload the file again.

Procedure

  1. On the Assets tab of your project, locate the archive file asset and select Download from the overflow menu.
  2. Extract the downloaded file.
  3. Add the additional external resources to the required subfolder for the resource type.
  4. Compress the externalResources folder, including the added resources, into an archive file in the TGZ or ZIP format.
  5. Import the updated archive file to your project.
    1. On the Assets tab of your project, click Import assets.
    2. Click Local file, and then click Data asset.
    3. Add the updated archive file.
    4. Choose to overwrite the existing file, and click Submit.
    5. Click Done.
  6. If an engine is running, restart the engine container for the changes to take effect.
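
The local steps of this procedure (extract, add, and compress) can be scripted. The following is a minimal sketch that assumes the downloaded archive is in the ZIP format and that you are adding one external library. The function name add_external_library and its parameters are hypothetical. Downloading the asset, importing the updated file, and restarting the engine are still done as described in the procedure.

```python
import shutil
import zipfile
from pathlib import Path


def add_external_library(archive_path, stage_library, jar_path, work_dir):
    """Extract a ZIP archive, add a JAR under <stage library name>/lib/, and repack it.

    Hypothetical helper: it assumes the archive was built with the
    externalResources root folder and comes from a trusted source.
    """
    work_dir = Path(work_dir)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(work_dir)

    # External libraries go in streamsets-libs-extras/<stage library name>/lib/.
    root = work_dir / "externalResources"
    lib_dir = root / "streamsets-libs-extras" / stage_library / "lib"
    lib_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(jar_path, lib_dir)

    # Rewrite the archive under the same file name so that the import
    # can overwrite the existing asset.
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(work_dir).as_posix())
    return Path(archive_path)
```

For example, calling add_external_library with the stage library name streamsets-datacollector-jms-lib places the JAR file in externalResources/streamsets-libs-extras/streamsets-datacollector-jms-lib/lib/ inside the rewritten archive. Only files are written back, so empty directories in the original archive are not preserved.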