User-Provided Stage Library Mode
For advanced use cases, you can configure a Data Collector or Transformer deployment to use the user-provided stage library mode, in which you provide the stage library files during the engine installation. For example, you might use the user-provided stage library mode if your organization requires that the stage library files be scanned for security purposes before they are installed on the engine machines.
Before you can use the user-provided stage library mode, you must complete the prerequisite tasks.
Prerequisites
- Display the Stage Library Mode Property
- Download the stage libraries, based on the offering that you use:
  - IBM StreamSets as a Service
  - IBM StreamSets as client-managed software
Display the Stage Library Mode Property
By default, deployments use the managed stage library mode.
Before you can use the user-provided stage library mode, a user with the Organization Administrator role must modify the organization configuration properties to display the Stage Library Mode property for deployments.
Download the Stage Libraries for IBM StreamSets as a Service
Applies to: IBM StreamSets as a Service
Download the stage library files that you want to install on engines.
Download the Stage Libraries for IBM StreamSets as Client-Managed Software
Applies to: IBM StreamSets as client-managed software
Download the stage library files that you want to install on engines.
Provide Files for Self-Managed Deployments
To provide stage library files for a self-managed deployment, in the Configure Engine step of the deployment wizard, select User-Provided for the Stage Library Mode property.
When you launch engines for the deployment, the streamsets-libs directory in the engine installation contains a few default stage libraries. Copy the downloaded stage library files into the directory, and then restart the engine.
For example, for a Data Collector 5.10.0 tarball, copy the downloaded stage library files into the /streamsets-datacollector-5.10.0/streamsets-libs directory and then restart the engine.
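The copy step for a tarball installation can be sketched as a small shell helper. This is a minimal sketch, not an official procedure: the function name install_stage_libs is hypothetical, and it assumes that the downloaded stage libraries have already been extracted so that each library is its own directory inside one download folder.

```shell
#!/bin/bash
# Hypothetical helper: copy every stage library directory found in a
# download folder into the engine's streamsets-libs directory.
install_stage_libs() {
  local src_dir="$1"   # folder that holds the extracted stage library directories
  local libs_dir="$2"  # for example /streamsets-datacollector-5.10.0/streamsets-libs

  # Refuse to run if the target is not an existing directory.
  if [ ! -d "$libs_dir" ]; then
    echo "streamsets-libs directory not found: $libs_dir" >&2
    return 1
  fi

  local lib
  for lib in "$src_dir"/*/; do
    [ -d "$lib" ] || continue   # skip when the folder has no subdirectories
    lib="${lib%/}"              # drop the trailing slash so cp copies the directory itself
    cp -R "$lib" "$libs_dir"/
    echo "installed $(basename "$lib")"
  done
}

# Usage (paths are examples only), followed by an engine restart:
#   install_stage_libs ~/downloads/stage-libs /streamsets-datacollector-5.10.0/streamsets-libs
```

The helper only copies files. Restarting the engine afterward is still a separate step.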
For a Docker image installation, you can provide the files to the engine by editing the running container. For example, for a Data Collector 5.10.0 Docker image, you can start a Bash shell in the running Docker container, copy the downloaded stage library files into the /opt/streamsets-datacollector-5.10.0/streamsets-libs directory, and then restart the engine.
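As a variation on the Bash shell approach, the same result might be reached from the host with the standard docker cp and docker restart commands. The sketch below is an assumption-laden illustration: the function name provide_libs_to_container and the container name are hypothetical, and restarting the whole container is used here as a stand-in for restarting the engine.

```shell
#!/bin/bash
# Sketch: copy one stage library directory into a running engine container,
# then restart the container. Set DOCKER=echo to print the commands instead
# of running them.
DOCKER="${DOCKER:-docker}"

provide_libs_to_container() {
  local container="$1"   # hypothetical container name or ID
  local lib_dir="$2"     # local directory that holds one downloaded stage library
  # Path from the Data Collector 5.10.0 example; adjust for other versions.
  local libs_path="/opt/streamsets-datacollector-5.10.0/streamsets-libs"

  # docker cp places the source directory inside the existing target directory.
  "$DOCKER" cp "$lib_dir" "$container:$libs_path/" || return 1
  "$DOCKER" restart "$container"
}

# Usage (names are examples only):
#   provide_libs_to_container my-sdc ./streamsets-datacollector-aws-lib
```

Files copied this way live in the container's writable layer, so they are lost if the container is removed and recreated.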
Alternatively, you can configure the Docker image to mount an external directory containing the downloaded stage library files, or you can create a custom Docker image derived from an IBM StreamSets engine image that includes the downloaded stage library files.
Provide Files for Cloud Service Provider Deployments
To provide stage library files for cloud service provider deployments, such as Amazon EC2, Azure VM, or GCE deployments, in the Configure Engine step of the deployment wizard, select User-Provided for the Stage Library Mode property.
Then in the Configure Autoscaling Group step of the deployment wizard, define the Init Script property to include commands that copy the downloaded stage library files into the streamsets-libs folder in the engine installation.
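For example, the following initialization script downloads a stage library archive from a web server and extracts it into the streamsets-libs folder:
#!/bin/bash
# Download the stage library archive to /tmp.
wget -q https://<web_server>.com/streamsets-datacollector-aws-lib-5.10.0.tgz -P /tmp/
# Extract into streamsets-libs, dropping the two leading path components of each archive entry.
tar -zxf /tmp/streamsets-datacollector-aws-lib-5.10.0.tgz -C /opt/streamsets-datacollector/streamsets-libs/ --strip-components=2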
When you start the deployment, the initialization script copies the files into the engine installation on each provisioned instance in your cloud account.
Provide Files for Kubernetes Deployments
To provide stage library files for a Kubernetes deployment, in the Configure Engine step of the deployment wizard, select User-Provided for the Stage Library Mode property.
Then in the Configure Kubernetes Deployment step of the deployment wizard, use advanced mode to edit the deployment YAML file directly so that the downloaded stage library files are copied into the streamsets-libs folder in the engine installation.
For example, you might create a static persistent volume in Kubernetes with the downloaded stage library directories and a persistent volume claim. For details on Kubernetes persistent volumes, see the Kubernetes documentation.
In the spec/template/spec/containers[0] section:
volumeMounts:
  - mountPath: /opt/streamsets-datacollector-<version>/streamsets-libs
    name: stagelibs
    readOnly: true
    subPath: streamsets-libs
In the spec/template/spec section:
volumes:
  - name: stagelibs
    persistentVolumeClaim:
      claimName: stage-libs-claim
      readOnly: true
When you start the deployment, the stage library files are mounted into the engine installation on each Kubernetes pod.
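The persistent volume claim named stage-libs-claim must exist before the pods can mount it. The following sketch prints one possible claim manifest that binds to a static persistent volume. The function name render_stage_libs_claim, the volume name stage-libs-pv, the ReadOnlyMany access mode, and the 1Gi size are all assumptions for illustration. Match them to the persistent volume that you actually created.

```shell
#!/bin/bash
# Sketch: print a persistent volume claim manifest for the stage libraries.
# The volume name, access mode, and size are hypothetical defaults.
render_stage_libs_claim() {
  local pv_name="${1:-stage-libs-pv}"  # hypothetical static persistent volume
  local size="${2:-1Gi}"               # hypothetical requested capacity
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: stage-libs-claim
spec:
  accessModes:
    - ReadOnlyMany
  storageClassName: ""
  volumeName: ${pv_name}
  resources:
    requests:
      storage: ${size}
EOF
}

# Usage (requires kubectl and an existing persistent volume):
#   render_stage_libs_claim stage-libs-pv 1Gi | kubectl apply -f -
```

Setting storageClassName to an empty string together with volumeName asks Kubernetes to bind the claim to that specific volume instead of provisioning a new one dynamically.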
Alternatively, you can create a custom Docker image derived from an IBM StreamSets engine image that includes the downloaded stage library files, and then use advanced mode to configure the deployment YAML file to use the custom image.