Self-Managed Deployments
You can create a self-managed deployment for an active self-managed environment.
When using a self-managed deployment, you take full control of procuring the resources needed to run engine instances. The resources can be local on-premises machines or cloud computing machines. You must set up the machines and complete the installation prerequisites required by the IBM StreamSets engine type. When the machines reside behind a firewall, you must allow the required inbound and outbound traffic to each machine, as described in Firewall Configuration.
When you create a self-managed deployment, you define the engine type, version, and configuration to deploy.
After you create and start a self-managed deployment, Control Hub displays the engine installation script that you run to install and launch engine instances on the on-premises or cloud computing machines that you have set up. You can configure the installation script to run engine instances as a foreground or background process. You also select the installation type to use - either a Docker image or a tarball file.
Using a self-managed deployment is the simplest way to get started with IBM StreamSets. After getting started, you might consider using one of the cloud service provider integrations, such as the AWS and GCP environments and deployments. With these integrations, Control Hub automatically provisions the resources needed to run the engine type in your cloud service provider account, and then deploys engine instances to those resources.
Quick Start Deployment
When you deploy an engine as you build your first Data Collector pipeline, Control Hub presents a simplified process to help you quickly deploy your first Data Collector engine.
Control Hub creates a self-managed deployment for you and names the deployment Data Collector 1 (Quick Start). Control Hub also assigns a quick-start tag and a datacollector1quickstart engine label to the deployment.
You can rename the quick start deployment, remove the default tag or engine label, or edit the deployment just as you edit any other self-managed deployment.
Configuring a Self-Managed Deployment
Configure a self-managed deployment to define the group of engine instances to deploy to a self-managed environment.
To create a new deployment, click the Create Deployment icon.
To edit an existing deployment, in the Navigation panel, click the deployment name, and then click Edit.
Define the Deployment
Define the deployment essentials, including the deployment name and type, the environment that the deployment belongs to, and the engine type and version to deploy.
Once saved, you cannot change the deployment type, the engine version, or the environment.
Configure the Engine
Define the configuration of the engine to deploy. You can use the defaults to get started.
Share the Deployment
By default, the deployment can only be seen by you. Share the deployment with other users and groups to grant them access to it.
Review and Launch the Engine
You've successfully finished creating the deployment.
Foreground or Background Process
- Foreground
- When the installation script runs an engine instance as a foreground process, you cannot run additional commands from that command prompt while the engine runs. The command prompt must remain open for the engine to continue to run. If you close the command prompt, the engine shuts down.
- Background
- When the installation script runs an engine instance as a background process, you regain access to the command prompt after the engine starts. You can run additional commands from that command prompt as the engine runs. If you close the command prompt, the engine continues to run.
By default, a tarball installation script runs an engine instance as a foreground process. A Docker installation script runs an engine instance as a background process.
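The difference between the two process types can be illustrated with generic shell job control (this is a minimal sketch, not the StreamSets installation script itself):

```shell
# A foreground command holds the prompt until it exits; appending '&'
# runs the command in the background so the prompt returns immediately.
sleep 1 &                     # background: the shell continues at once
bg_pid=$!
echo "prompt is free while PID $bg_pid runs"
wait "$bg_pid"                # block until it exits, as a foreground command would
echo "background process finished"
```

Closing the shell behaves the same way for both cases described above: a foreground engine shuts down with the prompt, while a backgrounded engine keeps running.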
Launching an Engine for a Deployment
After creating a self-managed deployment, you set up a machine that meets the engine requirements. The machine can be a local on-premises machine or a cloud computing machine. Then, you manually run the engine installation script to install and launch an engine instance on the machine.
When the machine resides behind a firewall, you also must allow the required inbound and outbound traffic to each machine, as described in Firewall Configuration.
Launching a Data Collector Docker Image
Complete the following steps on the machine where you want to launch the Data Collector Docker image.
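As a sketch of what to expect, the generated docker run command generally takes the following form, where the deployment ID and token are the values that Control Hub generates for your deployment, and the image tag matches the engine version that you selected (5.5.0 is used here as an illustration):

```shell
# Sketch of the generated command; substitute your deployment's ID and
# token. The -d flag runs the container as a background process.
docker run -d \
  -e STREAMSETS_DEPLOYMENT_SCH_URL=https://na01.hub.streamsets.com \
  -e STREAMSETS_DEPLOYMENT_ID=<deployment_ID> \
  -e STREAMSETS_DEPLOYMENT_TOKEN=<deployment_token> \
  streamsets/datacollector:5.5.0
```

Copy the exact command from the deployment rather than typing it by hand, so the deployment ID and token match your deployment.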
Launching a Data Collector Tarball
Complete the following steps on the machine where you want to install and launch the Data Collector tarball.
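As a sketch of what to expect, the generated tarball installation command generally takes the following form, where the deployment ID and token are the values that Control Hub generates for your deployment:

```shell
# Sketch of the generated command; substitute your deployment's ID and
# token. The script prompts for download and installation directories
# unless you pass --no-prompt with --download-dir and --install-dir.
bash -c "$(curl -fsSL https://na01.hub.streamsets.com/streamsets-engine-install.sh)" \
  --deployment-id=<deployment_ID> \
  --deployment-token=<deployment_token> \
  --sch-url=https://na01.hub.streamsets.com
```

Copy the exact command from the deployment rather than typing it by hand, so the deployment ID and token match your deployment.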
Launching Transformer when Spark Runs Locally
To get started with Transformer, you can use a local Spark installation that runs on the same machine as Transformer.
This allows you to easily develop and test local pipelines, which run on the local Spark installation.
Launching a Transformer Docker Image
To use a Transformer Docker image when Spark runs locally, complete the following steps on the machine where you want to launch Transformer. The Docker image includes a local Spark installation that matches the Scala version selected for the engine version.
Launching a Transformer Tarball
To use a Transformer tarball when Spark runs locally, complete the following steps on the machine where you want to install and launch Transformer.
Launching Transformer when Spark Runs on a Cluster
In a production environment, use a Spark installation that runs on a cluster to leverage the performance and scale that Spark offers.
Install Transformer on a machine that is configured to submit Spark jobs to the cluster. When you run Transformer pipelines, Spark distributes the processing across nodes in the cluster.
For information about each cluster type, see Cluster Types in the Transformer engine documentation.
Launching a Transformer Docker Image
To use a Transformer Docker image when Spark runs on a cluster, complete the following steps on the machine where you want to launch Transformer.
Launching a Transformer Tarball
To use a Transformer tarball when Spark runs on a cluster, complete the following steps on the machine where you want to launch Transformer.
Launching a Transformer for Snowflake Docker Image
Complete the following steps on the machine where you want to launch the Transformer for Snowflake Docker image.
Applicable when your organization uses a deployed Transformer for Snowflake engine.
Launching a Transformer for Snowflake Tarball
Complete the following steps on the machine where you want to install and launch the Transformer for Snowflake tarball.
Applicable when your organization uses a deployed Transformer for Snowflake engine.
Retrieving the Installation Script
You can retrieve the installation script generated for a self-managed deployment.
Running the Installation Script without Prompts
When you run the engine installation script for a tarball installation, you must respond to command prompts to enter download and installation directories. To skip the prompts, you can optionally define the directories as command arguments.
You might skip the command prompts if you set up an automation tool such as Ansible to install and launch engines. Or you might skip the prompts if you prefer to define the directories at the same time that you run the command.
To skip the prompts, include the following arguments in the installation script command:
Argument | Value
---|---
--no-prompt | None. Indicates that the script should run without prompts.
--download-dir | Enter the full path to an existing download directory.
--install-dir | Enter the full path to an existing installation directory.
For example:
bash -c "$(curl -fsSL https://na01.hub.streamsets.com/streamsets-engine-install.sh)" --deployment-id=<deployment_ID> --deployment-token=<deployment_token> --sch-url=https://na01.hub.streamsets.com --no-prompt --download-dir=/tmp/streamsets --install-dir=/opt/streamsets-datacollector
Increasing the Engine Timeout for the Installation Script
By default, the installation script waits up to five minutes for the engine to respond. When the engine does not respond in time, the script fails with messages such as the following:
Step 2 of 4: Waiting up to 5 minutes for engine to respond on http://<host name>:<port>
Step 2 of 4 failed: Timed out while waiting for engine to respond on http://<host name>:<port>
When you encounter this error, run the installation script again, using the STREAMSETS_ENGINE_TIMEOUT_MINS environment variable to increase the engine timeout value.
For example, to set an eight-minute timeout for a tarball installation, add the environment variable to the installation script as follows:
STREAMSETS_ENGINE_TIMEOUT_MINS=8 bash -c "$(curl -fsSL https://na01.hub.streamsets.com/streamsets-engine-install.sh)" --deployment-id=<deployment_ID> --deployment-token=<deployment_token> --sch-url=https://na01.hub.streamsets.com
For a Docker installation, pass the environment variable to the docker run command:
docker run -d -e STREAMSETS_ENGINE_TIMEOUT_MINS=8 -e STREAMSETS_DEPLOYMENT_SCH_URL=https://na01.hub.streamsets.com -e STREAMSETS_DEPLOYMENT_ID=<deployment_ID> -e STREAMSETS_DEPLOYMENT_TOKEN=<deployment_token> streamsets/datacollector:5.5.0