Engine communication

Your browser communicates with the Data Collector engine either through secure tunneling or a direct connection, depending on your account settings. Tunneling requires no configuration, while direct communication requires additional setup.

You build and manage StreamSets flows in IBM watsonx.data integration, then run the flows as jobs on a Data Collector engine. Because jobs run within your corporate network, you retain full ownership and control of your data.

Some browser actions, such as building flows, require communication with the engine. The browser can connect to the engine by using one of the following methods:

Tunneling (default)
By default, the browser uses tunneling to communicate with the engine.
With the tunneling method, watsonx.data integration acts as a proxy between the browser and the engine.
When you build flows, the browser initiates outbound connections to watsonx.data integration. Watsonx.data integration uses an encrypted tunnel to relay data to the engine. The data is decrypted and re-encrypted during transit but is not inspected or used by watsonx.data integration.
Tunneling is secure and requires no additional setup.
However, when you preview a flow, your source data passes through encrypted connections into watsonx.data integration and back to your browser.
Direct
When you build flows, the browser connects directly to the engine over HTTPS on the engine port number. When you preview a flow, your source data does not pass through watsonx.data integration.

The direct method requires additional setup.
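With the direct method, each browser workstation must be able to open an HTTPS connection to the engine on the engine port number. You can check this from a workstation with a small script. The following is a minimal sketch, not a product tool. It assumes that `curl` is installed on the workstation. The `check_engine_https` function and the `ENGINE_HOST` and `ENGINE_PORT` variables are hypothetical names. Set them to your engine host name and engine port number.

```shell
#!/bin/sh
# Sketch: check that this workstation can reach an engine over HTTPS.
# ENGINE_HOST and ENGINE_PORT are hypothetical placeholders, not product settings.

check_engine_https() {
  host="$1"
  port="$2"
  if ! command -v curl >/dev/null 2>&1; then
    echo "curl is required for this check" >&2
    return 2
  fi
  # --insecure accepts the default self-signed engine certificate.
  # Without --fail, any HTTP response (even 4xx/5xx) counts as reachable.
  if curl --silent --insecure --output /dev/null \
       --connect-timeout 5 "https://${host}:${port}/"; then
    echo "reachable: https://${host}:${port}/"
    return 0
  fi
  echo "unreachable: https://${host}:${port}/" >&2
  return 1
}

# Run the check only when both placeholders are set in the environment.
if [ -n "${ENGINE_HOST:-}" ] && [ -n "${ENGINE_PORT:-}" ]; then
  check_engine_https "$ENGINE_HOST" "$ENGINE_PORT"
fi
```

A `reachable` result means that the TLS handshake succeeded and the engine returned an HTTP response. Because the script passes `--insecure`, it does not verify the engine certificate, so it tests network routing and firewall rules only.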

Switching to direct communication

When you switch to the direct communication method, all Data Collector engines use direct communication with the browser. You cannot configure different communication methods for different projects or StreamSets environments.

About this task

Required permissions
You must have the Administrator role or the Manage configuration user permission.
Important: Delete all engines in your account before you change the communication method. After you make the change, retrieve the engine command from each StreamSets environment and run the command to re-create the engine containers.

Procedure

  1. Stop and remove all engine containers.
    1. List the running containers to find the engine container ID:

      <docker|podman> ps

      For example, use the following command for Docker:

      docker ps

    2. Copy the ID of the container that you want to stop and remove.
    3. Stop the engine:

      <docker|podman> stop <container_id>

    4. Remove the engine:

      <docker|podman> rm <container_id>

    5. Repeat for all engine containers.
  2. Delete the engines from all StreamSets environments in your account.
    1. On the Manage tab of your project, click the StreamSets tool.
    2. Click the environment name.
    3. Click Delete for each engine.
    4. Repeat for all StreamSets environments in all projects.
  3. Switch the communication method.
    1. Select Administration > Configurations and settings > StreamSets configuration.
    2. Select Direct for browser-to-engine communication.
    3. Click Apply.
  4. Configure network routes and firewalls to allow outbound HTTPS connections from browser workstations to the machines that run the engines, on the engine port number.

    By default, the engines use a self-signed SSL/TLS certificate for HTTPS communication with the browser. For more secure communication, a user with the Editor or Admin role in the project can create a keystore file and edit the StreamSets environment to use the file. For more information, see Enabling HTTPS host verification.

  5. Retrieve the engine command from each StreamSets environment and run the command to re-create the engine containers.
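Step 1 of the procedure can also be scripted when you have several engine containers. The following is a minimal sketch that assumes a POSIX shell. It also assumes that you pass the engine container IDs as arguments after you copy them from the `<docker|podman> ps` output. The `remove_engines` function and the `RUNTIME` variable are hypothetical names, not part of the product.

```shell
#!/bin/sh
# Sketch: stop and remove engine containers (procedure step 1).
# RUNTIME is a hypothetical variable: set it to docker (default) or podman.
RUNTIME="${RUNTIME:-docker}"

remove_engines() {
  if [ "$#" -eq 0 ]; then
    echo "usage: remove_engines <container_id>..." >&2
    return 2
  fi
  for id in "$@"; do
    # Stop the engine first, then remove the stopped container.
    # If either command fails, stop processing and report failure.
    "$RUNTIME" stop "$id" && "$RUNTIME" rm "$id" || return 1
  done
}

# Run only when container IDs are passed on the command line.
if [ "$#" -gt 0 ]; then
  remove_engines "$@"
fi
```

For example, run `RUNTIME=podman sh remove_engines.sh <container_id> <container_id>`. The script takes explicit IDs instead of removing everything that `ps` lists, so it does not stop unrelated containers on the same machine.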