Engine Communication
You access Control Hub using a web browser. You use Control Hub to deploy engines to your corporate network, which can be on-premises or on a protected cloud computing platform.
Control Hub works with the deployed engines when you design pipelines and when you run pipelines from jobs.
- Control Hub - Deployed engines use encrypted REST APIs to communicate with Control Hub. Engines initiate outbound connections to Control Hub over HTTPS on port number 443.
- Web browser - The web browser also uses encrypted REST APIs to communicate with Control Hub, initiating outbound connections to Control Hub over HTTPS on port number 443.
WebSocket Tunneling
By default, the web browser uses WebSocket tunneling to communicate with deployed engines.
When an engine starts up, the engine uses the WebSocket Secure (wss) protocol to establish a WebSocket tunnel with Control Hub over an encrypted SSL/TLS connection. Control Hub serves as the WebSocket server, and acts as an intermediary between the browser and the engine.
When you design pipelines or monitor jobs with WebSocket tunneling enabled, the web browser initiates outbound connections to Control Hub over HTTPS on port number 443. Control Hub then uses the encrypted WebSocket tunnel to communicate with the engine. The engine securely passes the requested data back through the WebSocket tunnel to Control Hub, and then the browser receives the data from Control Hub over HTTPS. Control Hub decrypts and then re-encrypts the data as it passes through. Control Hub does not use or inspect the data.
Each engine uses a single WebSocket tunnel connection that remains active until the engine restarts. Multiple users can use the same connection to securely request data from the engine. WebSocket tunneling ensures that your data is secure and does not require additional setup.
However, when you preview a pipeline or capture a snapshot of an active job, your source data passes through encrypted connections beyond your corporate network into Control Hub, and then back to your web browser. If your data must remain behind a firewall due to corporate regulations, you can configure the browser to use direct engine REST APIs to communicate directly with the engines behind the firewall.
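The relay pattern described above can be pictured with a small in-memory model. This is only a toy sketch: the class and method names (`EngineTunnel`, `RelayHub`, `forward`) are hypothetical, no real WebSocket or TLS is involved, and it does not represent how Control Hub is implemented. It shows only that each engine registers one tunnel and that several browser users share it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class EngineTunnel:
    """Stands in for the single long-lived tunnel an engine opens at startup."""
    engine_id: str
    handler: Callable[[str], str]  # engine side: takes a request, returns a response
    requests_served: int = 0

    def send(self, request: str) -> str:
        self.requests_served += 1
        return self.handler(request)


@dataclass
class RelayHub:
    """Stands in for the intermediary: it forwards requests and does nothing else."""
    tunnels: Dict[str, EngineTunnel] = field(default_factory=dict)

    def register(self, tunnel: EngineTunnel) -> None:
        # One tunnel per engine; registering again replaces the old entry,
        # much as an engine restart would establish a fresh tunnel.
        self.tunnels[tunnel.engine_id] = tunnel

    def forward(self, engine_id: str, request: str) -> str:
        tunnel = self.tunnels.get(engine_id)
        if tunnel is None:
            raise LookupError(f"no active tunnel for engine {engine_id!r}")
        return tunnel.send(request)


hub = RelayHub()
hub.register(EngineTunnel("engine-1", lambda req: f"engine-1 handled {req}"))

# Two different browser users reuse the same tunnel to the same engine.
reply_a = hub.forward("engine-1", "preview pipeline")
reply_b = hub.forward("engine-1", "job metrics")
```

In this model the engine never accepts an inbound connection. The hub only reaches it through the tunnel that the engine registered, which mirrors the outbound-only behavior described above.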
The following image shows how the web browser uses a WebSocket tunnel to communicate with engines:
Direct Engine REST APIs
When your source data must remain behind a firewall due to corporate regulations, you can configure the web browser to use direct engine REST APIs to communicate with engines deployed behind the firewall.
To use direct engine REST APIs, complete the following tasks:
- Enable engines to use the HTTPS protocol.
- Ensure browser access to the engines.
- Choose the direct engine REST APIs communication method in your browser settings.
- Optionally, require all users to use direct engine REST APIs.
The following image shows how the web browser can use direct engine REST APIs to communicate with engines:
Enabling HTTPS for Engines
To use direct engine REST APIs, you must enable engines to use the HTTPS protocol.
Prerequisites
Before you enable HTTPS for an engine, complete the following requirements:
- Obtain access to OpenSSL and Java keytool
  If you do not have a keystore file that includes an SSL/TLS certificate signed by a certificate authority (CA), you can request a certificate and create the keystore file using the following tools:
  - OpenSSL - Use OpenSSL to create a Certificate Signing Request (CSR) that you send to the CA of your choice, as well as to create the keystore and truststore files. For more information, see the OpenSSL documentation.
  - Java keytool - You can also use Java keytool to create a CSR and to create the keystore and truststore files. Java keytool is part of the Java Development Kit (JDK). For more information, see the keytool documentation.
- Generate SSL/TLS certificate and private key pairs signed by a certificate authority (CA)
  To enable HTTPS for an engine, generate a single private key and public certificate pair for the engine. IBM StreamSets provides a self-signed certificate that you can use. However, web browsers generally issue a warning for self-signed certificates. As a best practice, generate a key and certificate pair signed by a trusted CA.
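As an illustration of the CSR step, the following sketch assembles an OpenSSL command that generates a new private key together with a CSR. The host name and file names are placeholders, and the key size and subject are assumptions. Your CA may require different subject fields or key parameters, so treat this as a starting point rather than a prescribed procedure.

```python
import shlex
from typing import List


def csr_command(hostname: str, key_path: str, csr_path: str) -> List[str]:
    """Build an `openssl req` command that creates a private key and a CSR."""
    return [
        "openssl", "req",
        "-new",                   # create a new certificate request
        "-newkey", "rsa:2048",    # generate a new 2048-bit RSA key (assumed size)
        "-nodes",                 # leave the private key unencrypted on disk
        "-keyout", key_path,      # where to write the private key
        "-out", csr_path,         # where to write the CSR to send to the CA
        "-subj", f"/CN={hostname}",
    ]


# Placeholder host and file names, for illustration only.
cmd = csr_command("engine.example.com", "engine.key", "engine.csr")
print(shlex.join(cmd))
```

The sketch only prints the command. To run it, pass the list to `subprocess.run(cmd, check=True)` on a machine where OpenSSL is installed.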
Create a Keystore File
Create a keystore file that includes each private key and public certificate pair signed by the CA. A keystore is used to verify the identity of the client upon a request from an SSL/TLS server.
Create keystores in the PKCS #12 (p12 file) format. In most cases, a CA issues certificates in PEM format. Use OpenSSL to directly import the certificate into a PKCS #12 keystore.
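For example, importing a PEM certificate and its private key into a PKCS #12 keystore with OpenSSL might look like the following sketch. All file names and the alias are placeholders. The password is read from a file here so that it does not appear on the command line, which is a design choice of this example rather than a requirement.

```python
from typing import List, Optional


def pkcs12_command(
    cert_pem: str,
    key_pem: str,
    keystore_p12: str,
    alias: str,
    password_file: str,
    chain_pem: Optional[str] = None,
) -> List[str]:
    """Build an `openssl pkcs12 -export` command for a PEM certificate and key."""
    cmd = [
        "openssl", "pkcs12", "-export",
        "-in", cert_pem,          # CA-signed certificate in PEM format
        "-inkey", key_pem,        # matching private key
        "-out", keystore_p12,     # resulting PKCS #12 keystore
        "-name", alias,           # friendly name (alias) of the entry
        "-passout", f"file:{password_file}",  # first line of the file is the password
    ]
    if chain_pem is not None:
        # Optionally bundle intermediate CA certificates into the keystore.
        cmd += ["-certfile", chain_pem]
    return cmd


# Placeholder file names and alias, for illustration only.
cmd = pkcs12_command("engine.pem", "engine.key", "keystore.p12",
                     "engine", "keystore-password.txt")
```

Pass the resulting list to `subprocess.run(cmd, check=True)` to create the file.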
Create a Truststore File (Transformer Only)
Transformer uses the default Java truststore file located in $JAVA_HOME/jre/lib/security/cacerts. When Transformer is enabled for HTTPS and you run a cluster pipeline that launches a Spark application, the default Java truststore file is included with the application. When the Spark application sends status and metrics about running pipelines to Transformer, the HTTPS certificates must be trusted by the default Java truststore.
When Transformer runs pipelines on a Spark cluster and the Transformer HTTPS certificates are signed by a private CA or not trusted by the default Java truststore, you must create a custom truststore file or modify a copy of the default Java truststore file. For example, if your organization generates its own certificates, you must add the root and intermediate certificates for your organization to the truststore file.
You do not need to create or modify a truststore file in the following cases:
- Transformer runs only local pipelines.
- Transformer runs pipelines on a Spark cluster and your certificates are signed by a trusted CA included in the default Java truststore file.
These steps show how to modify a copy of the default truststore file to add an additional CA to the list of trusted CAs. If you prefer to create a custom truststore file, see the keytool documentation.
The truststore file can use either of the following formats:
- Java keystore file (JKS)
- PKCS #12 (p12 file)
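As a rough sketch of modifying a copy of the default truststore, the following code copies the `cacerts` file and builds a `keytool -importcert` command that adds a CA certificate to the copy. The destination file name, certificate file, and alias are placeholders. The store password `changeit` in the example call is the well-known default for `cacerts` and is only an assumption about your environment.

```python
import shutil
from pathlib import Path
from typing import List


def copy_default_truststore(java_home: str, dest: str) -> Path:
    """Copy the default Java truststore so the original stays untouched."""
    # Path taken from the text above; newer JDKs without a jre directory
    # keep the file under lib/security/cacerts instead.
    source = Path(java_home) / "jre" / "lib" / "security" / "cacerts"
    shutil.copyfile(source, dest)
    return Path(dest)


def import_ca_command(truststore: str, ca_cert: str, alias: str,
                      storepass: str) -> List[str]:
    """Build a keytool command that adds a CA certificate to the truststore copy."""
    return [
        "keytool", "-importcert",
        "-noprompt",               # do not ask for interactive confirmation
        "-alias", alias,           # unique alias for the CA entry
        "-file", ca_cert,          # root or intermediate CA certificate
        "-keystore", truststore,   # the copied truststore, not the original
        "-storepass", storepass,
    ]


# Placeholder names; repeat the import for each root and intermediate certificate.
cmd = import_ca_command("truststore.jks", "my-org-root-ca.pem",
                        "my-org-root-ca", "changeit")
```

Working on a copy keeps the JDK's own truststore intact, so a Java upgrade or reinstall does not silently discard the added CA.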
Configure Engines to Use HTTPS
Modify engine configuration properties to configure the engine to use a secure port, your keystore file, and optionally your truststore file.
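The following sketch renders the kind of properties involved. The property names (`https.port`, `https.keystore.path`, and so on) are assumptions based on common engine properties files and are not confirmed by this page, so verify the exact names against the configuration reference for your engine type and version. The port number and file names are placeholders.

```python
from typing import Dict, Optional


def https_properties(
    https_port: int,
    keystore_path: str,
    keystore_password: str,
    truststore_path: Optional[str] = None,
    truststore_password: Optional[str] = None,
) -> Dict[str, str]:
    """Collect HTTPS settings; all property names here are assumed, not confirmed."""
    props = {
        "https.port": str(https_port),
        "https.keystore.path": keystore_path,
        "https.keystore.password": keystore_password,
    }
    if truststore_path is not None:
        # The truststore is optional, so these entries appear only when requested.
        props["https.truststore.path"] = truststore_path
        props["https.truststore.password"] = truststore_password or ""
    return props


def render(props: Dict[str, str]) -> str:
    """Render the settings as key=value lines."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


# Placeholder values, for illustration only.
print(render(https_properties(8443, "keystore.p12", "<keystore password>")))
```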
Ensuring Browser Access to Engines
To use direct engine REST APIs, you must ensure that the browser can reach the URLs of the engines.
Configure network routes and firewalls so that the web browsers used to access Control Hub can reach all engines on the configured HTTPS port number. For more information about inbound traffic to engines, see Firewall Configuration.
To verify that the browser can access the engines, view the engines from the Engines view or from the deployment details on the Deployments view. When the engine is accessible, the Last Reported Time value is listed in green. When the engine cannot be reached, the Last Reported Time value is red.
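In addition to the check in the Control Hub UI, you can test basic network reachability from a user's workstation with a short script such as the sketch below. It only attempts a TCP connection to each engine's HTTPS port and does not validate the certificate or call any engine API. The host names and port are placeholders.

```python
import socket
from typing import Dict, Iterable, Tuple


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_engines(engines: Iterable[Tuple[str, int]]) -> Dict[str, bool]:
    """Check each (host, port) pair and report the result per engine."""
    return {f"{host}:{port}": port_reachable(host, port) for host, port in engines}


if __name__ == "__main__":
    # Placeholder engine addresses; replace with your own hosts and HTTPS port.
    for engine, ok in check_engines([("engine-1.example.com", 8443)]).items():
        print(engine, "reachable" if ok else "NOT reachable")
```

Run the script from the same network location as the browser, because a result from a server inside the firewall says little about what a user's workstation can reach.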
Choosing the Communication Method
After you enable HTTPS for the engines and ensure that the browser can access the engines, you choose the communication method that the browser uses.
By default, the browser uses WebSocket tunneling. You might choose direct engine REST APIs because the REST APIs can offer faster communication with the engines.
Requiring Direct Engine REST APIs
An organization administrator can optionally require that all web browsers use direct engine REST APIs to communicate with the engines.