Enabling HTTPS

The Data Collector engine uses direct engine REST APIs over HTTPS for secure communication. By default, inbound connections to the Data Collector engine use HTTPS on port 18630. The engine provides a self-signed SSL/TLS certificate so you can start building flows immediately.

However, using the self-signed certificate is not recommended for production or long-term development.

To enable more secure communication, complete the prerequisite tasks, and then complete the following steps:
  1. Create a keystore file
  2. Configure engines to use the keystore file
  3. Mount the keystore file

For information about outbound security, see SSL/TLS encryption.

Prerequisite tasks

To enable more secure engine communication, complete the following prerequisite tasks:

Obtain access to OpenSSL and Java keytool
If you do not have a keystore file that includes an SSL/TLS certificate signed by a certificate authority (CA), you can request a certificate and create the keystore file using the following tools:
  • OpenSSL - Use OpenSSL to create a Certificate Signing Request (CSR) that you send to the CA of your choice, as well as to create the keystore files. For more information, see the OpenSSL documentation.
  • Java keytool - You can also use Java keytool to create a CSR and to create the keystore files. Java keytool is part of the Java Development Kit (JDK). For more information, see the keytool documentation.
Generate an SSL/TLS certificate and private key pair signed by a certificate authority (CA)
Generate a single private key and public certificate pair for the engine. The engine uses a self-signed certificate by default. However, web browsers generally issue a warning for self-signed certificates. As a best practice, generate a key and certificate pair signed by a trusted CA.
Important: The signed certificate must include the fully qualified domain name (FQDN) for the engine machine.
To obtain a certificate from a trusted CA, you must provide proof that you are the owner of the domain name for which you are requesting the certificate. Use OpenSSL or keytool to generate a key pair and then submit a Certificate Signing Request (CSR) to the CA. The exact procedure depends on the CA that you choose to use. For more information, see the documentation provided by the CA.
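As a minimal sketch of the CSR step, assuming OpenSSL 1.1.1 or later (required for the -addext option) and a hypothetical engine FQDN of sdc.example.com, the following commands generate an unencrypted 2048-bit RSA private key and a CSR whose Common Name and Subject Alternative Name are the engine FQDN. The file names are illustrative and match the examples used later in this topic; your CA might require additional subject fields.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical FQDN: replace with the fully qualified domain name of the engine machine.
ENGINE_FQDN="sdc.example.com"

# Generate a new RSA private key (unencrypted because of -nodes) and a CSR.
# The FQDN goes into both the Common Name and the Subject Alternative Name.
openssl req -new -newkey rsa:2048 -nodes \
  -keyout my_private_key.key \
  -out my_certificate.csr \
  -subj "/CN=${ENGINE_FQDN}" \
  -addext "subjectAltName=DNS:${ENGINE_FQDN}"

# Print the CSR subject to confirm the FQDN before you send the CSR to the CA.
openssl req -in my_certificate.csr -noout -subject
```

Send my_certificate.csr to the CA and keep my_private_key.key private; you need it when you create the keystore.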

Step 1. Create a keystore file

About this task

Though the Data Collector engine uses a self-signed SSL/TLS certificate by default, create a custom keystore file for a higher level of security.

Create a keystore file that includes the private key and public certificate pair signed by the certificate authority (CA). The engine presents the certificate in the keystore to prove its identity to clients, such as web browsers, that connect to it over HTTPS.

Create all keystores in the PKCS #12 (p12 file) format. In most cases, a CA issues certificates in PEM format. Use OpenSSL to directly import the certificate into a PKCS #12 keystore.

Procedure

  1. Use the following command to import the certificate and private key, issued in PEM format, into a PKCS #12 keystore. The -name option sets the alias (friendly name) of the entry in the keystore:
    openssl pkcs12 -export -in <PEM_certificate> -inkey <private_key> -out <keystore_file_name> -name <alias>
    For example:
    openssl pkcs12 -export -in my_certificate.pem -inkey my_private_key.key -out my_keystore.p12 -name my_keystore

    When prompted, enter and confirm a password for the keystore file.

  2. Store the keystore password in a text file.
    For example, store the password in a file named my_keystore_password.txt.
    Tip: To ensure that a newline character is not added after the password, run the following command:
    echo -n "<password>" > my_keystore_password.txt
  3. Store the keystore and password text files in a local directory, such as ${HOME}/sdc.
    To enable the engine to access these files, you will add a bind mount argument to the engine run command in a later step.
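The following script sketches this procedure end to end for a dry run. It assumes that OpenSSL is installed and uses a short-lived self-signed certificate as a stand-in for the CA-signed certificate; in practice, skip the first command and use the certificate and private key that you obtained from your CA. The password changeit is a hypothetical placeholder. It is written with printf '%s', which is portable across POSIX shells, whereas the behavior of echo -n varies between shells. The -passout file: option reads the password from that file, so OpenSSL does not prompt for it. If your CA also issued intermediate certificates, you can include them with the -certfile option.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Dry run only: create a 1-day self-signed stand-in certificate and key.
# Replace these files with the CA-signed certificate and your private key.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout my_private_key.key -out my_certificate.pem \
  -subj "/CN=sdc.example.com"

# Store the (hypothetical) keystore password without a trailing newline.
printf '%s' 'changeit' > my_keystore_password.txt

# Import the PEM certificate and private key into a PKCS #12 keystore,
# reading the export password from the password file instead of prompting.
openssl pkcs12 -export -in my_certificate.pem -inkey my_private_key.key \
  -out my_keystore.p12 -name my_keystore \
  -passout file:my_keystore_password.txt

# List the certificate stored in the keystore to confirm that the password
# file opens it and that the entry carries the expected alias.
openssl pkcs12 -in my_keystore.p12 -passin file:my_keystore_password.txt -info -nokeys
```

If the last command prints the certificate without a password error, the keystore and password files are consistent with each other.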

Step 2. Configure engines to use the keystore file

About this task

To enable the engine to use the keystore and keystore password text files that you created, add advanced configuration properties to the StreamSets environment for the engine.

Procedure

  1. On the Manage tab of your project, edit the StreamSets environment. Open the Advanced Configuration dialog box, then add the following Data Collector properties as keys and values, as needed:
    https.keystore.path

      Name of the target keystore file to add to the Data Collector engine container.

      You can specify the name of your keystore file, or you can leave this property unset to use the default file name: keystore.jks.

      For example, if your keystore file is named my_keystore.p12 and you want to use the same name for the target keystore file in the Data Collector engine container, configure this property as follows:

      • Key: https.keystore.path
      • Value: my_keystore.p12

      If you do not configure this property, use keystore.jks as the target keystore file name when you mount your keystore file.

    https.keystore.password

      Password to open the keystore file.

      Instead of entering the password in clear text, you can store the password in a keystore password file and then use the file or exec function to retrieve the password from the file.

      Default is ${file("keystore-password.txt")}.

      You can specify the name of your keystore password file in the default expression, or you can leave this property unset to use the default keystore password file name: keystore-password.txt.

      For example, if your keystore password file is named my_keystore_password.txt and you want to use the same name for the target keystore password file in the Data Collector engine container, configure this property as follows:

      • Key: https.keystore.password
      • Value: ${file("my_keystore_password.txt")}

      If you do not configure this property, use keystore-password.txt as the target keystore password file name when you mount your keystore password file.

    https.port

      Secure port number for the engine. Default is 18630. Update the port number as needed.

    https.require.hsts

      Determines whether the engine includes the HTTP Strict Transport Security (HSTS) response header. Set to true to enable HSTS. Default is false.

  2. Click Save to save your additions to the Advanced Configuration dialog box.
  3. Click Save to save all changes to the environment.

Step 3. Mount the keystore file

About this task

To enable Data Collector to use the keystore and keystore password files, edit the StreamSets environment to customize the engine run command. Add a mount option to the command, then run the customized command to restart the engine.

Procedure

  1. If the engine is running, stop the engine.
    1. Determine the container ID for the engine:
      <docker|podman> ps

      For example, use the following command for Docker: docker ps

    2. Copy the ID of the container that you want to update.
    3. Stop the engine:
      <docker|podman> stop <container_id>
  2. On the Manage tab of your project, click the StreamSets tool.
  3. For the environment, click Options > Edit environment.
  4. In the Advanced configurations section, click Click to configure.
  5. In the Docker command options section, click Add value twice.
  6. Add the following bind mount options as separate values:
    --mount type=bind,source=<path_to_keystore_file>,target=/etc/sdc/<keystore_file_name>,readonly
    --mount type=bind,source=<path_to_keystore_password_file>,target=/etc/sdc/<keystore_password_file_name>,readonly

    Or, if you did not define the https.keystore.path and https.keystore.password properties, add the following options instead and update only the source paths:

    --mount type=bind,source=<path_to_keystore_file>,target=/etc/sdc/keystore.jks,readonly
    --mount type=bind,source=<path_to_keystore_password_file>,target=/etc/sdc/keystore-password.txt,readonly
  7. Replace the placeholders in the options as follows:
    • <path_to_keystore_file> is the local path to the keystore file that you created in Step 1.
    • <path_to_keystore_password_file> is the local path to the keystore password file that you created in Step 1.
    • <keystore_file_name> is the file name that you defined for the https.keystore.path property in Step 2.
    • <keystore_password_file_name> is the file name that you defined for the https.keystore.password property in Step 2.
    Important: Make sure that all files exist in the expected locations.
    For example, the following bind mount options define "${HOME}"/sdc/my_keystore.p12 and "${HOME}"/sdc/my_keystore_password.txt for the source paths and use my_keystore.p12 and my_keystore_password.txt as the file names in the target paths:
    --mount type=bind,source="${HOME}"/sdc/my_keystore.p12,target=/etc/sdc/my_keystore.p12,readonly
    --mount type=bind,source="${HOME}"/sdc/my_keystore_password.txt,target=/etc/sdc/my_keystore_password.txt,readonly
  8. Save your changes.
  9. For the environment, click Options > Get run command, and then copy the command.

    Notice that the copied command includes your customization.

  10. Run the customized engine command.

    The command starts a Data Collector engine container that includes the specified HTTPS properties and mounted files.
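Before you run the customized command, you can check that the source files exist and generate the matching bind mount options with a small helper. The mount_opts function below is hypothetical and not part of StreamSets. It assumes bash and assumes that the target file names match the source file names, as in the example in step 7.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: verifies that the keystore file and the keystore
# password file exist, then prints one --mount option per file. Each target
# reuses the source file name under /etc/sdc.
mount_opts() {
  local keystore="$1" password_file="$2"
  local f
  for f in "$keystore" "$password_file"; do
    if [ ! -f "$f" ]; then
      echo "missing file: $f" >&2
      return 1
    fi
  done
  printf -- '--mount type=bind,source=%s,target=/etc/sdc/%s,readonly\n' \
    "$keystore" "$(basename "$keystore")"
  printf -- '--mount type=bind,source=%s,target=/etc/sdc/%s,readonly\n' \
    "$password_file" "$(basename "$password_file")"
}
```

For example, mount_opts "${HOME}"/sdc/my_keystore.p12 "${HOME}"/sdc/my_keystore_password.txt prints the two options from the step 7 example with ${HOME} expanded, or reports which file is missing.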