Caddy proxy
Caddy is a reverse proxy server that simplifies the setup of HTTPS and firewall rules. After enabling Caddy, navigate to
https://{hostname}:8888/, where you are automatically redirected to Manta Launcher.
Introduction
Caddy (see caddyserver.com for more details) is a new optional component of IBM Automatic Data Lineage that simplifies the setup of HTTPS. You can select the Caddy integration during the installation.
Purpose
HTTPS is a more secure mode of communication over a network than plain HTTP: it encrypts the data exchanged between the involved parties.
Caddy sits between the user and the Automatic Data Lineage components and acts as a network gateway that aggregates and routes all traffic between the user’s machine and the individual components. The secure connection from the user is terminated at Caddy, so the rest of the components within the Automatic Data Lineage installation can be set up without HTTPS. Because Caddy is the only public-facing service, this setup remains secure. The architecture has an added benefit: since Caddy is the only service that needs to be exposed publicly, only its port must be opened in the firewall.
Networking diagrams
Network setup without Caddy
Networking diagram with Caddy
Use case scenarios
Review your overall network configuration to decide whether Caddy should be enabled.
| Current setup | Recommendation |
|---|---|
| Automatic Data Lineage installed with HTTP only. No proxy or load balancer is used. | Enable Caddy and use the HTTPS communication provided by Caddy. |
| Automatic Data Lineage installed with HTTPS. No proxy or load balancer is used. | Disable all HTTPS settings in the individual components. Enable Caddy and use the HTTPS communication provided by Caddy. |
| Automatic Data Lineage installed with HTTP only. A proxy or load balancer is used. | Leave Caddy disabled. Caddy cannot be used together with another proxy or load balancer. |
Configuration & setup
Ports
During installation, you must configure two ports for Caddy: a communication port that handles all the traffic, and a second port for the administration API. Only the communication port must be publicly available. The administration API of the Caddy server must be protected by suitable firewall rules. Automatic Data Lineage recommends the port numbers 8888 (communication port) and 8889 (administration API port) as the defaults. If you choose port numbers of 1024 or higher, Caddy can and should run without root privileges (ports below 1024 are restricted on Linux).
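The privilege rule above can be checked with a few lines of shell. This is only a minimal sketch: `is_privileged_port` is a hypothetical helper name, and the port list simply reuses the recommended defaults plus 443 for comparison.

```shell
#!/bin/sh
# Ports 0-1023 are privileged on Linux: binding to them normally requires root.
# is_privileged_port is a hypothetical helper, not part of the product.
is_privileged_port() {
  [ "$1" -lt 1024 ]
}

# Report which candidate ports would force Caddy to run with root privileges.
for port in 8888 8889 443; do
  if is_privileged_port "$port"; then
    echo "port $port: privileged, needs root"
  else
    echo "port $port: unprivileged, no root needed"
  fi
done
```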
HTTPS
Caddy has HTTPS management enabled out of the box, so all traffic is protected even without any user intervention. Caddy achieves this by generating its own Certificate Authority (its certificate can be found in /data/local/pki/ca.crt),
which then signs the HTTPS certificate that is actually deployed.
To deploy custom certificates, reference them in the Caddy configuration file. In ${mantaflow}/caddy/conf/Caddyfile find a line starting with
# tls <cert file> <key file>
Remove the hash mark at the start, and replace the placeholders with absolute paths to the desired HTTPS certificate and key files. Place the certificate files outside the mantaflow directory so that they are preserved during upgrades. On Linux
installations, the files must have the same owner and user group as the rest of the Automatic Data Lineage installation so that Caddy can access them. Starting in R42, use the dedicated configuration directory ${installdir}/conf.
If you have already deployed HTTPS keys in the past, you can convert the keystore to the format required by Caddy. Caddy is not a Java tool, so it cannot use a Java keystore directly; it requires the key and the certificate
in the PEM format. A key file in PEM format is a text file that starts with the header -----BEGIN PRIVATE KEY-----. A PEM certificate is also a text file, but its header is -----BEGIN CERTIFICATE-----.
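To check which kind of PEM file you have, you can look for the header line. The sketch below uses a hypothetical helper, `pem_label`, that prints the label of the first PEM header it finds. It scans the whole file rather than only the first line, because files exported by openssl pkcs12 may start with "Bag Attributes" lines before the header.

```shell
#!/bin/sh
# Print the label of the first PEM block in a file, for example
# "CERTIFICATE" or "PRIVATE KEY". Prints nothing when no PEM header is found.
# pem_label is a hypothetical helper name used only for illustration.
pem_label() {
  sed -n 's/^-----BEGIN \(.*\)-----[[:space:]]*$/\1/p' "$1" | head -n 1
}
```

For example, `pem_label newfile.key.pem` should print PRIVATE KEY for a key file in the format described above.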
To convert the existing JKS (Java KeyStore) to the required PEM format, you need the openssl command line tool. On Linux installations it is usually readily available; on Windows you have to download and install it separately.
1. Convert the JKS keystore to the portable PKCS12 format:
   keytool -importkeystore -srckeystore keystore.jks -destkeystore keystore.p12 -srcstoretype jks -deststoretype pkcs12
2. Export the certificate from the keystore:
   openssl pkcs12 -in keystore.p12 -out newfile.crt.pem -clcerts -nokeys
3. Export the private key from the keystore:
   openssl pkcs12 -in keystore.p12 -out newfile.key.pem -nocerts -nodes
The resulting files are then referenced in Caddyfile.
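Before referencing the exported files, it can help to confirm that the certificate and the private key belong together. The following sketch compares the public key embedded in the certificate with the public key derived from the private key. `cert_matches_key` is a hypothetical helper name, and the sketch assumes an unencrypted key file such as the one produced with -nodes.

```shell
#!/bin/sh
# Succeeds only when the PEM certificate ($1) and the PEM private key ($2)
# contain the same public key. Returns 2 when either file cannot be read.
# cert_matches_key is a hypothetical helper, not part of the product.
cert_matches_key() {
  cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 2
  key_pub=$(openssl pkey -in "$2" -pubout) || return 2
  [ "$cert_pub" = "$key_pub" ]
}
```

A call such as `cert_matches_key newfile.crt.pem newfile.key.pem && echo "pair matches"` prints the message only for a matching pair.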
To verify that the certificate has been installed correctly, launch Caddy using its bin/startup.[sh|bat] script and open the following URL in a browser:
https://localhost:8888/health. Adjust the port number according to your configuration. The browser lets you inspect the certificate information.
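On a server without a browser, the same inspection can be approximated from the command line. This is a sketch only: `show_server_cert` is a hypothetical helper, and the default port 8888 is an assumption taken from the recommended defaults.

```shell
#!/bin/sh
# Print the subject, issuer and validity dates of the certificate that the
# server at host:port presents. show_server_cert is a hypothetical helper.
show_server_cert() {
  host="${1:-localhost}"
  port="${2:-8888}"
  # s_client performs the TLS handshake; x509 decodes the presented certificate.
  openssl s_client -connect "$host:$port" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -issuer -dates
}
```

Running `show_server_cert localhost 8888` while Caddy is up should list the details of the deployed certificate; the issuer line indicates whether the Caddy-generated authority or your own certificate is in use.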
Caddy configuration
This is a simplified version of the Caddy server integration into the Automatic Data Lineage product. For detailed configuration of Caddy itself consult the official documentation on caddyserver.com.
Simplified configuration
Caddy uses the global configuration file
${installdir}/conf/manta.properties. In this file you can also configure the Caddy ports. The values from this configuration file are injected into the actual Caddy configuration file, located in
conf/Caddyfile. The format of that file is documented at
https://caddyserver.com/docs/caddyfile. The Caddyfile uses placeholders that are replaced with variables set up in the environment. Those variables are set in the setenv_manta.{sh|bat} script and injected into the environment by the startup script.
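To see which value is currently configured, you can read a property straight from the properties file. The sketch below assumes plain key=value lines; `get_property` is a hypothetical helper and does not handle continuation lines or escaped characters.

```shell
#!/bin/sh
# Print the value of a key ($2) from a Java-style .properties file ($1).
# Only plain key=value lines are handled; the last occurrence wins.
# get_property is a hypothetical helper, not part of the product.
get_property() {
  awk -v key="$2" '
    index($0, key "=") == 1 { value = substr($0, length(key) + 2); found = 1 }
    END { if (found) print value }
  ' "$1"
}
```

For example, `get_property "${installdir}/conf/manta.properties" manta.caddy.port` prints the configured communication port.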
Integration into Automatic Data Lineage
Caddy is part of every installation. By default, Caddy is configured but not enabled.
In the following examples, replace the placeholder ${installdir} with the absolute path to the Manta installation directory, usually
/opt/mantaflow or C:\mantaflow depending on the OS.
Enabling
1. Shut down Automatic Data Lineage using Manta Launcher.
2. Navigate to ${installdir}/utility/.
3. Run the command:
   java -jar manta-installer-dep-caddy.jar -m ENABLE -r ${installdir}
4. Start Automatic Data Lineage using Manta Launcher.
Caddy is integrated into the Manta Launcher, and you can see its status in the Launcher dashboard.
Disabling
1. Shut down Automatic Data Lineage using Manta Launcher.
2. Navigate to ${installdir}/utility/.
3. Run the command:
   java -jar manta-installer-dep-caddy.jar -m DISABLE -r ${installdir}
4. Start Automatic Data Lineage using Manta Launcher.
Caddy is integrated into the Manta Launcher, and you can see its status in the Launcher dashboard.
Caddy accessible domains
As of R42.4, you can specify which domains Caddy listens on and responds to. By default, Caddy listens on the system hostname provided during installation (the manta.system.hostname property in <installDir>/conf/manta.properties)
and on the hostname parsed from
manta.keycloak.public.url (also provided during installation and stored in <installDir>/conf/manta.properties). Additional domains can be specified using the manta.caddy.hostnames property in
<installDir>/conf/manta.properties.
After any change to the Caddy configuration, it is enough to restart Caddy alone; you do not have to restart all Automatic Data Lineage components. This applies, for example, to deploying new HTTPS certificates, which means that certificates can be changed even in the middle of a long-running scan. Note that Automatic Data Lineage services are not available to users while Caddy is not running. Although Caddy supports zero-downtime configuration changes, this feature is not yet supported by Automatic Data Lineage.
Network integration
All the other HTTP ports previously required by Automatic Data Lineage can be closed in the firewall. Ensure that the Artemis port (default 61616) remains open. Artemis is not an HTTP-based service, so it cannot leverage the HTTPS features provided by Caddy. The Artemis communication with the Agent is secured using mTLS, which is set up automatically.
If Caddy is disabled, all the ports must be opened again to allow Automatic Data Lineage to function correctly.
End-Users Browser Bookmarks
When Caddy is enabled, all traffic is directed through it, so all services are available at new URLs. The Caddy integration is set up so that only the port and the protocol change. For example, if you previously
accessed Manta Flow Server at
http://localhost:8080/manta-flow-viewer, the new URL is https://localhost:8888/manta-flow-viewer: the protocol changes from http:// to https:// and the port changes
from 8080 to 8888 (this assumes the default ports; if you use different port numbers or a different host name, adjust the URLs accordingly).
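The bookmark change described above is mechanical, so it can be scripted when many URLs need updating. A minimal sketch, assuming the old URLs use plain http://; `to_caddy_url` is a hypothetical helper, and its optional second argument overrides the default port 8888.

```shell
#!/bin/sh
# Rewrite a pre-Caddy URL to its Caddy equivalent: http:// becomes https://
# and the port becomes the Caddy communication port (default 8888).
# to_caddy_url is a hypothetical helper, not part of the product.
to_caddy_url() {
  caddy_port="${2:-8888}"
  printf '%s\n' "$1" | sed -E "s#^http://([^/:]+)(:[0-9]+)?#https://\1:${caddy_port}#"
}

to_caddy_url "http://localhost:8080/manta-flow-viewer"
# prints https://localhost:8888/manta-flow-viewer
```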
Using Caddy to hide port numbers from URLs
With the default port numbers, all URLs in Automatic Data Lineage include the port number, for example
https://localhost:8888/manta-flow-viewer. The port number must be part of the URL unless it is the protocol default: 80 for plain HTTP or 443 for HTTPS.
To run Caddy on port 443, make the following changes in the Manta configuration file ${installdir}/conf/manta.properties:
| Default property value | Updated property value |
|---|---|
| manta.caddy.port=8888 | manta.caddy.port=443 |
| manta.keycloak.public.url=https://localhost:8888/auth | manta.keycloak.public.url=https://localhost/auth |
| manta.launcher.url=https://localhost:8888 | manta.launcher.url=https://localhost |
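The three property changes can also be applied with a small script. This is a sketch under stated assumptions: the file still uses the default port 8888, the helper name `use_https_default_port` is hypothetical, and a .bak copy of the file is kept next to the original.

```shell
#!/bin/sh
# Switch the Caddy-related properties in the given file from port 8888 to 443.
# Assumes the default port 8888 is still configured; keeps a .bak backup.
# use_https_default_port is a hypothetical helper, not part of the product.
use_https_default_port() {
  sed -E -i.bak \
    -e 's/^manta\.caddy\.port=.*/manta.caddy.port=443/' \
    -e '/^manta\.(keycloak\.public|launcher)\.url=/ s/:8888//' \
    "$1"
}
```

Call it as `use_https_default_port "${installdir}/conf/manta.properties"` while Automatic Data Lineage is shut down, and review the result before restarting.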
Working around the port number restrictions on Linux
As explained above, port 443 is restricted on Linux, and only a user with root privileges is allowed to use it. Given the architecture of Automatic Data Lineage, this means that the Launcher has to be started under the root account,
and therefore the whole application runs as root. From a security perspective, this approach is not preferred. Strictly speaking, only Caddy needs root privileges; the rest of the components can run
under a more restricted user.
Option #1
Run Manta Launcher as root. This is the easiest solution, but the least secure.
Option #2
Edit the startup.sh script of each component except Caddy and Launcher, and add the following line: if [ $UID -eq 0 ]; then exec runuser -u $USER "$0" -- "$@"; fi
Place it directly after the shebang line, like this:
#!/bin/bash
if [ $UID -eq 0 ]; then exec runuser -u $USER "$0" -- "$@"; fi
echo "Starting Manta Keycloak"
SCRIPT_DIR=$(dirname "$(readlink -f "$0")")
Replace $USER with the actual username under which the component is supposed to run.
Then start Manta Launcher as root.
This way, only Caddy and Launcher run as root; the rest of the product runs under the selected user.
This option is safer than #1 and still convenient, but Manta Launcher still runs as root.
Option #3
Edit the file
${installdir}/launcher/manta-launcher-dir/conf/applications.json and change the dependencies entry of Keycloak: switch the "caddy" dependency from STARTUP to RUNTIME.
{
  "id": "keycloak",
  "name": "Keycloak",
  ...
  "dependencies": [
    {
      "application": "launcher",
      "type": "STARTUP"
    },
    {
      "application": "caddy",
      "type": "RUNTIME" <---------- WAS STARTUP, CHANGE TO RUNTIME
    }
  ],
  ...
Start Launcher as a regular user. The Launcher tries to start Caddy, which fails. This is expected, because a regular user cannot use port 443. Keep Manta Launcher running and start Caddy manually as root:
sudo ${installdir}/caddy/bin/startup.sh
This is the safest option because only Caddy runs as root. It is also the least convenient, because the Caddy server has to be started manually, without the help of Manta Launcher.