How to Deploy IBM Automatic Data Lineage on an AWS EC2 Instance
Provisioning Requirements
See Manta Technical Requirements for more details.
- Generally, the best practice is to select an instance type from either the General Purpose or Compute Optimized family.
- Choose a base image that is compatible with an Automatic Data Lineage-supported OS. (Note that Amazon Linux is also a supported OS.)
Amazon EC2 Security Groups
- Ensure that the AWS security group assigned to the EC2 instance has rules that allow traffic to and from the instance.
- Verify that there is a rule that allows traffic from your computer to port 22 (SSH) for Linux instances or port 3389 (RDP) for Windows instances.
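The rule described above can also be added from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and configured; the security group ID and source address are hypothetical placeholders, not values from this guide. It only prints the command so that it can be reviewed before it is run.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with your own security group ID and public IP.
SG_ID="sg-0123456789abcdef0"
MY_IP="203.0.113.10"

# Print an AWS CLI command that opens one TCP port to a single source address.
# Use port 22 for SSH (Linux) or port 3389 for RDP (Windows).
ingress_cmd() {
  port="$1"
  printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port %s --cidr %s/32\n' \
    "$SG_ID" "$port" "$MY_IP"
}

ingress_cmd 22   # SSH rule for a Linux instance
```

Restricting the rule to a single /32 address keeps the port closed to everyone except the computer you connect from.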
DNS Configuration
- It may be beneficial to configure an Elastic Load Balancer to properly support DNS.
- This can also be achieved with alternative products such as NGINX.
Connecting to an AWS EC2 Instance and Migrating the Installer (Linux)
- It can be challenging to migrate the installer, dependencies, and required files to and from the Automatic Data Lineage host with a Linux-based OS. This is due to the “headless” or GUI-less nature of these OSs. Several ways of migrating the required files to and from the Automatic Data Lineage host are listed below.
Options:
- SCP
- Cloud Tool
- AWS S3
- FileZilla (not included in the documentation below)
1. SCP Command
- To run an SCP command, you will need to obtain a private key in the appropriate format.
- See this article for more information: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html#prepare-key-pair.
- If you are using an SSH client on macOS or Linux, run the following command so that only you can read the private key file.
chmod 400 my-key-pair.pem
SCP Command to Copy Files to an EC2 Instance
scp -i /directory/to/abc.pem /your/local/file/to/copy user@ec2-xx-xx-xxx-xxx.compute-1.amazonaws.com:path/to/file
Note: The “user” argument specified in the command above is dependent on the base image of the host. See below for the specific value to assign to the argument.
Users from Amazon:
For Amazon Linux, the user name is ec2-user.
For RHEL, the user name is ec2-user or root.
For Ubuntu, the user name is ubuntu or root.
For CentOS, the user name is centos.
For Fedora, the user name is ec2-user.
For SUSE, the user name is ec2-user or root.
Otherwise, if ec2-user and root don’t work, check with your AMI provider.
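The user lookup and the copy in both directions can be sketched as follows. This is an illustration only: the image labels passed to `default_user` and the local destination path are hypothetical, and the key path and host name are the placeholders used in the SCP command above. The script prints the `scp` commands rather than running them.

```shell
#!/bin/sh
# Return the first default login user for a base image, per the list above.
# The image labels (amazon-linux, rhel, ...) are hypothetical names for this sketch.
default_user() {
  case "$1" in
    amazon-linux|rhel|fedora|suse) echo "ec2-user" ;;
    ubuntu) echo "ubuntu" ;;
    centos) echo "centos" ;;
    *) echo "unknown image: $1" >&2; return 1 ;;
  esac
}

KEY="/directory/to/abc.pem"
HOST="ec2-xx-xx-xxx-xxx.compute-1.amazonaws.com"
USER_NAME="$(default_user ubuntu)"

# Copy a local file to the instance.
echo scp -i "$KEY" /your/local/file/to/copy "$USER_NAME@$HOST:path/to/file"
# Copy a file from the instance: swap the source and destination arguments.
echo scp -i "$KEY" "$USER_NAME@$HOST:path/to/file" /your/local/destination
```

Copying from the instance uses the same command with the remote path first and the local path second.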
2. Cloud Tool
- Cloud Tool provides an alternative to the SCP command. Adapt the examples below to your environment.
Cloud-Tool Copy to an EC2 Instance Command
cloud-tool --profile <PROFILE> --region <REGION> copy-files --copy-to-EC2 --private-ip <PRIVATE IP ADDRESS> "~/mantaflow.zip" "~/manta.zip"
Cloud-Tool Copy from an EC2 Instance Command
cloud-tool --profile <PROFILE> --region <REGION> copy-files --copy-to-EC2 --private-ip <PRIVATE IP ADDRESS> "~/mantaflow.zip" "~/manta.zip"
3. AWS S3
- Another option for moving the installer to or from an EC2 instance is to upload the files to an AWS S3 bucket and mount the bucket on the EC2 instance.
- For more information, review https://aws.amazon.com/premiumsupport/knowledge-center/ec2-instance-access-s3-bucket/.
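As a simpler variant of the S3 route, the file can be copied through the bucket with the AWS CLI instead of being mounted. The sketch below assumes the AWS CLI is available on both machines and that the instance has permission to read the bucket (for example, through an IAM role, as the article above describes); the bucket name is a hypothetical placeholder. It prints the commands rather than running them.

```shell
#!/bin/sh
BUCKET="my-installer-bucket"   # hypothetical bucket name

# Build the S3 URI for an object key in the bucket.
s3_uri() {
  printf 's3://%s/%s\n' "$BUCKET" "$1"
}

# On your workstation: upload the installer to the bucket.
echo "aws s3 cp manta.zip $(s3_uri manta.zip)"
# On the EC2 instance: download the installer from the bucket.
echo "aws s3 cp $(s3_uri manta.zip) manta.zip"
```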
Troubleshooting External Traffic to Manta Web Applications
Linux:
- In some instances, after Automatic Data Lineage is installed on the EC2 host, you still may not be able to access it through a web browser. To determine whether the issue lies with the Automatic Data Lineage application or with the network, firewall, or infrastructure, run a request on the Linux host itself using a CLI tool such as curl or wget. If the command succeeds, the web services are healthy and the issue likely resides with the firewall or network provisioning.
Example of a wget Command
wget https://localhost:8080/manta-dataflow-server/
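The same check can be done with curl. The sketch below is one possible way to turn the result into a hint about where to look next; the helper names `http_status` and `diagnose` and the wording of the messages are illustrative assumptions, and the URL is the one used in the wget example.

```shell
#!/bin/sh
URL="https://localhost:8080/manta-dataflow-server/"

# Print the HTTP status code for a URL.
# curl reports 000 when it gets no HTTP response at all.
http_status() {
  curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$1" || true
}

# Map a status code to the likely location of the problem.
diagnose() {
  case "$1" in
    2??|3??) echo "web service responds locally: check the security group, firewall, or network" ;;
    ""|000)  echo "no HTTP response on the host: check the application" ;;
    *)       echo "HTTP $1 returned: check the application logs" ;;
  esac
}

# Usage on the EC2 host:
#   diagnose "$(http_status "$URL")"
```

If the server uses a self-signed certificate, curl may report 000 even though the service is running; in that case, adding curl's `-k` option for this local test skips certificate verification.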
Windows:
- Try to access the web applications using a web browser on the host. If you are able to launch the web applications on the host but are unable to access them in a web browser external to the host, the problem is likely related to the firewall/networking/infrastructure.