Manta Flow Agent Configuration for Extraction
Overview
Learn how to use an agent to extract metadata from the server. Manta Flow Agent is an on-premises agent that extracts metadata from source systems that aren’t reachable from outside the network and sends the extracts to Manta Flow Cloud for data lineage analysis.
There are several options for where to deploy Manta Agent. The following list summarizes the most common ones:
- A separate VM for Agent only, within your on-prem environment. This is especially useful when using a scanner that requires a JDBC driver to be provided (e.g. Hive, Teradata, MySQL) or specific 3rd party tools or utilities (see the next bullet point for more details).
- A machine that already has the specific tools/utilities used by IBM Automatic Data Lineage extractors. This option is useful when extracting metadata and lineage from e.g. Informatica PowerCenter, SAP BO, or Cognos, where specific 3rd party libraries or tools need to be available to the Automatic Data Lineage executable. Installing Manta Agent on the machine where the tooling is already installed may simplify management of the upgrades and lifecycle of these 3rd party tools.
- One or multiple Agents per datacenter. This option may be especially useful for geographically distributed data centers that may suffer from the effects of network latency.
Agent currently supports extraction from the following technologies:
- Apache Hive
- BigQuery
- Cognos (as of R42.9)
- Databricks
- IBM Db2
- Informatica PowerCenter
- Matillion
- MicroStrategy
- MS SQL
- MySQL (as of R42.1)
- Netezza
- Oracle
- PostgreSQL
- PowerBI (as of R42.2)
- QlikSense
- SAP HANA
- SAP BO
- Snowflake
- S3
- SSIS
- SSRS
- StreamSets
- Tableau (as of R42.1)
- Teradata
To start using Agent, follow the steps in this guide.
Requirements
- Memory: 1 GB
- Disk space:
  - 500 MB for Agent itself
  - 20 GB for the extracted data (the actual amount depends on the size of the extracted data)
Agent Registration
Step 1: Register Agent
- Go to the Application Manager tab and find the List of Agents table.
  - Here, you’ll see a list of all registered agents.
- Click on + Register New Agent.
- Fill in the required fields, then click Register.
  - The name of the registered agent should be unique. It is possible to change it later. The name is used to select an agent for extraction.
Then, you’ll be redirected to the Generate Configuration for Agent Installer page.
Step 2: Generate a Configuration for Agent Installer
All communication between Artemis and its connected services (including Agent) is secured with mTLS (mutual TLS), a form of authentication in which each side proves its identity with a key pair and a certificate. There are two ways to generate key pairs for mTLS.
- Generate it via Automatic Data Lineage
- Generate it manually and provide it
Generated by Automatic Data Lineage
- Select Generated by Manta.
- Click on Generate.
Automatic Data Lineage generates the key pair for Agent and a certificate signed by the Manta authority, which is stored in the Artemis truststore.
Generated Manually and Provided
- Select Generate Manually and Provide.
- Click on Edit.
- Add your private keys.
- Click on Generate.
You’ll need to manually add the generated certificate to the Artemis truststore, which is located at mantaflow\artemis\manta_broker\etc\artemistruststore. There are two ways to add the certificate.
- Using Keytool, the command-line tool bundled with Java
- Using KeyStore Explorer (if you aren’t used to working with the console)
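For the Keytool route, the import can be sketched as follows. This is a minimal illustration rather than the documented procedure: the certificate file name, alias, and password are placeholders (assumptions, not values from this guide), and only standard keytool options are used. The helper merely assembles the command so that it can be reviewed before it is run.

```python
import subprocess  # only needed if you uncomment the run step below


def build_keytool_import(cert_file, truststore, alias, storepass):
    """Return the keytool argument list that imports a certificate into a truststore."""
    return [
        "keytool", "-importcert",
        "-noprompt",                 # do not ask for interactive confirmation
        "-alias", alias,             # placeholder alias, choose your own
        "-file", str(cert_file),     # certificate generated in the previous step
        "-keystore", str(truststore),
        "-storepass", storepass,     # truststore password (placeholder)
    ]


cmd = build_keytool_import(
    cert_file="agent-cert.pem",  # hypothetical file name
    truststore=r"mantaflow\artemis\manta_broker\etc\artemistruststore",
    alias="manta-agent",         # hypothetical alias
    storepass="<TrustStore Password>",
)
print(" ".join(cmd))
# To execute the import: subprocess.run(cmd, check=True)
```

Passing the password on the command line may expose it in the shell history or process list; omitting `-storepass` makes keytool prompt for it instead.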
Step 3: Download the Generated Agent Configuration
After you click Generate, the Agent configuration will be generated. When it’s ready, you’ll receive a notification. Click on the link to download your configuration.
The download is a ZIP archive that contains all the configuration you’ll need for Agent. When you install Agent, you’ll need to provide this configuration to the Agent installer.
Agent Management
Once a new agent is registered, it will appear in the List of Agents. Right after registration, the agent’s status will be Was Not Connected Before because it hasn’t been installed and it isn’t running yet.
Several actions can be performed on a registered agent. To see them, click on the three dots at the end of the agent’s row.
The following actions may be performed on an agent.
- Edit - edit the agent configuration
- Generate Installation Configuration - generate a new Agent installer configuration
- Unregister - unregister the agent; this causes the agent to shut down
  - An unregistered agent cannot be used for extraction and cannot be registered again
  - If an agent is needed after unregistration, a new agent should be installed
Agent Installation
Install Agent as an application on a VM (Linux or Windows). For example, you could have a separate VM just for Agent, install Agent on the machine where the application that you need to connect to is running, or deploy Agent as a Docker image.
Deploy Agent as a Docker Image
1. Generate and download the Agent configuration in config.zip as described above.
2. Create the Manta Flow Agent Conf directory: <Agent_Path>/manta-flow-agent-dir/conf. (It must match the path used in step four.)
3. Extract config.zip and copy the artemiskeystore and artemistruststore files to the Manta Flow Agent Conf directory created in step two.
4. Run Agent after replacing the placeholder parameters in angle brackets (<>) with your own values. Get the data from agent_installer_config.xml in config.zip, as shown below.

        export USER_PARAMS="--user $(id -u):$(id -g)"
        docker run -d $USER_PARAMS \
          -v <Agent_Path>/manta-flow-agent-dir:/opt/mantaflow/agent/manta-flow-agent-dir \
          -e MANTA_ARTEMIS_HOST=<Public IP> \
          -e MANTA_ARTEMIS_PORT=61616 \
          -e MANTA_ARTEMIS_MTLS_ENABLED=true \
          -e MANTA_ARTEMIS_KEYSTORE_PATH='/opt/mantaflow/agent/manta-flow-agent-dir/conf/artemiskeystore' \
          -e MANTA_ARTEMIS_KEYSTORE_PASSWORD='<KeyStore Password>' \
          -e MANTA_ARTEMIS_TRUSTSTORE_PATH='/opt/mantaflow/agent/manta-flow-agent-dir/conf/artemistruststore' \
          -e MANTA_ARTEMIS_TRUSTSTORE_PASSWORD='<TrustStore Password>' \
          -e MANTA_AGENT_COMMON_ID='<Agent ID>' \
          -p 8787:8787 \
          repo.getmanta.com/manta-ubi8/manta-flow-agent:41.1.6
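The directory preparation in steps two and three can also be scripted. The following is a minimal sketch, assuming that config.zip contains files named artemiskeystore and artemistruststore somewhere in the archive (the internal layout of the archive is an assumption, so the files are located by name). The function name `prepare_conf_dir` is hypothetical and not part of the product.

```python
import shutil
import zipfile
from pathlib import Path

# File names taken from the steps above.
REQUIRED_FILES = ("artemiskeystore", "artemistruststore")


def prepare_conf_dir(config_zip, agent_path):
    """Create <agent_path>/manta-flow-agent-dir/conf and copy the two stores into it."""
    conf_dir = Path(agent_path) / "manta-flow-agent-dir" / "conf"
    conf_dir.mkdir(parents=True, exist_ok=True)

    copied = set()
    with zipfile.ZipFile(config_zip) as archive:
        for member in archive.namelist():
            if member.endswith("/"):
                continue  # skip directory entries
            name = Path(member).name
            if name in REQUIRED_FILES:
                # Copy the file regardless of which folder it sits in inside the ZIP.
                with archive.open(member) as src, open(conf_dir / name, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                copied.add(name)

    missing = set(REQUIRED_FILES) - copied
    if missing:
        raise FileNotFoundError(f"not found in {config_zip}: {sorted(missing)}")
    return conf_dir


# Example call (the agent path is a placeholder):
# prepare_conf_dir("config.zip", "/opt/manta-agent")
```

The returned directory is the one that has to be mounted into the container with the `-v` option of the `docker run` command.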
Install Agent as an Application
Download the Agent installer and launch it. The Agent installer will guide you through the installation process.
First, you’ll need to provide a configuration for the Agent installer.
In the next step, the configuration is imported from the Agent installer configuration ZIP and displayed, and you can edit it if necessary. Usually, there is no need to edit it, but more complicated network configurations may require changes.
Then, the configuration will be tested. If there are any problems, an error message will appear and it will be possible to edit the configuration and try again.
When Agent installation is finished and Agent is launched, the List of Agents will show the version and current status of the agent.
Agent Logging Configuration
Agent uses custom logic to limit how many log files are kept. By default, the limit is 30 files for each of the supported Agent use cases:
- EXTRACTION
- VALIDATION
- PERPETUAL_PROCESS (for open lineage)
Each use-case then produces 3 separate files, following this pattern:
manta-flow-agent-USE-CASE-logId.log
manta-flow-agent-USE-CASE-logId_stdout.log
manta-flow-agent-USE-CASE-logId_stderr.log
Each file counts against the limit. For example, one extraction run uses three of the default 30 slots. Once the limit is exhausted, the oldest file is deleted.
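The arithmetic behind the limit can be illustrated with a small simulation. It is a sketch under stated assumptions, not Agent's actual implementation: it assumes that at most the configured number of files is kept per use case and that the oldest file is dropped first. The numeric log IDs are made up for the example.

```python
from collections import deque

# Suffixes of the three files produced by each run, per the pattern above.
SUFFIXES = (".log", "_stdout.log", "_stderr.log")


def simulate_retention(log_ids, file_count_limit=30, use_case="EXTRACTION"):
    """Return the log files still present after the given runs, oldest first."""
    kept = deque()
    for log_id in log_ids:
        for suffix in SUFFIXES:
            kept.append(f"manta-flow-agent-{use_case}-{log_id}{suffix}")
            if len(kept) > file_count_limit:
                kept.popleft()  # limit exceeded: drop the oldest file
    return list(kept)


kept = simulate_retention(range(1, 13))  # 12 runs produce 36 files
print(len(kept))            # 30
print(kept[0])              # manta-flow-agent-EXTRACTION-3.log
print(30 // len(SUFFIXES))  # 10 complete runs fit into the default limit
```

With the default value, the logs of roughly the last ten runs of each use case remain available, which may be a useful number to keep in mind when choosing a different limit.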
The limits are configurable in the Agent configuration file located at <Agent_Path>/manta-flow-agent-dir/conf/application-user.yml
The configuration file uses the YAML format, in which white space (indentation) is significant.
manta:
  agent:
    extractor-logging-file-count: 30
    validator-logging-file-count: 30
    perpetual-process-logging-file-count: 30
Selecting an Agent for Extraction
After registering and installing the agent, select the agent to be used for the extraction. All technologies that support extraction with the agent have two additional properties in the Advanced section of the connection configuration: Extraction Method and Extraction Agents.
Select the Agent extraction method, and then, select the agents that should be used for the extraction. It is possible to select multiple agents for the extraction, but during the extraction, only one agent will be used. Before the extraction, the agent that should perform the extraction will be selected from the list of agents based on the agent status. If there are several agents and only a few of them are online, then the first agent that is online will be selected.
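The selection rule described above can be sketched as follows. This is an illustration of the stated behavior only (the first online agent in the list is used); the `Agent` class and its fields are hypothetical and do not reflect the product's internal data model.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Agent:
    name: str      # unique agent name chosen at registration
    online: bool   # simplified stand-in for the agent status


def select_agent(candidates: Sequence[Agent]) -> Optional[Agent]:
    """Return the first online agent from the configured list, or None."""
    for agent in candidates:
        if agent.online:
            return agent
    return None


configured = [Agent("dc1-agent", False), Agent("dc2-agent", True), Agent("dc3-agent", True)]
chosen = select_agent(configured)
print(chosen.name)  # dc2-agent
```

Because only one agent performs the extraction, listing several agents acts as a fallback chain rather than as a way to distribute the work.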
Running the Extraction
After completing all the previous steps, you can run the extraction with Agent using Process Manager.
The directory mantaflow\cli\scenarios\manta-dataflow-cli\conf is mapped to the Agent directory <AGENT_ROOT>\manta-flow-agent-dir\conf.
External Libraries
If you’re using a technology that requires libraries which aren’t part of the Automatic Data Lineage distribution, you need to put them in the <AGENT_ROOT>/manta-flow-agent-dir/lib-ext folder. You should only add the libraries listed in this section to this folder. If there are any issues with any of the libraries, you’ll receive an error message telling you which library contains the problem. Examples of scanners that require additional libraries:
- Apache Hive - see Hive Scanner Guide for more details. Download the attached ZIP file, hive_libraries.zip, and copy the libraries contained in the ZIP file into the folder.
- Cognos - see Cognos Integration Requirements for more details
- Databricks - for extraction from Hive Metastore, see Databricks Integration Requirements for more details
- Informatica PowerCenter - see Informatica PowerCenter Scanner Guide for more details
- MySQL, MariaDB and SingleStore - see MySQL Integration Requirements for more details
- SAP BO - see SAP BO Integration Requirements for more details
- SAP HANA - see SAP HANA Integration Requirements for more details
- Teradata - see Teradata Integration Requirements for more details
Ingest Source Support
There are two types of supported sources for file ingestion: agent and Git repository. Both serve the same purpose of ingesting your own files. When using an agent, you must put the files you want to ingest into the agent’s directory in the file system. When using a Git repository, you must define a Git repository connection where the agent will find the files to be ingested.
Agent Source
Ingest can be used for all technologies in the Ingest section of your settings when you’re in Advanced Mode. Each ingest property has a corresponding standard property. The ingest property consists of a list of inputs, where you specify the type of source (agent or Git) and the Source and Target fields.
Property | Description | Sample value | Fully resolved path |
---|---|---|---|
Source | Path relative to <AGENT_ROOT>/data/ingest where the input files will be taken from. This path cannot point outside of the agent's ingest directory. | myerwin | <AGENT_ROOT>/data/ingest/myerwin |
Target | Path relative to the connection's input directory (e.g. for erwin scanner it would be ${manta.dir.input}/erwin/${erwin.system.id} ) where the input files will be placed. This path cannot point outside of the connection's input directory. | scripts | ${manta.dir.input}/erwin/${erwin.system.id}/scripts |
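The way a relative Source or Target value maps to a fully resolved path, and why a value cannot escape its base directory, can be sketched as follows. This is only an illustration of the rule in the table; how the product actually enforces it is not described in this guide, and the `/opt/agent` root is a made-up stand-in for <AGENT_ROOT>.

```python
from pathlib import PurePosixPath


def resolve_inside(base_dir: str, relative: str) -> PurePosixPath:
    """Join a relative path onto base_dir, rejecting values that would leave base_dir."""
    parts = []
    for part in PurePosixPath(relative.lstrip("/")).parts:
        if part == "..":
            if not parts:
                raise ValueError(f"{relative!r} points outside of {base_dir}")
            parts.pop()  # step back, but never above base_dir
        elif part != ".":
            parts.append(part)
    return PurePosixPath(base_dir).joinpath(*parts)


ingest_root = "/opt/agent/data/ingest"  # hypothetical <AGENT_ROOT>/data/ingest
print(resolve_inside(ingest_root, "myerwin"))  # /opt/agent/data/ingest/myerwin

try:
    resolve_inside(ingest_root, "../../etc")
except ValueError as error:
    print("rejected:", error)
```

The same check applies to the Target value, with the connection's input directory as the base.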
The following image shows the Erwin connection. After the files have been ingested into the input directory, they are used in the next phase of the workflow, in the same way that data extracted during the Extraction phase is ready for analysis.
Within one connection, you can only use one agent for ingest, but you are allowed to use as many source and target folders from the same agent as you want. The following configuration is allowed and should run, because only the default agent is used, twice.
The following example is not allowed because two different agents are used in the same connection. It will show an error when running the extraction.
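The one-agent-per-connection rule can be expressed as a simple check. This is a sketch for illustration only: the dictionary keys (`type`, `agent`, `source`, `target`) are hypothetical names for the ingest input fields, not the product's configuration format.

```python
def check_single_agent(ingest_inputs):
    """Return the single agent used by the agent-type inputs, or raise if there are several."""
    agents = {entry["agent"] for entry in ingest_inputs if entry["type"] == "agent"}
    if len(agents) > 1:
        raise ValueError(f"only one agent per connection is allowed, got: {sorted(agents)}")
    return next(iter(agents), None)


# Allowed: the default agent is used twice with different folders.
allowed = [
    {"type": "agent", "agent": "default", "source": "myerwin", "target": "scripts"},
    {"type": "agent", "agent": "default", "source": "myerwin2", "target": "scripts2"},
]
print(check_single_agent(allowed))  # default

# Not allowed: two different agents in the same connection.
not_allowed = allowed[:1] + [
    {"type": "agent", "agent": "dc2-agent", "source": "other", "target": "scripts3"},
]
try:
    check_single_agent(not_allowed)
except ValueError as error:
    print("rejected:", error)
```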
The ingest scenario is built into every default workflow. You can use the workflow editor to add the ingest scenario to custom workflows. It is also possible to run the ingest scenario on its own without the workflow.
Git Source
Starting with version 42.4, Automatic Data Lineage supports Git Ingest connections, which download files from a Git repository into the workflow.
First, you need to create a Git Ingest connection where you will provide the configuration for the Git repository. In the Admin UI, go to Connections → + Add Connection → Ingest → Git and fill in the connection fields with the repository URL, branch and credentials. You can also choose the agent to use for downloading the files.
After creating your Git Ingest connection you are ready to use it inside any scanner connection that supports Ingest. In the following example we use an Erwin scanner connection to show how to use your Git Ingest connection to ingest files during the extraction phase. In the Advanced Configuration section of the scanner connection you will find the Ingest input field where you can select the Git Ingest connection.
Property | Description | Sample value | Fully resolved path |
---|---|---|---|
Source | Path relative to <GIT_REPOSITORY_ROOT> where the input files will be taken from. You can leave the Source field empty to download the entire Git repository into the target folder. | /MyScripts/folder | <GIT_REPOSITORY_ROOT>/MyScripts/folder |
Target | Path relative to the connection's input directory (e.g. for erwin scanner it would be ${manta.dir.input}/erwin/${erwin.system.id} ) where the input files will be placed. This path cannot point outside of the connection's input directory. | scripts1 | ${manta.dir.input}/erwin/${erwin.system.id}/scripts1 |
You can use as many Git Ingest connections as you want.
The ingest scenario is built into every default workflow. You can use the workflow editor to add the ingest scenario to custom workflows. It is also possible to run the ingest scenario on its own without the workflow.