Manta Flow Agent Configuration for Extraction

Overview

Learn how to use an agent to extract metadata from a server. Manta Flow Agent is an on-premises agent that extracts metadata from source systems that aren’t reachable from outside the network and sends the extracts to Manta Flow Cloud for data lineage analysis.

There are several options for where to deploy Manta Agent; the most common ones are described under Agent Installation.

List of the technologies Agent currently extracts from.

To start using Agent, follow the steps in this guide.

Requirements

Agent Registration

Step 1: Register Agent

  1. Go to the Application Manager tab and find the List of Agents table.

    • Here, you’ll see a list of all registered agents.
  2. Click on + Register New Agent.

  3. Fill in the required fields, then click Register.

    • The name of the registered agent should be unique. It is possible to change it later. The name is used to select an agent for extraction.

Then, you’ll be redirected to the Generate Configuration for Agent Installer page.

Step 2: Generate a Configuration for Agent Installer

All communication between Artemis and its connected services (including Agent) is secured with mTLS, which is a type of authentication that requires a matching key pair. There are two ways to generate key pairs for mTLS.

Generated by Automatic Data Lineage

  1. Select Generated by Manta.

  2. Click on Generate.

Automatic Data Lineage generates the key pair for Agent and a certificate signed by the Manta authority, which is stored in the Artemis truststore.

Generated Manually and Provided

  1. Select Generate Manually and Provide.

  2. Click on Edit.

  3. Add your private keys.

  4. Click on Generate.

You’ll need to manually add the generated certificate to the Artemis truststore, which is located at mantaflow\artemis\manta_broker\etc\artemistruststore. There are two ways to add the certificate.

  1. Using Keytool, Java’s embedded console tool

  2. Using Key Store Explorer (if you aren’t used to working with the console)
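
As a rough sketch of the Keytool option, the snippet below assembles a keytool -importcert command that adds the Agent certificate to the Artemis truststore. The alias, the certificate file name, and the password placeholder are hypothetical; only the truststore path comes from this guide.

```python
import shlex

def keytool_import_command(cert_file, truststore, storepass, alias):
    """Build the argument list for importing a certificate with keytool."""
    return [
        "keytool", "-importcert",
        "-alias", alias,           # hypothetical alias for the agent certificate
        "-file", cert_file,        # certificate generated in the previous step
        "-keystore", truststore,   # the Artemis truststore
        "-storepass", storepass,
        "-noprompt",               # skip the interactive confirmation question
    ]

cmd = keytool_import_command(
    cert_file="agent-cert.pem",    # hypothetical file name
    truststore=r"mantaflow\artemis\manta_broker\etc\artemistruststore",
    storepass="<TrustStore Password>",
    alias="manta-agent",
)
print(shlex.join(cmd))  # shell-quoted command line for review
```

To execute the command instead of printing it, the list could be passed to subprocess.run(cmd, check=True). Key Store Explorer performs the same import through its graphical interface.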

Step 3: Download the Generated Agent Configuration

After you click Generate, the Agent configuration will be generated. When it’s ready, you’ll receive a notification. Click on the link to download your configuration.

The download is a ZIP archive that contains all the configuration you’ll need for Agent. When you install Agent, you’ll need to provide this archive to the Agent installer.

Agent Management

Once a new agent is registered, it will appear in the List of Agents. Right after registration, the agent’s status will be Was Not Connected Before because it hasn’t been installed and it isn’t running yet.

There is a list of actions that can be performed on a registered agent. To see those actions, click on the three dots at the end of the row with the agent.

The following actions may be performed on an agent.

Agent Installation

Install Agent as an application on a VM (Linux or Windows). For example, you could have a separate VM just for Agent, install Agent on the machine where the application that you need to connect to is running, or deploy Agent as a Docker image.

Deploy Agent as a Docker Image

  1. Generate and download the Agent configuration in config.zip as described above.

  2. Create the Manta Flow Agent Conf directory: <Agent_Path>/manta-flow-agent-dir/conf. (<Agent_Path> must be the same path that you use in step four.)

  3. Extract config.zip and copy the artemiskeystore and artemistruststore files into the Manta Flow Agent Conf directory created in step two.

  4. Run Agent with the command below after replacing the placeholders in angle brackets (<>). Take the values from agent_installer_config.xml in config.zip.

    # Run the container as the current user so that files written to the mounted directory are owned by that user.
    export USER_PARAMS="--user $(id -u):$(id -g)"
    # Replace every <...> placeholder with the matching value from agent_installer_config.xml.
    docker run -d $USER_PARAMS \
        -v <Agent_Path>/manta-flow-agent-dir:/opt/mantaflow/agent/manta-flow-agent-dir \
        -e MANTA_ARTEMIS_HOST=<Public IP> \
        -e MANTA_ARTEMIS_PORT=61616 \
        -e MANTA_ARTEMIS_MTLS_ENABLED=true \
        -e MANTA_ARTEMIS_KEYSTORE_PATH='/opt/mantaflow/agent/manta-flow-agent-dir/conf/artemiskeystore' \
        -e MANTA_ARTEMIS_KEYSTORE_PASSWORD='<KeyStore Password>' \
        -e MANTA_ARTEMIS_TRUSTSTORE_PATH='/opt/mantaflow/agent/manta-flow-agent-dir/conf/artemistruststore' \
        -e MANTA_ARTEMIS_TRUSTSTORE_PASSWORD='<TrustStore Password>' \
        -e MANTA_AGENT_COMMON_ID='<Agent ID>' \
        -p 8787:8787 \
        repo.getmanta.com/manta-ubi8/manta-flow-agent:41.1.6
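
Steps two and three above can also be scripted. The following is a minimal sketch, assuming the two store files may sit anywhere inside config.zip (their location within the archive is not specified in this guide). The function name prepare_conf_dir is hypothetical.

```python
import shutil
import zipfile
from pathlib import Path, PurePosixPath

STORE_FILES = {"artemiskeystore", "artemistruststore"}

def prepare_conf_dir(config_zip, agent_path):
    """Create <agent_path>/manta-flow-agent-dir/conf and copy the store files into it."""
    conf_dir = Path(agent_path) / "manta-flow-agent-dir" / "conf"
    conf_dir.mkdir(parents=True, exist_ok=True)
    copied = set()
    with zipfile.ZipFile(config_zip) as archive:
        for member in archive.infolist():
            name = PurePosixPath(member.filename).name
            if member.is_dir() or name not in STORE_FILES:
                continue
            # Copy the content directly so that no archive sub-folders are recreated.
            with archive.open(member) as src, open(conf_dir / name, "wb") as dst:
                shutil.copyfileobj(src, dst)
            copied.add(name)
    missing = STORE_FILES - copied
    if missing:
        raise FileNotFoundError(f"not found in {config_zip}: {sorted(missing)}")
    return conf_dir
```

Calling prepare_conf_dir("config.zip", "<Agent_Path>") with your real path leaves a directory that can be mounted with the -v option of the docker run command.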
    

Install Agent as an Application

Download the Agent installer and launch it. The Agent installer will guide you through the installation process.

First, you’ll need to provide a configuration for the Agent installer.

In the next step, the configuration is imported from the Agent installer configuration ZIP and displayed so that you can edit it if necessary. Usually, there is no need to edit it, but more complicated network configurations may require changes.

Then, the configuration will be tested. If there are any problems, an error message will appear and it will be possible to edit the configuration and try again.

When Agent installation is finished and Agent is launched, the List of Agents will show the version and current status of the agent.

Agent Logging Configuration

Agent uses custom logic to limit how many log files are kept. By default, the limit is 30 files for each of the supported Agent use cases:

Each use-case then produces 3 separate files, following this pattern:

Each file counts against the limit. For example, one extraction run uses up 3 of the default 30 slots. Once the limit is reached, the oldest file is deleted.
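
To make the arithmetic concrete, the sketch below simulates the rotation described above: every run adds three files and, once the limit is reached, the oldest file is dropped. The file names are made up for illustration and do not reflect Agent's actual naming pattern.

```python
from collections import deque

def simulate_runs(runs, file_limit=30, files_per_run=3):
    """Return the log files still on disk after the given number of runs."""
    kept = deque()
    for run in range(1, runs + 1):
        for part in range(1, files_per_run + 1):
            if len(kept) == file_limit:
                kept.popleft()  # limit reached: delete the oldest file
            kept.append(f"run{run:03d}-part{part}.log")  # hypothetical file name
    return list(kept)

files = simulate_runs(runs=12)
print(len(files))  # 30
print(files[0])    # run003-part1.log
```

With the default values, this means that the logs of the 10 most recent runs of each use case remain available.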

The value is configurable in the Agent configuration file located at <Agent_Path>/manta-flow-agent-dir/conf/application-user.yml.

The configuration file is in YAML format, where whitespace is significant.

manta:
  agent:
    extractor-logging-file-count: 30
    validator-logging-file-count: 30
    perpetual-process-logging-file-count: 30

Selecting an Agent for Extraction

After registering and installing the agent, select it for extraction. Every technology that supports extraction with an agent has two additional properties in the Advanced section of the connection configuration: Extraction Method and Extraction Agents.

Select the Agent extraction method, and then select the agents that should be used for the extraction. You can select multiple agents, but only one of them is used for any given extraction. Before the extraction starts, the agent that will perform it is chosen from the selected agents based on agent status: if only some of the agents are online, the first one that is online is selected.
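
The selection rule can be pictured with a short sketch. The Agent class and the status string "Online" are assumptions made for illustration; only the rule itself (the first online agent in the list is used) comes from the description above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Agent:
    name: str
    status: str  # e.g. "Online" or "Was Not Connected Before" (status strings assumed)

def select_agent(candidates: Sequence[Agent]) -> Optional[Agent]:
    """Return the first agent in the list that is online, or None if none is."""
    for agent in candidates:
        if agent.status == "Online":
            return agent
    return None

agents = [
    Agent("agent-dmz", "Was Not Connected Before"),
    Agent("agent-dc1", "Online"),
    Agent("agent-dc2", "Online"),
]
chosen = select_agent(agents)
print(chosen.name if chosen else "no agent online")  # agent-dc1
```

Listing more than one agent therefore acts as a fallback: if the first agent is offline, the next online one takes over.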

Running the Extraction

After completing all the previous steps, you can run the extraction with Agent using Process Manager.

Note: If an extraction uses a keystore/truststore, it currently has to be copied manually to the server where the agent is running. In the future, the keystore/truststore will be transferred to the agent automatically before each extraction. The CLI directory mantaflow\cli\scenarios\manta-dataflow-cli\conf is mapped to the Agent directory <AGENT_ROOT>\manta-flow-agent-dir\conf.

External Libraries

If you’re using a technology that requires libraries which aren’t part of the Automatic Data Lineage distribution, you need to put them in the <AGENT_ROOT>/manta-flow-agent-dir/lib-ext folder. You should only add the libraries listed in this section to this folder. If there are any issues with any of the libraries, you’ll receive an error message telling you which library contains the problem. Samples of scanners requiring additional libraries:

hive_libraries.zip

Ingest Source Support

There are two types of supported sources for file ingestion: agent and Git repository. Both serve the same purpose of ingesting your own files. When using Agent, you must put the files you want to ingest into the agent’s directory in the file system. When using a Git repository, you must define a Git repository connection where the agent will find the files to be ingested.

Agent Source

Ingest can be used for all technologies in the Ingest section of your settings when you’re in Advanced Mode. Each ingest property has a corresponding standard property. The ingest property consists of a list of inputs, each of which specifies the type of source (agent or Git) and the Source and Target fields.

| Property | Description | Sample value | Fully resolved path |
|---|---|---|---|
| Source | Path relative to <AGENT_ROOT>/data/ingest from which the input files are taken. This path cannot point outside of the agent's ingest directory. | myerwin | <AGENT_ROOT>/data/ingest/myerwin |
| Target | Path relative to the connection's input directory (e.g., for the erwin scanner, ${manta.dir.input}/erwin/${erwin.system.id}) where the input files are placed. This path cannot point outside of the connection's input directory. | scripts | ${manta.dir.input}/erwin/${erwin.system.id}/scripts |
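
How Source and Target resolve, and the rule that neither may leave its base directory, can be sketched as follows. This is an illustration only, not Agent's implementation; the function name and the example base directories are hypothetical stand-ins for <AGENT_ROOT>/data/ingest and the connection's input directory.

```python
import posixpath

def resolve_inside(base, relative):
    """Join relative onto base and reject results that escape base."""
    resolved = posixpath.normpath(posixpath.join(base, relative))
    if resolved != base and not resolved.startswith(base + "/"):
        raise ValueError(f"{relative!r} points outside {base!r}")
    return resolved

ingest_root = "/opt/agent/data/ingest"      # stands in for <AGENT_ROOT>/data/ingest
input_root = "/opt/manta/input/erwin/sys1"  # stands in for the connection's input directory

print(resolve_inside(ingest_root, "myerwin"))  # /opt/agent/data/ingest/myerwin
print(resolve_inside(input_root, "scripts"))   # /opt/manta/input/erwin/sys1/scripts
```

A value such as ../secrets, or an absolute path such as /etc, normalizes to a location outside the base directory and is rejected with a ValueError.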

The following image shows the Erwin connection. After the files have been ingested into the input directory, they are used in the next phase of the workflow in the same way as data extracted during the Extraction phase.

image-20240301-150059.png

Within a connection, you can use only one agent for ingest, but you can use as many source and target folders from that agent as you want. The following configuration is allowed and should run, because the same default agent is used for both inputs.

image-20240304-083205.png

The following example is not allowed because two different agents are used in the same connection. It will show an error when running the extraction.

image-20240304-083523.png
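
A pre-flight check for this rule might look like the sketch below. The input representation (dictionaries with agent, source, and target keys) and the agent names are hypothetical and serve only to illustrate the constraint.

```python
def check_single_agent(ingest_inputs):
    """Raise ValueError if the ingest inputs of one connection use more than one agent."""
    agents = {item["agent"] for item in ingest_inputs}
    if len(agents) > 1:
        raise ValueError(f"one connection may use only one agent, got: {sorted(agents)}")

allowed = [
    {"agent": "default", "source": "myerwin", "target": "scripts"},
    {"agent": "default", "source": "models", "target": "models"},
]
check_single_agent(allowed)  # passes: the same agent is used twice

rejected = allowed + [{"agent": "agent-2", "source": "other", "target": "other"}]
try:
    check_single_agent(rejected)
except ValueError as err:
    print(err)  # reports both agent names
```

Running such a check before the extraction surfaces the misconfiguration earlier than the error raised at extraction time.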

The ingest scenario is built into every default workflow. You can use the workflow editor to add the ingest scenario to custom workflows. It is also possible to run the ingest scenario on its own without the workflow.

Git Source

Starting with version 42.4, Automatic Data Lineage supports Git Ingest connections, which download files from a Git repository into the workflow.

First, create a Git Ingest connection that holds the configuration for the Git repository. In the Admin UI, go to Connections > + Add Connection > Ingest > Git and fill in the connection fields with the repository URL, branch, and credentials. You can also choose the agent to use for downloading the files.

image-20240304-090613.png
image-20240304-090519.png

After creating your Git Ingest connection, you can use it in any scanner connection that supports Ingest. The following example uses an Erwin scanner connection to show how a Git Ingest connection ingests files during the extraction phase. In the Advanced Configuration section of the scanner connection, you will find the Ingest input field, where you can select the Git Ingest connection.

| Property | Description | Sample value | Fully resolved path |
|---|---|---|---|
| Source | Path relative to <GIT_REPOSITORY_ROOT> from which the input files are taken. You can leave the Source field empty to download the entire Git repository into the target folder. | /MyScripts/folder | <GIT_REPOSITORY_ROOT>/MyScripts/folder |
| Target | Path relative to the connection's input directory (e.g., for the erwin scanner, ${manta.dir.input}/erwin/${erwin.system.id}) where the input files are placed. This path cannot point outside of the connection's input directory. | scripts1 | ${manta.dir.input}/erwin/${erwin.system.id}/scripts1 |

image-20240304-094007.png

You can use as many Git Ingest connections as you want.

image-20240304-093941.png

The ingest scenario is built into every default workflow. You can use the workflow editor to add the ingest scenario to custom workflows. It is also possible to run the ingest scenario on its own without the workflow.