Configuring agents for lineage metadata import (IBM Manta Data Lineage)
Configure Manta agents in the same location or network segment as your external systems to extract lineage metadata from those systems and visualize it on a lineage graph.
Manta agents are available in Cloud Pak for Data 5.1.2 or later.
Overview
In most cases, you can access data sources directly from Cloud Pak for Data. However, direct access is not always possible or optimal. In such cases, you can use Manta agents, which you install in the same location or network segment as the external system from which you want to extract metadata for lineage analysis. The most common use cases are:
- It is not possible to connect to an on-premises data source.
- You connect to a data source that requires specific third-party tools or libraries and you can't or don't want to install these tools or libraries on Cloud Pak for Data.
- Your data centers are distributed across many geographical locations and you want to avoid data transfer delays caused by network latency.
The following list summarizes the steps that are required to import lineage metadata by using the Manta agents:
- Download the Manta agent executable files and save them in the target location. These files are compressed in a .zip file. Extract the file.
- Register a new agent instance in Manta Data Lineage, and save the configuration file.
- Copy the agent instance configuration file to the target location and start the agent.
- When you create a metadata import, select the agent from the list.
Each instance of a data source might require an individual agent instance, depending on the access settings. For example, if you have three instances of IBM Cognos Analytics, you might need to register three agent instances, and configure them independently on each Cognos Analytics instance. Give the agent instances meaningful names so that you know which data source instance each agent is connected to.
Supported data sources
You can use the agents with the following data source:
- IBM Cognos Analytics. Using agents is the only way to connect to Cognos Analytics in a metadata import. You select the agent in the Connection mode option when you create the metadata import.
Agent status
The agent can have the following statuses:
- Online: The agent is configured and connected. It is ready to be used.
- Offline: The agent is configured but is not connected at the moment.
- Registered: The agent is registered but needs to be configured on the external system.
Prerequisites
Create new user accounts for each agent instance:
- On Cloud Pak for Data: A technical user that the agent on the external system uses to communicate with Cloud Pak for Data. The username and API key for this user account are provided during the agent registration.
- On the external system: An operating system user account that is used to run the agent. The agent executable files and agent configuration file are stored under this user account. Use Java Runtime Environment (JRE) version 21 or higher.
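Before you start the agent, it can help to confirm that the Java runtime on the external system meets the version requirement. The following is a minimal sketch, not part of the product: the function names are hypothetical, and it assumes that the `java` command on the PATH is the runtime that the agent will use.

```python
import re
import subprocess


def parse_java_major(version_output: str) -> int:
    """Return the major Java version found in `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no Java version string found")
    major = int(match.group(1))
    # Java 8 and earlier report themselves as 1.x (for example "1.8.0_292").
    if major == 1 and match.group(2) is not None:
        major = int(match.group(2))
    return major


def installed_java_major(java_cmd: str = "java") -> int:
    """Run `java -version` and parse the result (the JVM prints it to stderr)."""
    result = subprocess.run(
        [java_cmd, "-version"], capture_output=True, text=True, check=True
    )
    return parse_java_major(result.stderr + result.stdout)


def meets_agent_requirement(major: int, required: int = 21) -> bool:
    """Check the major version against the documented minimum of 21."""
    return major >= required
```

For example, `meets_agent_requirement(installed_java_major())` returns `False` on a system where only Java 17 is installed.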
It is important to create dedicated user accounts for each agent instance for the following reasons:
- The agent uses an API key to authenticate. Each time the API key changes, you must register the agent instance again to update the configuration file, and replace the old configuration file with the new one on the external system. When you have a separate user account that is dedicated to the agent, the agent does not share its API key with other applications and scripts. If you must regenerate the API key, you can safely do it without disrupting other processes. And if someone else regenerates their API key, the agent is not affected.
- Security of the data. The agent configuration file contains confidential information, including a username and an API key. This file must always be protected. Ensure that only authorized users can access it on the external system. With a dedicated user account for running the agent on the external system, the confidential data is secure. Also, even if the data is compromised, the impact is limited to one agent instance.
For these reasons, create new user accounts for each agent instance, and do not reuse the same user accounts across agent instances.
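One way to protect the configuration file on a Linux or UNIX system is to restrict its permissions to the dedicated operating system user. The following sketch assumes POSIX file permissions and hypothetical function names. On Windows, `os.chmod` controls only the read-only flag, so use access control lists there instead.

```python
import os
import stat
from pathlib import Path


def restrict_to_owner(config_path: Path) -> None:
    """Allow only the file owner to read and write the file (POSIX mode 0600)."""
    os.chmod(config_path, stat.S_IRUSR | stat.S_IWUSR)


def is_owner_only(config_path: Path) -> bool:
    """Return True if group and others have no permissions on the file."""
    mode = stat.S_IMODE(config_path.stat().st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

Run `restrict_to_owner` as the user account that runs the agent, and use `is_owner_only` in a periodic check to detect permissions that were loosened by mistake.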
Downloading Manta agent executable files
Download the Manta agent executable files from the Passport Advantage website. Extract the .zip file to a location where executable files are allowed. For example, it can be /usr/local/bin/manta-agent on the Linux operating system, and C:/manta-agent on the Windows operating system.
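If you script the installation, the extraction step might look like the following sketch. The function name and the archive path are hypothetical, and the actual name of the downloaded .zip file depends on your download.

```python
import zipfile
from pathlib import Path


def extract_agent_archive(archive: Path, target_dir: Path) -> list[str]:
    """Extract the agent .zip into target_dir and return the member names."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        return zf.namelist()
```

Note that Python's `zipfile` module does not restore UNIX permission bits, so on Linux you might need to make the extracted scripts executable afterward (for example, with `chmod +x`).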
Registering an agent in Manta Data Lineage
To register a new agent, complete these steps in Manta Data Lineage:
- Go to Data > Data lineage and click the Data lineage setup link.
- On the Manage agents tab, click New agent.
- If you already have the Manta agent executable files on your external system, go to the next step. If not, download the .zip file and extract it on the external system.
- Define the following details:
  - Name: The name for the agent instance. The name cannot contain spaces. Provide a meaningful name that clearly identifies which data source the agent is connected to.
  - Username: The name of the user for which the API key was generated. The best practice is to create a new user and API key for each agent. For more information, see Prerequisites.
  - API key: The API key that is associated with the username that you provided earlier.
- Click Register.
- Download the configuration file. You will use it to finish configuring the agent on the external system.
At this point, the agent status is Registered.
Configuring the agent on the external system
To finish the configuration of the agent on the external system, complete these steps:
- Copy the agent configuration file to the same location where you extracted the agent executable files.
- Run the start script, which is run.sh or run.bat, depending on your operating system.
At this point, the agent status is Online. It is ready to be used in the metadata import. For more information, see Creating metadata imports.
When the agent is run for the first time, the data folder is created in the location where you extracted the .zip file. The data folder contains log files for the agent, where you can find the agent's status updates and information about ongoing extraction jobs.
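To check on the agent quickly, you can print the last lines of the most recently written file in the data folder. This is a sketch only: the function names are hypothetical, and because the names and layout of the log files are not described here, it selects the newest file by modification time rather than by name.

```python
from pathlib import Path
from typing import Optional


def newest_file(folder: Path) -> Optional[Path]:
    """Return the most recently modified file under folder, or None if empty."""
    files = [p for p in folder.rglob("*") if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


def tail(path: Path, lines: int = 20) -> list[str]:
    """Return the last `lines` lines of a text file."""
    text = path.read_text(encoding="utf-8", errors="replace")
    return text.splitlines()[-lines:]
```

For example, call `newest_file(Path("/usr/local/bin/manta-agent/data"))` and pass the result to `tail` if it is not `None`.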
Updating the API key
In some cases, you might need to update the API key for an agent. These cases are:
- The agent configuration file is lost. In this case, the API key must be regenerated and a new configuration file created.
- The API key was regenerated for your service or platform, and a new configuration file must be created with the updated API key.
It is a best practice to create a new user account with a unique API key for each agent instance. For more information, see Prerequisites.
To update the API key for an agent, complete these steps:
- Generate the new API key. For more information, see Generating API keys for authentication.
- Go to Data > Data lineage > Data lineage setup.
- On the Manage agents tab, find the agent that you want to update and click it to display the details panel.
- Click Update API key.
- Provide the new values for the username and API key.
- Download the new configuration file.
- On the external system, replace the old configuration file with the new one.
- Restart the agent by using the shutdown.sh or shutdown.bat script, and then the run.sh or run.bat script, depending on your operating system.
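The restart in the last step can be scripted. The following sketch uses hypothetical function names and makes two assumptions that you should verify on your system: that the shutdown script returns when the agent is stopped, and that the run script keeps running in the foreground, which is why it is started without waiting for it to finish.

```python
import platform
import subprocess
from pathlib import Path


def restart_scripts(install_dir: Path, system: str) -> tuple[Path, Path]:
    """Return the (shutdown, run) script paths for the given OS name."""
    ext = "bat" if system == "Windows" else "sh"
    return install_dir / f"shutdown.{ext}", install_dir / f"run.{ext}"


def restart_agent(install_dir: Path) -> None:
    """Stop the agent, then start it again from its installation folder."""
    shutdown, run = restart_scripts(install_dir, platform.system())
    # Wait for the shutdown script to finish before starting the agent again.
    subprocess.run([str(shutdown)], cwd=install_dir, check=True)
    # Start the agent without waiting for the run script to exit.
    subprocess.Popen([str(run)], cwd=install_dir)
```

Run `restart_agent` as the dedicated operating system user, after you replace the old configuration file with the new one.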
Removing an agent
To remove an agent, complete these steps, in any order:
- On the Manage agents tab in Cloud Pak for Data, find the agent, open the details panel, and click Delete agent.
- On the external system, stop the agent by using the shutdown.sh or shutdown.bat script, and delete the files that you extracted from the .zip file and the configuration file for the agent.
Configuring agent memory for the extraction phase in lineage import
When an agent is used in lineage metadata import, some amount of memory is used in the lineage extraction phase of the import. By default, the memory that is used in the process is limited by the amount of memory that is available on the system. In most cases, you don't need to change the default value. However, for some technologies it might be better to modify the default value to adjust the memory consumption. If applicable, specific values are listed per technology type.
To modify the memory value, edit the agent configuration .json file on the external system. Add the following property with the memory value in megabytes:
"lineage.agent.extractor-memory": "1024"
The value 1024 is an example. The whole .json file structure is similar to the following one (xxx stands in for the real values):
{
  "kg_cp4d_token_url": "xxx",
  "kg_cp4d_api_key": "xxx",
  "kg_cp4d_api_username": "xxx",
  "lineage_agent_id": "xxx",
  "lineage_agent_secret_id": "xxx",
  "lineage_service_url": "xxx",
  "jwt_token_type": "xxx",
  "lineage.agent.extractor-memory": "1024"
}