How to Import Metadata at the Process Layer (Capturing Process Lineage Using IBM Manta Data Lineage)

This guide outlines how to import metadata at a custom “process” layer to capture process lineage (via Manta Data Lineage Custom Metadata Import).

Manta Data Lineage's default approach to lineage is physical: it captures technical, physical metadata and maps how data flows from source to target, all the way down to the attribute/column level. Process lineage, by contrast, represents executable elements (code, scripts, jobs, etc.) and links them together in their order of execution, showing the ordered sequence of operations.

Please review this guide, as it explains all the current conventions and the structure of custom metadata files.

edge.csv is where you’ll specify the sequence of execution (i.e., which scripts call other scripts, which call other sets of subsequent scripts, etc.). Here is an example.

ID, Source Node ID, Target Node ID, Edge Type, Resource ID

EdgeID1,Script_1,Script_2,DIRECT,My Scripts
EdgeID2,Script_2,Script_3,DIRECT,My Scripts
EdgeID3,Script_2,Script_4,DIRECT,My Scripts

In this example:

Script 1, script 2, script 3, and script 4 are all individual nodes defined in node.csv.

Script 1 calls script 2 (represented by a corresponding edge, defined in edge.csv above).

Script 2 calls script 3 (represented by a corresponding edge, defined in edge.csv above).

Script 2 also calls script 4 (represented by a corresponding edge, defined in edge.csv above).
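To sanity-check an edge.csv before importing it, the call sequence it describes can be read back and ordered. The following is a minimal Python sketch, not part of Manta Data Lineage: the column names are taken from the example header above, while the helper names load_calls and execution_order are hypothetical and introduced only for illustration.

```python
import csv
import io
from collections import defaultdict, deque

# Sample content copied from the edge.csv example above.
EDGE_CSV = """ID, Source Node ID, Target Node ID, Edge Type, Resource ID
EdgeID1,Script_1,Script_2,DIRECT,My Scripts
EdgeID2,Script_2,Script_3,DIRECT,My Scripts
EdgeID3,Script_2,Script_4,DIRECT,My Scripts
"""


def load_calls(text):
    """Return {caller: [callees]} from edge.csv text, preserving row order."""
    calls = defaultdict(list)
    # skipinitialspace drops the blank after each comma in the header row.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    for row in reader:
        calls[row["Source Node ID"]].append(row["Target Node ID"])
    return dict(calls)


def execution_order(calls):
    """Order nodes so that every caller comes before the scripts it calls."""
    nodes = list(dict.fromkeys(
        n for src, targets in calls.items() for n in (src, *targets)
    ))
    indegree = {n: 0 for n in nodes}
    for targets in calls.values():
        for target in targets:
            indegree[target] += 1

    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for target in calls.get(node, []):
            indegree[target] -= 1
            if indegree[target] == 0:
                queue.append(target)

    if len(order) != len(nodes):
        raise ValueError("edge.csv describes a cycle; no execution order exists")
    return order


print(execution_order(load_calls(EDGE_CSV)))
# ['Script_1', 'Script_2', 'Script_3', 'Script_4']
```

A cycle in the edges raises an error here only because this sketch assumes a strictly ordered call sequence; whether the import itself accepts cyclic edges is not covered by this guide.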

layer.csv dictates the layer this metadata will be imported under. In this case, you would name it something like Process (as this is to capture process lineage). Here is an example.

ID, Layer Name, Layer Type

Process,Process,Process

As of Manta Data Lineage version 3.31.1, the only characters that can be used in layer names are letters (a-z, A-Z), numbers (0-9), dashes (-), underscores (_), and spaces.
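The character rule above can be checked before layer.csv is written. This is a small Python sketch under the assumption that the rule applies to the whole name and that an empty name is not acceptable (the guide does not say either way); is_valid_layer_name is a hypothetical helper, not a Manta function.

```python
import re

# Letters, digits, dashes, underscores, and spaces, as listed above.
# Requiring at least one character is an assumption of this sketch.
LAYER_NAME_RE = re.compile(r"[A-Za-z0-9_\- ]+")


def is_valid_layer_name(name: str) -> bool:
    """Return True if every character in name is in the allowed set."""
    return LAYER_NAME_RE.fullmatch(name) is not None


print(is_valid_layer_name("Process"))        # True
print(is_valid_layer_name("Process/Layer"))  # False
```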

Each node you create in node.csv needs a corresponding resource ID in order to appear in the newly defined Process layer. You set this in the last field of node.csv, Resource ID (the ID of the resource that the node belongs to).

All of your Open Manta / custom metadata files need to be placed here: mantaflow\cli\input\import\IMPORTID.
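Copying the files into that directory can be scripted. The sketch below is one way to do it in Python, assuming manta_home points at the mantaflow directory of your installation and that the IMPORTID directory already exists; import_dir and stage_files are hypothetical helper names, and the script deliberately does not create the import directory itself.

```python
import shutil
from pathlib import Path


def import_dir(manta_home: Path, import_id: str) -> Path:
    """Build <mantaflow>/cli/input/import/<IMPORTID> from the path given above."""
    return manta_home / "cli" / "input" / "import" / import_id


def stage_files(source_dir: Path, manta_home: Path, import_id: str) -> list:
    """Copy every .csv file from source_dir into the import directory."""
    target = import_dir(manta_home, import_id)
    if not target.is_dir():
        # Assumption: the import is set up separately, so a missing
        # directory is treated as a configuration problem, not created here.
        raise FileNotFoundError(f"Import directory not found: {target}")

    copied = []
    for csv_file in sorted(source_dir.glob("*.csv")):
        shutil.copy2(csv_file, target / csv_file.name)
        copied.append(csv_file.name)
    return copied
```

For example, stage_files(Path("my_metadata"), Path("mantaflow"), "IMPORTID") would copy edge.csv, layer.csv, and node.csv from a local my_metadata folder, returning the copied file names in sorted order.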

Review this article on setting up a new Open Manta Extensions import named IMPORTID.

Running the Scan

To run the scan, use the Manta Orchestration API Scripts to execute the following scenarios in sequence:

  1. Create a new minor revision starting with the newMinorRevisionScenario.

  2. Analyze the custom metadata input scripts using the importDataflowScenario.

  3. Commit the current revision with the commitRevisionScenario.
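The ordering of the three scenarios above can be captured in a small driver. This Python sketch only encodes the sequence; it does not call Manta. The run_scenario callable is a hypothetical placeholder for however you invoke the Orchestration API Scripts in your environment, and stopping at the first failure is a design choice of this sketch so that a revision is not committed after a failed import.

```python
from typing import Callable, List

# Scenario names in the order given in the steps above.
SCAN_SCENARIOS = [
    "newMinorRevisionScenario",
    "importDataflowScenario",
    "commitRevisionScenario",
]


def run_scan(run_scenario: Callable[[str], bool]) -> List[str]:
    """Run each scenario in order; stop at the first one that reports failure.

    run_scenario is a caller-supplied (hypothetical) function that executes
    one scenario by name and returns True on success.
    """
    completed = []
    for name in SCAN_SCENARIOS:
        if not run_scenario(name):
            raise RuntimeError(f"{name} failed; later scenarios were not run")
        completed.append(name)
    return completed
```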