How to Import Metadata at the Process Layer (Capturing Process Lineage Using IBM Manta Data Lineage)
This guide outlines how to import metadata at a custom “process” layer to capture process lineage (via Manta Data Lineage Custom Metadata Import).
Manta Data Lineage's default lineage paradigm is physical: it captures technical, physical metadata and maps how data flows from source to target, all the way down to the attribute/column level. Process lineage, by contrast, represents executable elements (code, scripts, jobs, etc.) and links them together in their order of execution, showing the ordered sequence of operations.
Please review this guide, as it explains all the current conventions and the structure of custom metadata files.
edge.csv is where you’ll specify the sequence of execution (i.e., which scripts call other scripts, which in turn call further scripts, etc.). Here is an example.
ID, Source Node ID, Target Node ID, Edge Type, Resource ID
EdgeID1,Script_1,Script_2,DIRECT,My Scripts
EdgeID2,Script_2,Script_3,DIRECT,My Scripts
EdgeID3,Script_2,Script_4,DIRECT,My Scripts
In this example:
- Script 1, script 2, script 3, and script 4 are all individual nodes defined in node.csv.
- Script 1 calls script 2 (represented by a corresponding edge, defined in edge.csv above).
- Script 2 calls script 3 (represented by a corresponding edge, defined in edge.csv above).
- Script 2 also calls script 4 (represented by a corresponding edge, defined in edge.csv above).
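The edge rows above can also be generated rather than typed by hand. The following is a minimal sketch, assuming you already have a mapping from each script to the scripts it calls. The names CALLS, build_edge_rows, and render_edge_csv are hypothetical and not part of Manta Data Lineage. Only the data rows are produced here; whether a header row belongs in the file should be checked against the custom metadata conventions.

```python
import csv
import io

# Hypothetical call map: each script maps to the scripts it calls, in order.
CALLS = {
    "Script_1": ["Script_2"],
    "Script_2": ["Script_3", "Script_4"],
}


def build_edge_rows(calls, resource_id="My Scripts"):
    """Return one DIRECT edge row per caller/callee pair."""
    rows = []
    for source, targets in calls.items():
        for target in targets:
            edge_id = f"EdgeID{len(rows) + 1}"
            rows.append([edge_id, source, target, "DIRECT", resource_id])
    return rows


def render_edge_csv(calls):
    """Render the edge rows as CSV text (data rows only, no header)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(build_edge_rows(calls))
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_edge_csv(CALLS), end="")
```

Run against the call map above, this produces the same three data rows as the edge.csv example.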
layer.csv dictates the layer this metadata will be imported under. In this case, you would name it something like Process (as this is to capture process lineage). Here is an example.
ID, Layer Name, Layer Type
Process,Process,Process
As of Manta Data Lineage version 3.31.1, the only characters that can be used in layer names are letters (a-z, A-Z), numbers (0-9), dashes (-), underscores (_), and spaces.
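A layer name can be checked against this character rule before the import is run. The snippet below is a small sketch of that check; the function name is_valid_layer_name is hypothetical, and the pattern encodes only the characters listed above.

```python
import re

# Letters, digits, dashes, underscores, and spaces only, per the rule above.
LAYER_NAME_PATTERN = re.compile(r"[A-Za-z0-9_\- ]+")


def is_valid_layer_name(name):
    """Return True if every character in the name is allowed (and it is non-empty)."""
    return LAYER_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["Process", "Process Layer-2_a", "Process/ETL"]:
        print(candidate, is_valid_layer_name(candidate))
```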
The nodes you create in node.csv will need to have corresponding resource IDs in order to be featured at this newly defined Process layer. This is done via the last data point in node.csv, called Resource ID (the ID of the resource that each node belongs to).
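A quick check can catch node rows that were left without a resource ID. This sketch relies only on the statement that Resource ID is the last data point in each node.csv row. The sample rows, the "placeholder" fields, and the function name rows_missing_resource are hypothetical and do not reflect the real node.csv column layout.

```python
import csv
import io


def rows_missing_resource(node_csv_text, known_resource_ids):
    """Return rows whose last field is not one of the known resource IDs."""
    reader = csv.reader(io.StringIO(node_csv_text))
    return [
        row
        for row in reader
        if row and row[-1].strip() not in known_resource_ids
    ]


# Hypothetical sample: only the last field (Resource ID) matters to the check.
SAMPLE_NODES = (
    "Script_1,placeholder,My Scripts\n"
    "Script_2,placeholder,\n"
)

if __name__ == "__main__":
    for row in rows_missing_resource(SAMPLE_NODES, {"My Scripts"}):
        print("Missing or unknown resource ID:", row)
```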
All of your Open Manta / custom metadata files need to be placed in the following directory: mantaflow\cli\input\import\IMPORTID
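Copying the CSV files into the import directory can be scripted. This is a minimal sketch, assuming mantaflow_root points at the mantaflow installation folder and that all custom metadata files sit in one local source folder. The helper names import_dir and stage_files are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def import_dir(mantaflow_root, import_id):
    """Build the path <mantaflow_root>/cli/input/import/<import_id>."""
    return Path(mantaflow_root) / "cli" / "input" / "import" / import_id


def stage_files(source_dir, mantaflow_root, import_id):
    """Copy every .csv file from source_dir into the import directory."""
    target = import_dir(mantaflow_root, import_id)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for csv_file in sorted(Path(source_dir).glob("*.csv")):
        shutil.copy2(csv_file, target / csv_file.name)
        copied.append(csv_file.name)
    return copied


if __name__ == "__main__":
    # Demo against temporary folders so no real installation is touched.
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as root:
        for name in ("node.csv", "edge.csv", "layer.csv"):
            (Path(src) / name).write_text("", encoding="utf-8")
        print(stage_files(src, root, "IMPORTID"))
```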
Review this article on setting up a new Open Manta Extensions import named IMPORTID.
Running the Scan
To execute the scenarios, use the Manta Orchestration API Scripts. Sequence of execution when running the scan:
- Create a new minor revision starting with the newMinorRevisionScenario.
- Analyze the custom metadata input scripts using the importDataflowScenario.
- Commit the current revision with the commitRevisionScenario.
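The ordering matters: a revision should not be committed if the import step failed. The sketch below illustrates only that ordering logic. The run_scenario callable is a hypothetical stand-in for however your environment invokes the Manta Orchestration API Scripts; it is assumed to return True on success and is not a real Manta function.

```python
# Scenario names in the order described above.
SCENARIOS = [
    "newMinorRevisionScenario",
    "importDataflowScenario",
    "commitRevisionScenario",
]


def run_in_order(run_scenario, scenarios=SCENARIOS):
    """Run each scenario in order and stop at the first failure.

    run_scenario is a hypothetical callable: it takes a scenario name
    and returns True on success, False otherwise.
    """
    completed = []
    for name in scenarios:
        if not run_scenario(name):
            raise RuntimeError(f"{name} failed after completing: {completed}")
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Dry run with a stub that only prints the scenario name.
    run_in_order(lambda name: print("would run", name) or True)
```

With this shape, a failed import raises before the commit scenario is reached, so a partially imported revision is not committed.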