Manta Flow Base Export User Documentation
This document describes the format of metadata exported from the Manta Flow application and how it can be used for further metadata analysis.
Export Execution
An export can be executed using the Export Repository Scenario in Process Manager and Orchestration API scripts. All data flows are also exported to CSV files stored in the
<MANTA_DIR_HOME>/temp/processmanager/${workflowExecutionId}/csv
folder and can be downloaded from Process Manager with the same structure as defined in Open Manta Extensions: Files and
Formats and Open Manta Extensions: Usage.
Metadata Format
Exported metadata can be stored in CSV files or as tables in a database. This is its database schema.
-
DF_Layer — contains the names of metadata layers (i.e., physical, logical, business)
-
ID — unique ID of the layer used as a foreign key for the resources
-
Layer name — name of the layer
-
Layer type — type of layer
-
-
DF_Resource — names and descriptions of resources (i.e., Teradata, Teradata DDL)
-
Resource_Id — the unique ID of the resource used as a foreign key for nodes and edges
-
Resource_Name — the name of the resource
-
Resource_Type — the type of the resource
-
Resource_Description — a description of the resource
-
Layer ID — ID of the layer which this resource belongs to
-
-
DF_Node — nodes from a dataflow graph (e.g., tables, columns, packages, statements, etc.)
-
Node_Id — the unique ID of the node used as a foreign key for edges and node attributes
-
Parent_Node_Id — the ID of the parent node (e.g., columns have tables, packages have schemas) or an empty string if the node is at the highest level of the hierarchy (e.g., databases, folders)
-
Node_Name — the name of the node
-
Node_Type — the type of the node (all types can be found here)
-
Resource_Id — the ID of the resource the node belongs to
-
-
DF_Edge — edges from a dataflow graph on the lowest possible level (attributes, columns, parameters, etc.)
-
Edge_Id — the unique ID of the edge used as a foreign key for edge attributes
-
Source_Node_Id — the ID of the source node
-
Target_Node_Id — the ID of the target node
-
Edge_Type — the type of the edge
-
DIRECT — a direct data flow (e.g.,
insert into target (direct) select direct from source;
) -
FILTER — a filter data flow (e.g.,
insert into target select * from source where filter = 0;
) -
MAPS_TO — maps nodes from different layers (e.g., maps First Name attribute [business/logical layer] to NAME_FIRST column [technical layer])
-
-
Resource_Id — the ID of the resource the edge belongs to
-
-
DF_Node_Attribute — further attributes of nodes
-
Node_Id — the ID of the node that the attribute belongs to
-
Attribute_Name — the attribute name
-
Attribute_Value — the attribute value
-
-
DF_Edge_Attribute — further attributes of edges
-
Edge_Id — the ID of the edge the attribute belongs to
-
Attribute_Name — the attribute name
-
Attribute_Value — the attribute value
-
-
DF_Source_Code — source code associated with script nodes; this file is generated only when
manta.exporter.includeSource
is set totrue
-
Node_Id — the ID of the node the source code belongs to
-
Script_Code — the source code text
-
All objects are exported into a single CSV file located in the
mantaflow\cli\temp\csv
directory. This single CSV file contains a varying number of columns based on the first object type column. Based on the object type column, the single output file can be split into seven original files. The
previous version of the export created (7) CSV files which housed each of the different object types.
-
DF_Export — source code associated with script nodes; this file is generated only when
manta.exporter.includeSource
is set totrue
-
Object_Type — type of the records; see the following table
-
Object Fields ... — columns reflect the respective file type
-
The following table lists the translations from file to object type.
Object type | File type |
---|---|
layer | DF_Layer |
resource | DF_Resource |
node | DF_Node |
edge | DF_Edge |
node_attribute | DF_Node_Attribute |
edge_attribute | DF_Edge_Attribute |
source_code | DF_Source_Code |
Each object type has N number of attributes associated with it, which correlate to the columns described in their corresponding file schemas. For example, the layer object type includes attributes (columns) that are described in the preceding DF_Layer
CSV file.
-
ID — unique ID of the layer used as a foreign key for the resources
-
Layer name — name of the layer
-
Layer type — type of layer
Metadata Analysis
This section contains some queries that can be useful for various metadata analyses. Note that all queries are written in Teradata SQL dialect. Also note that all of these queries are provided as examples of how the exports can be queried.
Finding a Fully-Qualified Object
When it is necessary to find an object defined by a fully-qualified name, self-join the DF_Node table with the Parent_Node_Id property. The object resource name is also necessary. For example, it is possible to find the column ExampleDB.ExampleTable.ExampleColumn in a Teradata resource using this query.
select Col.*
from DF_Node Col
inner join DF_Node Tab on Col.Parent_Node_Id = Tab.Node_Id
inner join DF_Node Dat on Tab.Parent_Node_Id = Dat.Node_Id
inner join DF_Resource Res on Dat.Resource_Id = Res.Resource_Id
where
Res.Resource_Name = 'Teradata' and
Dat.Node_Name = 'ExampleDB' and
Tab.Node_Name = 'ExampleTable' and
Col.Node_Name = 'ExampleColumn'
;
Here is the result.
3 2 ExampleColumn Column 1
Listing All Object Attributes
When the object ID is known (e.g., from a query for finding a fully-qualified object), it is easy to list all its attributes using this query.
select Atr.Attribute_Name, Atr.Attribute_Value
from DF_Node_Attribute Atr
where Atr.Node_Id = 3
;
Here is the result.
ORDER 1
COLUMN_CHARSET UNICODE
COLUMN_LENGTH 50
COLUMN_TYPE VARCHAR
Finding All Direct Flows from an Object (One Step)
When the object ID is known (e.g., from a query for finding a fully-qualified object), it is possible to find all objects that a direct flow leads to (from the identified object) in one step. Note that Automatic Data Lineage exports edges on the attribute level, meaning that flow exists between attribute-level nodes—not between any other objects (such as tables).
select Tgt.*
from DF_Edge Edg
inner join DF_Node Src on Src.Node_Id = Edg.Source_Node_Id
inner join DF_Node Tgt on Tgt.Node_Id = Edg.Target_Node_Id
where
Edg.EdgeType = 'DIRECT' and
Src.Node_Id = 3
;
Here is the result.
6 5 1 ExampleColumn BTEQ ColumnFlow 2
9 8 1 ExampleColumn BTEQ ColumnFlow 2
Finding All Direct Flows from an Object (Recursive)
If it is necessary to find all direct flows from an object recursively, a recursive statement like this can be created.
with recursive Temp_Node(Node_Id, Depth) as (
select Col.Node_Id, cast(0 as integer) as Depth
from DF_Node Col
inner join DF_Node Tab on Col.Parent_Node_Id = Tab.Node_Id
inner join DF_Node Dat on Tab.Parent_Node_Id = Dat.Node_Id
inner join DF_Resource Res on Dat.Resource_Id = Res.Resource_Id
where
Res.Resource_Name = 'Teradata' and
Dat.Node_Name = 'SourceDB' and
Tab.Node_Name = 'SourceTable' and
Col.Node_Name = 'SourceColumn'
union all
select Tgt.Node_Id, Src.Depth + 1
from Temp_Node Src
inner join DF_Edge Edg on Edg.Source_Node_Id = Src.Node_Id
inner join DF_Node Tgt on Edg.Target_Node_Id = Tgt.Node_Id
where
Edg.Edge_Type = 'DIRECT' and
Src.Depth <= 10
)
select distinct Node_Id
from Temp_Node;
;
However, this statement must be limited to a maximum number of steps (depth). Note that even this limitation will not necessarily prevent the statement from taking a very long time. Therefore, it is better to write the same function using a stored procedure (which also must be limited) or a recursive BTEQ script.