Manta Flow Base Export User Documentation

This document describes the format of metadata exported from the Manta Flow application and how it can be used for further metadata analysis.

Export Execution

An export can be executed using the Export Repository Scenario in Process Manager and Orchestration API scripts. All data flows are also exported to CSV files stored in the <MANTA_DIR_HOME>/temp/processmanager/${workflowExecutionId}/csv folder and can be downloaded from Process Manager with the same structure as defined in Open Manta Extensions: Files and Formats and Open Manta Extensions: Usage.

Metadata Format

Exported metadata can be stored in CSV files or as tables in a database. This is its database schema.

DF_Layer — contains the names of metadata layers (i.e., physical, logical, business)
- ID — unique ID of the layer used as a foreign key for the resources
- Layer name — name of the layer
- Layer type — type of layer
DF_Resource — names and descriptions of resources (i.e., Teradata, Teradata DDL)
- Resource_Id — the unique ID of the resource used as a foreign key for nodes and edges
- Resource_Name — the name of the resource
- Resource_Type — the type of the resource
- Resource_Description — a description of the resource
- Layer ID — ID of the layer which this resource belongs to
DF_Node — nodes from a dataflow graph (e.g., tables, columns, packages, statements, etc.)
- Node_Id — the unique ID of the node used as a foreign key for edges and node attributes
- Parent_Node_Id — the ID of the parent node (e.g., columns have tables, packages have schemas) or an empty string if the node is at the highest level of the hierarchy (e.g., databases, folders)
- Node_Name — the name of the node
- Node_Type — the type of the node (all types can be found here)
- Resource_Id — the ID of the resource the node belongs to
DF_Edge — edges from a dataflow graph on the lowest possible level (attributes, columns, parameters, etc.)
- Edge_Id — the unique ID of the edge used as a foreign key for edge attributes
- Source_Node_Id — the ID of the source node
- Target_Node_Id — the ID of the target node
- Edge_Type — the type of the edge
  - DIRECT — a direct data flow (e.g., insert into target (direct) select direct from source;)
  - FILTER — a filter data flow (e.g., insert into target select * from source where filter = 0;)
  - MAPS_TO — maps nodes from different layers (e.g., maps First Name attribute [business/logical layer] to NAME_FIRST column [technical layer])
- Resource_Id — the ID of the resource the edge belongs to
DF_Node_Attribute — further attributes of nodes
- Node_Id — the ID of the node that the attribute belongs to
- Attribute_Name — the attribute name
- Attribute_Value — the attribute value
DF_Edge_Attribute — further attributes of edges
- Edge_Id — the ID of the edge the attribute belongs to
- Attribute_Name — the attribute name
- Attribute_Value — the attribute value
DF_Source_Code — source code associated with script nodes; this file is generated only when manta.exporter.includeSource is set to true
- Node_Id — the ID of the node the source code belongs to
- Script_Code — the source code text

All objects are exported into a single CSV file located in the mantaflow\cli\temp\csv directory. This single CSV file contains a varying number of columns based on the first object type column. Based on the object type column, the single output file can be split into seven original files. The previous version of the export created (7) CSV files which housed each of the different object types.

DF_Export — source code associated with script nodes; this file is generated only when manta.exporter.includeSourceis set to true
- Object_Type — type of the records; see the following table
- Object Fields ... — columns reflect the respective file type

The following table lists the translations from file to object type.

Object type	File type
layer	DF_Layer
resource	DF_Resource
node	DF_Node
edge	DF_Edge
node_attribute	DF_Node_Attribute
edge_attribute	DF_Edge_Attribute
source_code	DF_Source_Code

Each object type has N number of attributes associated with it, which correlate to the columns described in their corresponding file schemas. For example, the layer object type includes attributes (columns) that are described in the preceding DF_Layer CSV file.

ID — unique ID of the layer used as a foreign key for the resources
Layer name — name of the layer
Layer type — type of layer

Metadata Analysis

This section contains some queries that can be useful for various metadata analyses. Note that all queries are written in Teradata SQL dialect. Also note that all of these queries are provided as examples of how the exports can be queried.

Finding a Fully-Qualified Object

When it is necessary to find an object defined by a fully-qualified name, self-join the DF_Node table with the Parent_Node_Id property. The object resource name is also necessary. For example, it is possible to find the column ExampleDB.ExampleTable.ExampleColumn in a Teradata resource using this query.

select Col.*
from DF_Node Col
  inner join DF_Node Tab on Col.Parent_Node_Id = Tab.Node_Id
  inner join DF_Node Dat on Tab.Parent_Node_Id = Dat.Node_Id
  inner join DF_Resource Res on Dat.Resource_Id = Res.Resource_Id
where
  Res.Resource_Name = 'Teradata' and
  Dat.Node_Name = 'ExampleDB' and
  Tab.Node_Name = 'ExampleTable' and
  Col.Node_Name = 'ExampleColumn'
;

Here is the result.

3      2      ExampleColumn Column 1

Listing All Object Attributes

When the object ID is known (e.g., from a query for finding a fully-qualified object), it is easy to list all its attributes using this query.

select Atr.Attribute_Name, Atr.Attribute_Value
from DF_Node_Attribute Atr
where Atr.Node_Id = 3
;

Here is the result.

ORDER               1
COLUMN_CHARSET      UNICODE
COLUMN_LENGTH       50
COLUMN_TYPE         VARCHAR

Finding All Direct Flows from an Object (One Step)

When the object ID is known (e.g., from a query for finding a fully-qualified object), it is possible to find all objects that a direct flow leads to (from the identified object) in one step. Note that Automatic Data Lineage exports edges on the attribute level, meaning that flow exists between attribute-level nodes—not between any other objects (such as tables).

select Tgt.*
from DF_Edge Edg
inner join DF_Node Src on Src.Node_Id = Edg.Source_Node_Id
inner join DF_Node Tgt on Tgt.Node_Id = Edg.Target_Node_Id
where
Edg.EdgeType = 'DIRECT' and
Src.Node_Id = 3
;

Here is the result.

6      5      1 ExampleColumn      BTEQ ColumnFlow     2
9      8      1 ExampleColumn      BTEQ ColumnFlow     2

Finding All Direct Flows from an Object (Recursive)

If it is necessary to find all direct flows from an object recursively, a recursive statement like this can be created.

with recursive Temp_Node(Node_Id, Depth) as (
  select Col.Node_Id, cast(0 as integer) as Depth
  from DF_Node Col
    inner join DF_Node Tab on Col.Parent_Node_Id = Tab.Node_Id
    inner join DF_Node Dat on Tab.Parent_Node_Id = Dat.Node_Id
    inner join DF_Resource Res on Dat.Resource_Id = Res.Resource_Id
  where
    Res.Resource_Name = 'Teradata' and
    Dat.Node_Name = 'SourceDB' and
    Tab.Node_Name = 'SourceTable' and
    Col.Node_Name = 'SourceColumn' 
union all
  select Tgt.Node_Id, Src.Depth + 1
  from Temp_Node Src
    inner join DF_Edge Edg on Edg.Source_Node_Id = Src.Node_Id
    inner join DF_Node Tgt on Edg.Target_Node_Id = Tgt.Node_Id
  where
    Edg.Edge_Type = 'DIRECT' and
    Src.Depth <= 10 
)
select distinct Node_Id
from Temp_Node;
;

However, this statement must be limited to a maximum number of steps (depth). Note that even this limitation will not necessarily prevent the statement from taking a very long time. Therefore, it is better to write the same function using a stored procedure (which also must be limited) or a recursive BTEQ script.