Open Manta Extensions: Files and Formats
The following is the structure of the CSV files (layer.csv, resource.csv, node.csv, edge.csv, node_attribute.csv, edge_attribute.csv, source_code.csv) that
can be used for import to and export from IBM Automatic Data Lineage (source_code.csv is export-only). Commas (,) are used as delimiters, double quotes (") are used as quotes, and backslashes (\) are used as escape characters
(they escape double quotes or backslashes in the data, e.g., "key: \"value\"","c:\\").
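The quoting rules above can be sketched with Python's standard csv module. This is a minimal illustration, not the product's own parser: the helper names parse_line and format_line are hypothetical, and quoting every field on output is an assumption made for simplicity.

```python
import csv


def parse_line(line: str) -> list[str]:
    """Parse one line using comma, double-quote, and backslash-escape rules."""
    reader = csv.reader(
        [line],
        delimiter=",",
        quotechar='"',
        escapechar="\\",
        doublequote=False,  # quotes are escaped with a backslash, not doubled
    )
    return next(reader)


def format_line(fields: list[str]) -> str:
    """Quote every field, escaping backslashes first and then double quotes."""
    quoted = []
    for field in fields:
        escaped = field.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append(f'"{escaped}"')
    return ",".join(quoted)


# The sample line from the paragraph above.
sample = r'"key: \"value\"","c:\\"'
fields = parse_line(sample)
print(fields)
print(format_line(fields) == sample)
```

Backslashes are escaped before double quotes so that the backslash added in front of a quote is not doubled afterwards.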
Detailed Description of the Individual File Structure
-
layer.csv — contains the names of metadata layers (e.g., physical, logical, business)
-
Id — unique ID of the layer, used as a foreign key for the resources; no specific format is required, any characters can be used
-
Layer name — name of the layer; the only characters that can be used in layer names are letters (a-z, A-Z), numbers (0-9), dashes (-), underscores (_), and spaces
-
Layer type — type of layer
-
-
resource.csv — contains the names and descriptions of resources / source technologies (e.g., Teradata, Teradata DDL)
-
Id — unique ID of the resource, used as a foreign key for nodes and edges; no specific format is required, any characters can be used
-
Resource name — name of the resource
-
Resource type — type of resource
-
Resource description — description of the resource
-
Layer ID — ID of the layer (reference to one of the IDs in layer.csv) that the resource belongs to
-
-
node.csv — contains nodes from a dataflow graph (e.g., tables, columns, packages, statements)
-
Id — unique ID of the node, used as a foreign key for edges and node attributes; no specific format is required, any characters can be used
-
Parent id — ID of the parent node (e.g., a column has a table, a package has a schema) or an empty string if the node is at the top of the hierarchy (e.g., databases, folders)
-
Node name — name of the node
-
Node type — type of node (e.g., table, column, workflow, job, report)
-
Resource id — ID of the resource (reference to one of the IDs in resource.csv) that the node belongs to
-
-
edge.csv — contains edges from a dataflow graph
-
Id — unique ID of the edge, used as a foreign key for edge attributes; no specific format is required, any characters can be used
-
Source node id — ID of the source node (reference to one of the IDs in node.csv); this can also be a full path to an existing object in the repository, which does not have to be specified in node.csv
-
Target node id — ID of the target node (reference to one of the IDs in node.csv); this can also be a full path to an existing object in the repository, which does not have to be specified in node.csv
-
Edge type — type of edge
-
DIRECT — direct data flow (e.g., insert into target (direct) select direct from source;)
-
FILTER — filter data flow (e.g., insert into target select * from source where filter = 0;)
-
MAPS_TO — maps nodes from different layers (e.g., maps the First Name attribute to the NAME_FIRST column)
-
PERSPECTIVE — maps nodes from the physical layer to nodes from perspective layers
-
-
Resource id — ID of the resource (reference to one of the IDs in resource.csv) that the edge belongs to
-
-
node_attribute.csv — contains further attributes of nodes
-
Node id — ID of the node (reference to one of the IDs in node.csv) that the attribute belongs to; this can also be a full path to an existing object in the repository, which does not have to be specified in node.csv
-
Attribute name — attribute name
-
Attribute value — attribute value
-
-
edge_attribute.csv — contains further attributes of edges
-
Edge id — ID of the edge (reference to one of the IDs in edge.csv) that the attribute belongs to
-
Attribute name — attribute name
-
Attribute value — attribute value
-
-
source_code.csv — contains source code of nodes; can be turned on/off via client configuration (see Common Resource Configuration)
-
Node id — ID of the node that the source code belongs to
-
Source code — source code of the node
-
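To make the column order concrete, the following sketch assembles a small, self-contained set of files and checks that the references between them resolve. All IDs, names, and type values are made-up sample data, and the helpers render and find_problems are hypothetical. Treating a node reference that starts with "/" as a repository path is also an assumption of this sketch.

```python
import re

# Rows follow the column order of the field lists above. Every ID, name, and
# type value here is made-up sample data, not a value required by the product.
FILES = {
    "layer.csv": [["L1", "Physical", "Physical"]],
    "resource.csv": [["R1", "Teradata", "Teradata", "Sample warehouse", "L1"]],
    "node.csv": [
        ["N1", "", "dwh", "Database", "R1"],
        ["N2", "N1", "source", "Table", "R1"],
        ["N3", "N2", "direct", "Column", "R1"],
        ["N4", "N1", "target", "Table", "R1"],
        ["N5", "N4", "direct", "Column", "R1"],
    ],
    "edge.csv": [["E1", "N3", "N5", "DIRECT", "R1"]],
    "node_attribute.csv": [["N2", "comment", "staging table"]],
    "edge_attribute.csv": [["E1", "comment", "loaded nightly"]],
}

# Layer names: letters, numbers, dashes, underscores, and spaces only.
LAYER_NAME = re.compile(r"[A-Za-z0-9_\- ]+")


def render(rows):
    """Serialize rows with every field quoted and backslash escaping."""
    def quote(value):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return "\n".join(",".join(quote(v) for v in row) for row in rows) + "\n"


def find_problems(files):
    """Return messages for invalid layer names and unresolved references."""
    layers = {row[0] for row in files.get("layer.csv", [])}
    resources = {row[0] for row in files.get("resource.csv", [])}
    nodes = {row[0] for row in files.get("node.csv", [])}
    edges = {row[0] for row in files.get("edge.csv", [])}

    def node_ok(ref):
        # Assumption: a reference starting with "/" is a repository path.
        return ref in nodes or ref.startswith("/")

    problems = []
    for _id, name, _type in files.get("layer.csv", []):
        if not LAYER_NAME.fullmatch(name):
            problems.append(f"layer name not allowed: {name}")
    for rid, _name, _type, _desc, layer_id in files.get("resource.csv", []):
        if layer_id not in layers:
            problems.append(f"resource {rid}: unknown layer {layer_id}")
    for nid, parent_id, _name, _type, resource_id in files.get("node.csv", []):
        if parent_id and parent_id not in nodes:
            problems.append(f"node {nid}: unknown parent {parent_id}")
        if resource_id not in resources:
            problems.append(f"node {nid}: unknown resource {resource_id}")
    for eid, source, target, _type, resource_id in files.get("edge.csv", []):
        for ref in (source, target):
            if not node_ok(ref):
                problems.append(f"edge {eid}: unknown node {ref}")
        if resource_id not in resources:
            problems.append(f"edge {eid}: unknown resource {resource_id}")
    for node_ref, _name, _value in files.get("node_attribute.csv", []):
        if not node_ok(node_ref):
            problems.append(f"node attribute: unknown node {node_ref}")
    for edge_id, _name, _value in files.get("edge_attribute.csv", []):
        if edge_id not in edges:
            problems.append(f"edge attribute: unknown edge {edge_id}")
    return problems


print(render(FILES["node.csv"]), end="")
print(find_problems(FILES))
```

The check only covers a complete set of files; it does not model references to resources or layers that already exist in the repository.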
Additional Considerations for Import
-
The purpose of the ID is only to connect elements to the edges in one load; therefore, you can choose your own assignment system. IDs can contain any sequence of characters.
-
Evaluation of Node id in edge.csv and node_attribute.csv has been extended: the node ID can now be a fully qualified path to an existing object in the Manta Flow repository (even in an open/uncommitted revision). When the Node id is not found in node.csv, Automatic Data Lineage attempts to interpret it as a path to an existing node in the Automatic Data Lineage repository.
-
A fully qualified path has the following format:
-
/DB2/server-name/dwh/customers/action/name
-
If multiple nodes with the same path exist, the node types can be provided in square brackets; for example, /DB2/server-name[Server]/dwh/customers/action[Table]/name.
-
If the qualified name contains any restricted characters, they should be escaped with \\; for example, a resource having the name DB2 PL/SQL would have the qualified name entry /DB2 PL\\/SQL/dwh-connection/customers.
-
If the qualified name contains a backslash, it should be escaped as \\; for example, a resource having the name DFWMCPRE0056\LOANPRDP would have the qualified name entry /DFWMCPRE0056\\LOANPRDP/dwh-connection/customers.
-
-
The parent definition must be provided before the child definition in the same file (e.g., define the table prior to using it as a parent node for defining a table column).
-
The file source_code.csv cannot be used for import.
-
If the input file contains a header on the first line, the property Imported CSV files contains headers in the Admin GUI > Configuration > CLI > Common > Common Config > Import Settings must be set to autodetect (default value as of R42.1) or true.
-
Because the node ID extension is evaluated as a full path to an existing object in the Manta Flow Server repository, all files are optional; for example, the user can choose to provide only node_attribute.csv.
-
To avoid missing elements for a component, define an object before referencing it as a parent ID.
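Two of the import rules above lend themselves to a short sketch: building a fully qualified path and checking that parents come before their children in node.csv. The helpers qualified_path and nodes_with_late_parents are hypothetical. The escaping mirrors the two examples above literally, and it assumes that the slash and the backslash are the only characters that need escaping, which the text does not confirm.

```python
def qualified_path(segments):
    """Join (name, node_type_or_None) pairs into a fully qualified path."""
    parts = []
    for name, node_type in segments:
        # Backslashes are doubled first so that the escapes added for
        # slashes in the next step stay intact.
        escaped = name.replace("\\", "\\\\")
        # Assumption: a slash inside a name is written as two backslashes
        # followed by the slash, as in the DB2 PL/SQL example.
        escaped = escaped.replace("/", "\\\\/")
        # The optional node type disambiguates nodes that share a path.
        suffix = f"[{node_type}]" if node_type else ""
        parts.append(escaped + suffix)
    return "/" + "/".join(parts)


def nodes_with_late_parents(node_rows):
    """Return IDs of nodes whose parent is not defined on an earlier row."""
    seen = set()
    offenders = []
    for node_id, parent_id, *_rest in node_rows:
        if parent_id and parent_id not in seen:
            offenders.append(node_id)
        seen.add(node_id)
    return offenders


print(qualified_path([("DB2 PL/SQL", None), ("dwh-connection", None),
                      ("customers", None)]))
print(qualified_path([("DB2", None), ("server-name", "Server"),
                      ("dwh", None), ("customers", None),
                      ("action", "Table"), ("name", None)]))

# Made-up sample rows: the column is listed before its table.
rows = [
    ["c1", "t1", "id", "Column", "R1"],
    ["t1", "", "customers", "Table", "R1"],
]
print(nodes_with_late_parents(rows))
```

The ordering check assumes that every non-empty parent reference points to a row in the same file.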
Additional Considerations for Export
-
The purpose of the ID is only to connect elements to the edges in a single set of files for metadata export. IDs may change between exports, so never rely on them remaining the same over time.
-
The output files may or may not contain headers depending on the Should export files contain headers flag in the Admin GUI > Configuration > CLI > Common > Common Config > Export Settings.
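Because exported IDs are not stable, comparing two exports works better on node paths than on IDs. The sketch below is one possible approach, not a product feature. The helpers read_rows and node_paths are hypothetical, the header text in the sample is illustrative, and whether a header is present is passed in explicitly rather than detected.

```python
import csv
import io


def read_rows(text, has_header):
    """Read exported rows, skipping the first line when a header is present."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=",",
        quotechar='"',
        escapechar="\\",
        doublequote=False,
    )
    rows = [row for row in reader if row]
    return rows[1:] if has_header else rows


def node_paths(node_rows):
    """Describe each node by its (name, type) pairs from the root down."""
    by_id = {row[0]: row for row in node_rows}
    paths = set()
    for node_id in by_id:
        path = []
        current = node_id
        while current:  # an empty parent ID marks the top of the hierarchy
            _id, parent_id, name, node_type, _resource = by_id[current]
            path.append((name, node_type))
            current = parent_id
        paths.add(tuple(reversed(path)))
    return paths


# Two made-up exports of the same nodes with different IDs; only the second
# one has a header line.
first = '"1","","dwh","Database","r"\n"2","1","customers","Table","r"\n'
second = (
    '"Id","Parent id","Node name","Node type","Resource id"\n'
    '"a","","dwh","Database","x"\n'
    '"b","a","customers","Table","x"\n'
)
print(node_paths(read_rows(first, False)) == node_paths(read_rows(second, True)))
```

The sketch assumes every parent ID is present in the same node.csv and that the hierarchy has no cycles. Resource IDs are left out of the comparison because they can change between exports as well.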