Dataflow Analysis

Once you have selected the elements, tuned the visualization parameters, and clicked on the Visualize button, or just clicked on a link in your documentation, the initial dataflow graph is shown. In this section, our visual representation of a data flow will be introduced together with all the actions that can be performed on it such as browsing, filtering, and exporting.

Introducing Data Flows

If you select your favorite table or file that is filled from a source system and then transformed into a target system and set a high level of detail in both directions, a visualization of indirect flows, no filter, and a steps displayed parameter of three, you can get a dataflow graph similar to the following one.

Data flow graph with many connected elements

You can see several elements, some of which are connected by arrows, and make the following observations.

Note: This is a default configuration that can be changed by your application administrator.

Starting from version R42.4, a more informative column description is displayed instead of Column names (which are in most cases autogenerated as numerical ordinal value) for Files under Filesystem assets. The column descriptions are also stored as attribute of the column and usually come from the tool that reads/writes the file (most often data integration tool). This substitution for a more user-friendly name is applied to the Object catalog, Lineage listing, and Data flow, as illustrated in the following screenshot. In cases when Manta Alias is specified for the column, it takes precedence and is displayed accordingly. However, it's important to note that these substituted names derived from the description attributes cannot be utilized as keywords for search purposes.

It is not applied to any exports. In the exports, the description columns are exported the same way as regular attributes.

Example of the export details

Browsing Data Flows

Once you know what all the objects on a dataflow graph represent, you can try interacting with them.

Moving and Zooming

When a dataflow graph is shown, it is possible that you will see only a few full elements and flows and that the others will be only partially visible. Now, you can do two things:

Selecting Elements and Flows

When you need to get more information about an element from a dataflow graph or when you need to see how an element is connected to others by direct data flows, click on it and you will see a change like in the following image.

Data flow graph with elements highlighted

First, you can see that the selected element turns yellow. Then, all the elements connected to this element by direct data flows (even transitively) change color together with all the affected arrows. The other arrows are now lighter so the colored ones stand out. This feature is really helpful when you select a whole table for dataflow analysis and then want to see column data flows separately. Blue represents data flow to the selected element, red represents data flow from the selected element, and violet represents data flow both to and from the selected element.

Moreover, the name of the element appears in the upper-left part of the screen under the search bar. You can click on the down arrow there to see other element attributes, as you can see in the following image. You can hide these attributes by clicking on the same icon, or you can select another element to see its attributes.

Additional element attributes are displayed on the graph

When you move the mouse to an arrow, it turns red to make it easily distinguishable from the others. When you click on it, all the elements and arrows on the dataflow paths going through this arrow turn red like when you select an element. You can see the selected arrow in the following image.

Arrow selected and an element highlighted in red in the graph

To deselect elements and/or arrows, just click on the white background.

Finding Elements

Elements hidden from view

Sometimes, especially on larger dataflow graphs, you might want to find a visible element. It is easy. Just type its name into the search bar in the upper-left part of the screen. A list of possible elements will appear from which you can choose the desired element by name, full path, and type. Click on the desired element so that it is centered on your screen and the element is highlighted. If it is not what you were searching for, you can click on another element. When you are satisfied, just click on the white background to cancel the highlighting and hide the list.

Expanding and Collapsing Elements

In this example, you can see all elements at their highest level of detail. However, this might not be necessary for some less interesting elements. In such a case, you can collapse an element containing other elements by clicking on its minus sign. The elements it contains will no longer be visible, as you can see in the following illustration.

Hiding elements with their subelements

On the other hand, if you start your initial data flow at a medium level of detail, all the elements will look like those collapsed in the previous figure. In this case, you can expand the elements to see their children by clicking their plus sign.

Discover Further Data Flow

When the steps displayed parameter is set lower than the maximum distance between the selected elements and some influenced elements, some elements will have flow icons on the right as you can see in the following image.

Element with flow icon on the right

That means that there are other elements that are transitively connected by data flow to the elements selected through the element with the icon. To see more elements on the path from the selected elements to the other connected elements through the element with the icon, just click on the icon. The icon will disappear and the new elements will appear like in the following illustration.

New elements shown after clicking the icon

It will look the same as if you had set the steps displayed parameter to four for only this particular path, thus the other paths with other elements and flows that are not important to your analysis are not shown. By clicking on these icons, you can follow only those paths that are important.

When no icon is present in an element, you can be sure that it is only connected to elements that are already visible. (We certainly only consider those connections that start at the elements which were originally selected.)

Restarting Data Flows

If you find an element that you want to do a dataflow analysis on, just right-click it and select Restart Visualization from the element options in the context menu. You will get an initial dataflow graph for this element with the same parameters as for the previous one.

Switching Data Flows to Another Layer

In the case of two or more layers, just one layer is active at a time and only the elements and flows belonging to the active layer are displayed. Typically, the active layer is selected on the object catalog page.

If an element maps or is mapped by another element in a different layer, it is possible to switch to that layer. Right-click the element and select Switch to ... Layer from the context menu, as shown in the following image.

Option Switch to Physical Layer shown

After that, the visualization will restart from the mapping or mapped element in the selected layer.

Center the Camera on the Start Node

It is possible to center the camera on the start node by simply clicking on the button in the lower-left part of the visualization.

Button used to center the camera on the start node

Filtering Data Flows

Manta Flow has two types of filters.

Both of them can be bound to the particular layer that the filter is valid for. If not, the filter is valid for all layers.

Both of them can be grouped into filter groups, and each group may contain filters of both types. The groups are configured by the administrators. The layer validity of the filter group is determined by the layer validity of the contained filters. The filter group is valid for a layer if and only if at least one filter in that group is valid for that layer.

You can select a filter group as a visualization parameter on the catalog screen by choosing the appropriate item from the Filters options. Only groups valid for the selected layer are displayed. If you select the /Oracle/orcl/infasuper schema and choose medium detail, both directions, no visualization of indirect flows, the DBs, Files & Reports filter, and a steps displayed parameter of one, you can get a dataflow graph similar to the following one.

Data flow graph example

Now you can see the direct data flow between the source elements and the selected elements. There are no elements from transformation technologies. The transformation elements have been replaced by wider arrows.

Note that the distance is counted without filtered elements. That is why the displayed source elements fulfill the maximal depth one condition, although some other elements exist between the source element and the one that has been selected.

There are several ways to modify the filtering on the visualization screen. You can display the filtered elements behind a particular arrow by right-clicking on that arrow and selecting the Show hidden lineage option. The arrow will be replaced by the hidden elements.

Data flow graph with filtered elements displayed

Other filtering options are available in the Options box in the upper-right part of the screen. Click on it to unpack it.

In both cases, only filters valid for the active layer are displayed.

To filter by the technology (resource) of a particular element, right-click on that element and select Filter ... Technology.

To filter only one particular element (and all its descendants) manually, right-click on that element and select Hide element.

To cancel manual filtering and hidden lineage (see the Show hidden lineage command mentioned earlier), unpack the Options box in the upper right, go to the Filters tab, and click the Clean Manual Visibility button.

Undo Your Last Action

Each action that somehow changes the visualized data flow (such as expand/discover/filter) is recorded and can be reversed. In the default settings, the application stores 10 recorded states that can be browsed in both directions (undo/redo). This can be done using the buttons with the left and right arrows located in the lower-left part of the screen (see the following illustration) or by using the well-known keyboard shortcuts Ctrl+Z for Undo and Ctrl+Y for Redo. Beware that when you go back a few actions using Undo and then change the data flow using Expand, all Redo states will be forgotten, as is standard undo/redo behavior in most applications.

Buttons to undo and redo actions

Viewing the Source Code

While browsing data flows, it can be interesting to see the original scripts that are represented by the flow. You can do this by choosing a node that represents a procedure or command inside a transformation element and selecting Show Context for This Element, as shown in the following illustration. You can also double-click on the transformation element.

Option Show Context for This Element displayed

This will open a pop-up window with the entire formatted script, and the statement represented by the clicked element will be highlighted, as shown in the following illustration.

Source script for the selected element displayed

To do a full-text search, use the shortcut Ctrl+F. Type the string you are searching for in the Search field and press Enter. The first occurrence of the string after the cursor will be focused, and all occurrences will be highlighted. Also, the appropriate parts of the vertical scrollbar will be highlighted, as shown in the following illustration.

Example of searching in the script

Details of Nodes and Edges

You can view the details of nodes and edges by simply selecting them. When a node or an edge is selected, it is possible to expand its detail by clicking on the expand arrow located next to the element name in the upper-left part of the screen just under the search box, as shown in the following illustration.

Expand button shown

The expanded dialog box contains detailed element information including its attributes. Strings that are too long have a link next to them that opens an additional dialog box showing the whole value. It is also possible to collapse the dialog box, returning it to its previous state. An example is shown in the following illustration.

Collapse button shown

You can find two special attributes in the edge detail.

Bulk Settings for Level of Detail

It is possible to change the level of detail for all elements of a specific resource. This is done under the Detail tab in the Options dialog box as shown in the following image.

Option tab in the details setting

The GUI settings are listed in a table. The resource list is on the left side and the level of detail is on the right side. In the case of two or more layers, only resources belonging to the active layer are visible.

There are four options for each resource.

Confirm the changes by clicking on Apply.

Graphically Comparing Two Revisions

Start the Revision Comparison

You can use Manta Visualization to view the differences between two selected revisions by doing the following.

  1. Select the main revision.

  2. Click on the Compare Revisions button.

    Compare Revisions button

  1. Select the older revision that it should be compared to.

    Two revision versions selected

  1. Select the starting element and visualization parameters as you would for a normal visualization.

  2. Display the visualization by clicking on the Visualize button.

Comparison Layout

The layout for revision comparison is based on the normal visualization but with several special effects. The highlighting is similar to the standard highlighting style for showing differences. This style is used by diff tools for version control system applications like GIT or SVN. That means that new objects will be green and the old ones will be red.

Comparison view with new elements highlighted in green and old elements highlighted in red

Detailed description:

  1. The merge statement in the procedure IMPORT_CRM was only in the older revision.

  2. The merge statement in the procedure IMPORT_LOAN was only in the newer revision.

  3. The merge insert was updated in the new revision by removing two columns — SRC_ID and SHORT_NAME.

  4. The rest of the columns are the same in both revisions.

  5. The header shows the dates of the revisions being compared. You can see the detail by hovering over them with the cursor.

It is possible to change the comparison color layout to the normal visualization in the color layer settings.

Colors tab in the details setting

Exporting Data Flows

When you find the information you are looking for, there are three different ways you can export it using the Export menu in the upper-left part of the screen.

Structure of Exported CSV Files

The first file, relations.csv, describes all data flows (one row for each data flow). Example:

Let’s imagine that we have a direct data transfer from column t1c2 of table1 to column t2c1 of table2. Both tables belong to a database called “db” under the database system Teradata.

Field Example value Description
TYPE DIRECT Either DIRECT or FILTER, depending on the type of data flow
SourcePath Teradata.db.table1.t1c2 The full address of the source
TargetPath Teradata.db.table2.t2c1 The full address of the target
SourceColumnName t1c2 The name of the source column
SourceColumnType Column
TargetColumnName t2c1 The name of the target column
TargetColumnType Column
SourceObjectName table1 The name of the source object (table)
SourceObjectType Table The object type is usually TABLE or VIEW
TargetObjectName table2 The name of the target object (table)
TargetObjectType Table The object type is usually TABLE or VIEW
SourceGroupName db The name of the source group (database)
SourceGroupType Database The group type is usually Database
TargetGroupName db The name of the target group (database)
TargetGroupType Database The group type is usually Database
SourceResourceName Teradata Resource name of the source (Teradata, Oracle, etc.)
SourceResourceType Teradata
TargetResourceName Teradata Resource name of the target (Teradata, Oracle, etc.)
TargetResourceType Teradata
RevisionState STABLE State of the object between two compared revisions (NEW, DELETED, STABLE, INNER)

The second file, vertices.csv, contains information about all the elements in the reference view (i.e., databases, tables, columns, etc.). If the element has any attributes, the file contains one row for each of its attributes (and they differ in their AttributeName and AttributeValue). If the element doesn’t have any attributes, the file contains one row about this element (in this case the AttributeName and AttributeValue remain empty).

Let’s imagine that the reference view contains the column Teradata.db.table1.t1c1. Then the file contains records about this column, about the table Teradata.db.table1, about the database Teradata.db, and about the resource Teradata.

Field Example — Column Example — Table Description
FullNodeName Teradata.db.table1.t1c1 Teradata.db.table1 The full address of the node
ColumnName t1c1 - The name of the column
ColumnType Column -
ObjectName table1 table1 The name of the object
ObjectType Table Table The object type is usually TABLE or VIEW
GroupName db db The name of the group (database)
GroupType Database Database The group type is usually Database
ResourceName Teradata Teradata Teradata, Oracle, etc.
ResourceType Teradata Teradata Teradata, Oracle, etc.
RevisionState STABLE STABLE State of the object when comparing two revisions (NEW, DELETED, STABLE, INNER)
AttributeName datatype - The name of one of the attributes
AttributeValue integer - The value of this attribute
NodeName t1c1 table1 The name of the node
NodeType Column Table The type of the node

The order of elements in the exported files is not exactly defined. For example, it is not guaranteed that the CreateView element will precede the Delete element (delete a record from the view) and vice versa.

The columns NodeName and NodeType were added because in previous versions, not all objects in the file vertices.csv had an object type. This made it difficult for users to sort through the list of exported nodes from a Automatic Data Lineage visualization. With these two columns, it is easier to sort exported nodes by type. Therefore, these two columns are always populated in each row.

Indirect Flow Edge Categorization

Indirect flow edges contain the attribute EDGE_FILTER_TYPE describing the type of operations where the condition comes from.

Possible EDGE_FILTER_TYPE Values Grouped into Three Logical Categories

Data flow graph with indirect flow edges

Transformation Logic

Purpose

This functionality generates transformation descriptions, showing exactly how the value for a target (e.g., column, routine return value, etc.) is computed. The descriptions are stored on transformation column nodes immediately upstream of the target. The description does not include conditions under which the target is affected by the transformation. (That kind of information is provided in the form of filter edges in the lineage graph.)

This results in three types of transformation descriptions.

Supported Source Systems

Where the Transformation Description and Classification Is Shown

Text Version for Visualization

Tag with the Transformation Classification

Object Version for Export

How to Read Expressions

Each line of transformation description consists of:

TARGET := EXPRESSION_ELEMENT_1 EXPRESSION_ELEMENT_2 .. EXPRESSION_ELEMENT_n

TARGET is typically the column for which the transformation description is generated or the definition of the EXPRESSION_ELEMENT from the line for which it is a sub-line.

EXPRESSION_ELEMENT is typically the qualified name of a source column, operator, keyword, or variable defined in its sub-line.

This is followed by lines explaining the elements which can be drilled down (like variables). These sub-lines are indented and are valid only for their direct parent line.

Details:

Exceptions:

Example 1: From an Oracle Import Script

Import loan file

PARTY.BUS_NAME := CASE
    WHEN VAR_16 THEN VAR_17
    ELSE VAR_19
END
  VAR_16 := LOAN_CUSTOMER.CUSTOMER_TYPE <> IMPORT_LOAN.C_LOAN_CUSTOMER_PERSON
    IMPORT_LOAN.C_LOAN_CUSTOMER_PERSON := 'P'
  VAR_17 := COALESCE(LOAN_CUSTOMER_CORPORATE.CORPORATE_NAME, COALESCE.2, IMPORT_LOAN.C_NA)
    COALESCE.2 := LOAN_CUSTOMER.CUSTOMER_NAME
    IMPORT_LOAN.C_NA := 'N/A'
  VAR_19 := IMPORT_LOAN.C_NA
    IMPORT_LOAN.C_NA := 'N/A'

Here is some information about each line of the text version of the transformation description in the preceding example.

  1. The first line tells us that this transformation description is for column BUS_NAME from table PARTY and its value is created from a CASE statement.

  2. The conditional value for the CASE statement, where the temporal variables (VAR_16 and VAR_17) generated only for this transformation description are used. They are used when they represent a sub-expression that is too long to inline or would have a negative effect on transformation description readability. They have the format VAR_xx (which can be parametrized in an XML file).

  3. Part of the CASE statement like on line 2.

  4. End line of the CASE statement.

  5. Sub-line of line 1 (visible by indentation) that defines VAR_16.

  6. Sub-line of line 5 that defines the literal value of the constant C_LOAN_CUSTOMER_PERSON.

  7. Sub-line of line 1 that defines the value of VAR_17 as a result of calling the built-in function COALESCE. The first parameter is directly in this line, but the second and third parameters are defined in the following sub-lines.

  8. Sub-line of line 7 that defines the value of the second parameter of the function COALESCE as the column CUSTOMER_NAME.

  9. Sub-line of line 7 that defines the value of the third parameter of the function COALESCE as literal 'N/A'.

  10. Sub-line of line 1 that defines the value of VAR_19. Here the value IMPORT_LOAN.C_NA is not inlined since it is detected as a variable that can have a meaningful name, which can help with the readability of the resulting transformation description.

  11. Sub-line of line 10 that defines the value of IMPORT_LOAN.C_NA as literal 'N/A'.

Example 2: From an Oracle Function Definition Script

TEST_TL_FNC1

TEST_TL_FNC1.RETVAL := TEST_TL_FNC1.IN1 + 2
  DWH.TEST_TL_FNC1(TEST_TL_FNC1.IN1)

Here is some information about each line of the text version of the transformation description in the preceding example.

  1. The first line tells us that this transformation description is for a return value from the user-defined function TEST_TL_FNC1 and its value is created as the value of the input parameter IN1 plus 2.

  2. This line defines the input parameters inside user-defined routines since it is variable and it could be unclear from where it originates. (The same line format is used for sub-lines that define output parameters from routines. But it should be clear which type it is from context, as this line only occurs in the transformation description of the output point of the routine so the routine name matches the qualified name of the final target.)

Limitations