Get to Know Process Manager

This guide breaks down the key terminology and main on-screen components in Process Manager’s UI. Check out our Process Manager guide for more information on what Process Manager is.

Terminology for Process Manager and Its Public API

Scenarios

The keystone of all IBM Automatic Data Lineage processes is the scenario. Scenarios are the fundamental execution units that define all the operations that Automatic Data Lineage can perform. Each operation wrapped inside a scenario can run successfully if it is correctly combined with data from the connection defined in Manta Configurator. Some examples of scenarios are the PostgreSQL extractor scenario, the rollback revision scenario, and the repository export scenario.

Workflows

Each workflow consists of two parts.

A workflow definition is a set of phases, technologies, connections, and scenarios. It defines which scenarios should be run for which phase, technology, and/or connection, regardless of their orchestration constraints. Users may also configure environment variables for each execution unit.

Workflow Definitions

A workflow definition is a tree structure with five levels of execution units:

  1. Root

  2. Phases

  3. Technologies

  4. Connections

  5. Scenarios

To create a complete workflow definition, users must define every level; for example, Root -> Extraction -> Oracle -> My Oracle -> OracleExtractorScenario.

Furthermore, it is possible to define an empty list of execution sub-units at any level. For example, users may define the extraction phase without defining a list of technologies. This will create a workflow that executes all extraction scenarios for all configured connections of all technologies.
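The way an empty list of sub-units widens a workflow can be sketched in a few lines of Python. This is only an illustration of the idea, not Process Manager's implementation: the `Unit` type, the `expand` function, and every name in the sample configuration other than the Oracle path quoted above are hypothetical.

```python
from typing import NamedTuple


class Unit(NamedTuple):
    """One fully specified path below the root (hypothetical model)."""
    phase: str
    technology: str
    connection: str
    scenario: str


def expand(prefix, configured):
    """Return every configured unit whose leading levels match `prefix`.

    `prefix` holds the levels the user defined below the root, in order:
    phase, technology, connection, scenario. Levels that are left out
    match everything that is configured.
    """
    prefix = tuple(prefix)
    return [u for u in configured if tuple(u[:len(prefix)]) == prefix]


# Illustrative configuration; only the first entry comes from the text above.
configured = [
    Unit("Extraction", "Oracle", "My Oracle", "OracleExtractorScenario"),
    Unit("Extraction", "PostgreSQL", "My Postgres", "HypotheticalPostgresScenario"),
    Unit("Analysis", "Oracle", "My Oracle", "HypotheticalAnalysisScenario"),
]

# Only the phase is defined, so every extraction scenario is selected.
extraction_only = expand(("Extraction",), configured)

# All levels are defined, so exactly one scenario is selected.
single = expand(
    ("Extraction", "Oracle", "My Oracle", "OracleExtractorScenario"), configured
)
```

Here `extraction_only` contains both extraction units, while `single` contains only the Oracle extractor.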

When creating the workflow through the public API, you can use wildcards. A common use case for wildcards is when users want to execute one particular scenario for all configured connections of all technologies. Manta Process Manager supports two wildcards.

Whenever users use these wildcards, they must remember to define the scenarios because the technology and connection are determined from these scenarios.

Execution wildcards can only be used when you create the workflow by public API.

In summary, to create a workflow definition, users may choose from three approaches.

Note: The technology for Automatic Data Lineage scenarios must be called “MANTA”, and the connection name must be an empty string.

Environment Variables

At each level (root, phase, technology, connection, scenario), it is possible to define a set of environment variables that are applied to every node in the subtree. If there is, for example, a specified JAVA_OPTS environment variable for the technology MS SQL at the technology tree level, all connections that are associated with this technology will inherit this variable too. In the same way, scenarios will inherit this environment variable from their connection parent nodes as well.
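The inheritance described above can be pictured as merging one dictionary per tree level, from the root down to the scenario. The sketch below is a minimal illustration under two assumptions that the text does not confirm: a variable set at a deeper level overrides the same variable set higher up, and the `-Xmx4g` value is made up for the example.

```python
def effective_env(*levels):
    """Merge environment variables from the root level down to the scenario.

    Assumption: when the same variable is set at several levels,
    the deepest level wins.
    """
    merged = {}
    for env in levels:
        merged.update(env)
    return merged


root_env = {}
phase_env = {}
technology_env = {"JAVA_OPTS": "-Xmx4g"}  # set once for the technology MS SQL
connection_env = {}
scenario_env = {}

# A scenario under an MS SQL connection sees the technology-level variable.
scenario_effective = effective_env(
    root_env, phase_env, technology_env, connection_env, scenario_env
)
```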

Process Manager defines some environment variables that alter parallelism. Read more in the section on Parallelism in Process Manager.

Workflow Information

A workflow ID is the name and identifier of an existing workflow. It is a string, and it must be unique. It can also be referenced as a workflow name.

For each workflow, users may define a maximum number of parallel processes. Running Automatic Data Lineage may consume a lot of RAM, so to prevent unnecessary failures, users may limit the number of parallel processes. If a field is not specified, the default value is used instead. Read more about parallelism in the section Parallelism in Process Manager.

In some specific use cases, it is not desirable to automatically add necessary processing scenarios to the workflow definition. To suppress this functionality, users may create an advanced workflow by setting the advanced flag to true. The default value is false.

Workflow Templates

Workflow templates are workflows predefined by Automatic Data Lineage. These workflow templates cannot be changed or executed directly. The templates are to be used as references for the user’s custom workflows. The most effective way to use them is to copy the definition into a custom workflow and save it as a new workflow.

List of Workflow Templates

All the workflow templates have been created from well-known Manta CLI scripts, namely:

Process Manager: GUI

This chapter describes the Process Manager component of Manta Admin UI, shows its screens, and explains how it is used.

Process Manager Dashboard

The main screen of Process Manager is the dashboard screen, which can be seen in the following screenshot. This screen consists of three main elements: the workflow queue, workflow history, and the blue Execute Workflow button. This button may be hidden if the left panel is collapsed.

Workflow Queue

The main purpose of the workflow queue is to track pending and running workflows. These workflows are organized in execution order. The one at the beginning is being executed or will be executed next.

Each line represents one workflow execution. You can identify the workflow by its name in the first column 1.

Next to the Workflow Name column is the execution type 2, which shows one of two values: Full Run or Incremental Run. The value indicates what kind of revision the workflow execution works with: Full Run means that a new revision is created, and Incremental Run means that a minor revision is used instead. The value is taken from the workflow definition. If the workflow contains neither a new revision scenario nor a new minor revision scenario, it is possible that no revision is created at all, so this column does not indicate whether a revision will actually be created.

Process Manager - Workflow queue

The third column 3 shows the current execution status. For all possible statuses, see the preceding chapter, Execution States. Next to the status is the progress, shown as a percentage. This value is the number of finished scenarios out of the total number of scenarios; it says nothing about the size or duration of the individual scenarios. The file icon next to the percentage is a shortcut to the Manta Log Viewer component, which displays the logs related to this execution. Similar icons appear throughout Process Manager, and each of them opens Log Viewer at a specific level of detail.
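Because the progress value is count-based, a long-running scenario and a trivial one contribute equally. A minimal sketch of that arithmetic follows; the function name and the handling of an empty workflow are assumptions, and the UI may round the value differently.

```python
def progress_percent(finished_scenarios, total_scenarios):
    """Share of finished scenarios, as a percentage of all scenarios.

    Assumption: a workflow without scenarios is reported as 0.0.
    """
    if total_scenarios == 0:
        return 0.0
    return 100.0 * finished_scenarios / total_scenarios


# One of four scenarios is done, regardless of how long each one takes.
quarter_done = progress_percent(1, 4)
```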

To the left of the workflow name is an expand arrow that shows the Running Workflow Progress section with detailed information about the workflow execution. This section is divided into three levels of detail, and for each level you can find the state 7 of each of the three phases.

Every expandable parent, such as a technology or connection, shows the statuses calculated from its children. For example:

All queued workflow executions can be ended before they complete. As mentioned above, Process Manager distinguishes between two ways of ending an execution: Cancel and Stop (or terminate). The Cancel option is available for all pending workflow executions and is considered a safe action, since no file system changes have been made and no processes are running for the execution yet. The Stop action, on the other hand, is available for executions that are currently running, and this operation is NOT considered safe.

Early termination of the process may cause issues with the next workflow execution, and manual intervention may be required (committing or rolling back the open revision). Should the workflow termination result in an inconsistent state in the Automatic Data Lineage repository, Automatic Data Lineage will automatically execute a correction operation itself. This typically means committing the revision that the terminated workflow left open. As a result, a new (incomplete) revision of the data lineage metadata will be available in Manta Viewer. This revision can later be deleted, but the deletion can take a significant amount of time.

Workflow History

Following the Workflow Queue table is a Workflow History table. This table shows the history of all finished workflow executions with columns similar to those in the workflow queue. For example, the finished status 1 can be seen here as well, but the Log Viewer shortcut button that was next to the status has been moved to its own column 6. The Started column shows the time when the real execution began, which is not the time when the execution was placed in the queue. For this reason, this time may be empty 2 if the workflow never began, for example because it was canceled. The Finish column shows the time when the workflow finished and its state was resolved, that is, changed for the last time. As with the workflow queue, it is possible to expand the execution to see detailed information.

Process Manager knows five end statuses for workflow execution: Finished Successfully, Finished with Error, Failed, Terminated, and Canceled. A workflow may finish with:

Those records are ordered by the Added to Waiting Queue time; that is, according to when the user clicked to execute the workflow. This time is currently only available in the public API.

Process Manager - Workflow history

Scenario Statistics

In the Workflow History table, a user can also review scenario success statistics for certain analytical scenarios. Success statistics are defined by:

The scenario statistics are displayed in the Analysis column of the Workflow History table (1 and 3) for each of the valid scenarios. The number displayed in the column is the statement success percentage. Upon hovering over the number, a tooltip is displayed 2, containing additional statistics as defined above.

A success rate of 100.0% 3 is only returned when the success rate is really 100.0%. (The success rate is never rounded up to 100.0%.)

A success rate of <100% 1 does not necessarily mean that the data lineage is of poor quality or incorrect; the number may be affected by statements that do not produce any lineage.

If an analytical scenario is executed for an empty input, no statistics are provided.

Statistics are provided for individual scenarios only — the data does not get aggregated to higher levels such as workflows.
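Two of the rules above, never rounding up to 100.0% and providing no statistics for an empty input, can be sketched as follows. The function name is hypothetical, and truncating to one decimal place is an assumption: the text only states that the rate is never rounded up to 100.0%.

```python
def display_success_rate(successful, total):
    """Format the statement success rate with one decimal place.

    The value is truncated rather than rounded (an assumption), so
    "100.0%" appears only when every statement succeeded.
    Returns None for an empty input, where no statistics are provided.
    """
    if total == 0:
        return None
    tenths = successful * 1000 // total  # integer arithmetic avoids float rounding
    return f"{tenths // 10}.{tenths % 10}%"


all_ok = display_success_rate(10000, 10000)     # "100.0%"
almost = display_success_rate(9999, 10000)      # "99.9%", never "100.0%"
```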

Scenario statistics are processed as an event in Manta Event System. This means that the statistics can be consumed from it through one of the external connectors and used to implement custom integrations (e.g., reacting when the success rate goes below a certain threshold).

Valid Scenarios

Scenario statistics are currently only provided for certain scenarios. Here is a full list of supported scenarios.

Left Drawer Menu

The left drawer menu lists the existing workflows 2 and lets you search among them 1.

Clicking on a particular workflow 2 will redirect you to a workflow detail screen.

Following the list of existing workflows is a + New Workflow button 3, which redirects to a screen for creating new workflows.

At the bottom of the left drawer menu is a blue Execute Workflow button 4, which opens the Execute Workflow/Scenario modal window.

Workflow Execution

Process Manager provides a way to execute workflows, workflow templates, and scenarios.

Workflow Execution Modal Window

When the blue button in the lower-left corner is pressed, the following modal window pops up. When the first tab, labeled Workflows 1, is active, this window can be used to execute predefined workflow templates 4 or custom workflows 5, if any exist.

Executing a workflow template 4 creates a new custom workflow 5 that has the same name as the template, and it is this new workflow that is executed. The newly created workflow is called a workflow template instance, and whenever the same template is executed again, the existing workflow template instance is executed instead.
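The behavior described above is a create-on-first-use pattern. The sketch below illustrates it with an in-memory dictionary; the function, the dictionary layout, and the template name are hypothetical and do not reflect how Process Manager stores workflows.

```python
def instance_for_template(template_name, workflows):
    """Return the workflow template instance to execute.

    The instance is created the first time the template is executed
    and reused on every later execution of the same template.
    """
    if template_name not in workflows:
        workflows[template_name] = {
            "name": template_name,       # same name as the template
            "kind": "template instance",
        }
    return workflows[template_name]


workflows = {}
first_run = instance_for_template("Hypothetical Template", workflows)
second_run = instance_for_template("Hypothetical Template", workflows)
```

After both calls, `workflows` holds a single entry, and `second_run` is the same object as `first_run`.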

Process Manager - Execute workflow

Every workflow displayed here, whether it be a custom workflow or a workflow template, consists of two pieces of information: the workflow’s name and the workflow’s description, as can be seen in the figure above. The description may give you a better understanding of what will happen if the workflow is executed.

Users can provide additional inputs when executing a custom workflow. The inputs are expected as a ZIP file containing an Input folder with the same hierarchy as when inputs are provided directly through the file system.

The Scenarios tab provides the option to execute a particular scenario 8. The scenario can be chosen from the list of available scenarios, which only displays the scenarios for the connections that are configured. Executing a scenario here will execute this scenario for the chosen technology for all its existing connections.

Process Manager - Execute scenario

For better navigation, these scenarios are grouped by technology 7 and the technologies are grouped by technology categories 6 just as they are in Manta Configurator. Alternatively, the search field 3 may be used as a faster way to find a particular scenario (or workflow on the first tab).

A scenario can be executed in a manner similar to a workflow. Executing a scenario creates a scenario workflow template, and when executing the same scenario a second time, the existing scenario workflow template is reused.

Workflow Detail

When you click on any workflow in the left drawer menu, a workflow detail screen appears. The layout of this screen is the same as the workflow creation screen.

The workflow detail screen is divided into a header and three sections. The layout of the left drawer is the same on this screen.

The header contains the name of the workflow 1 and three buttons: Back, Delete, and Edit. The Back button 5 returns you to the dashboard, the Delete button 6 allows you to delete a workflow after the confirmation modal, and the Edit button 7 redirects you to the edit workflow screen, which is the same as the create workflow screen. The only difference is that the workflow’s name cannot be edited there.

Following the workflow detail header panel are three sections: Properties, Workflow Designer, and Workflow History. Each of them has its own section here on this page of the Knowledge Base.

The Properties section 2 contains basic information about the workflow such as its name, description, and more.

Workflow Designer 3 shows visualized workflow definitions.

Workflow History 4 shows historical executions of the selected workflow.

Properties

The Properties section starts with the Workflow Name 1 and the workflow Description 2, which users define when creating a workflow.

Next on the list is Revision Type 3. It says whether this workflow will create a new revision or a new minor revision.

Note: From version R42.5, the Revision Type is not available when Targeted scanning is enabled.

Below it is Advanced Mode 4. If Advanced Mode is disabled, Process Manager automatically injects all necessary Manta scenarios into the workflow at runtime. These include the diagnose repository scenario, the new revision scenario, the commit scenario, and so on.

The last item 5 is the maximum number of processes that can run simultaneously.

Workflow Designer: View Mode

Workflow Designer visualizes the workflow definition, that is, what is executed. It does not show when each item is executed; the execution order is determined by the workflow planner just before execution. However, some execution constraints are visible here. For example, the phases are displayed in execution order: Extraction, Analysis, and Export.

The workflow designer is displayed as a tree structure. The root of the tree is not represented as a node but rather as the whole blank area below the nodes. To access the root, you have to deselect all the nodes so none of them will be highlighted in blue, as shown in the next screenshot.

The root contains up to three phases 1. Each phase contains technologies 2, each technology contains connections 3, and each connection contains scenarios 4.

If the advanced flag is disabled, Process Manager automatically determines the necessary Manta scenarios, as mentioned earlier. These are displayed here 4 as gray nodes for informational purposes only. They are not editable, and they do not appear in the workflow definition if it is obtained through the API.

As explained previously, if a node doesn’t have children, Process Manager executes all possible child combinations based on the existing configuration. These childless nodes are marked with rotating arrows 6.

If you select any node (root is selected if no other node is selected), an additional configuration for the selected node appears in the right menu 7. Read more about this configuration in the section on workflow creation.

Workflow History

Workflow history uses the same table as the dashboard screen. This table is filtered so it displays only the selected workflow.

Template and Scenario Instances

If a scenario or template is executed, a scenario instance workflow or template instance workflow is created. These workflows are also displayed in the left drawer, and their details can be accessed. The label near the workflow name 1 says whether the selected workflow is a scenario instance or a template instance. If this label is missing, it is a regular user-created workflow. Instance workflows are not editable, so the Edit button 2 is disabled for them.

Outputs Generated by Export Scenarios Executed in Process Manager and Their Limitations

Workflows that include exports are expected to generate output files that can be downloaded. Those files will be available in an icon link under the Outputs column in the Workflow History.

Once the icon link is clicked, Process Manager generates the ZIP file and provides a link to start the download.

All export scenarios that belong together (in most cases an export and an upload, e.g. Alation, IGC, EDC, or an export and a dictionary export, e.g. Open Manta Integration Export) must be included in one workflow because the workflowExecutionId is part of the output file path. If they are run in separate workflows, the first scenario prepares files for export in the directory ${CLI_HOME}/temp/processmanager/N/, and the next scenario tries to find these files in ${CLI_HOME}/temp/processmanager/N+1/ because its execution ID differs from that of the first scenario.
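The path mismatch can be made concrete with a short sketch. The function name and the sample value for CLI_HOME are assumptions; only the temp/processmanager/<execution ID> layout is taken from the text above.

```python
from pathlib import PurePosixPath


def export_dir(cli_home, workflow_execution_id):
    """Directory used for export files of one workflow execution."""
    return PurePosixPath(cli_home) / "temp" / "processmanager" / str(workflow_execution_id)


cli_home = "/opt/manta/cli"  # hypothetical CLI_HOME

# Export and upload run as two separate workflows: IDs N and N+1.
written_by_export = export_dir(cli_home, 7)
searched_by_upload = export_dir(cli_home, 8)

# Both scenarios in one workflow share one execution ID and one directory.
shared = export_dir(cli_home, 7)
```

In the split case the upload looks in a directory the export never wrote to, while in the combined case `shared` equals `written_by_export`.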

Download Endpoint

Alternatively, the output file can be downloaded using the Orchestration API endpoint HTTP GET /manta-admin-gui/public/process-manager/v1/executions/{executionId}/output where {executionId} refers to the Execution ID for the specific workflow.
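A minimal sketch of calling this endpoint from Python's standard library follows. The base URL, function names, and target file name are assumptions, and authentication is left out because it is not described here; only the endpoint path is taken from the text above.

```python
import shutil
import urllib.request
from urllib.parse import quote

OUTPUT_PATH = "/manta-admin-gui/public/process-manager/v1/executions/{execution_id}/output"


def output_url(base_url, execution_id):
    """Build the download URL for the output of one workflow execution."""
    path = OUTPUT_PATH.format(execution_id=quote(str(execution_id), safe=""))
    return base_url.rstrip("/") + path


def download_output(base_url, execution_id, target_path):
    """Send the HTTP GET request and store the response body as a file.

    Assumption: the server is reachable without extra authentication headers.
    """
    url = output_url(base_url, execution_id)
    with urllib.request.urlopen(url) as response, open(target_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return target_path


# Hypothetical host and execution ID.
example_url = output_url("https://manta.example.com:8181/", 42)
```

Calling `download_output("https://manta.example.com:8181", 42, "output.zip")` would then save the ZIP file locally, provided the host accepts the request.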

Workflow Information File

Every Output ZIP file has a file called workflow_information.json which contains important information about the executed workflow.

Where: