Manta Orchestration API
Manta Flow Orchestration API provides all the capabilities needed to support the Manta Admin User Interface as well as the configuration and management functions for external applications and integrations. It supports four main areas of functionality — configuration, process management, license management, and log management — as described below. See the OpenAPI (Swagger) page provided by your IBM Automatic Data Lineage instance (available at http://<manta_admin_gui_hostname>:<manta_admin_gui_port>/manta-admin-gui/swagger-ui/index.html) for additional information.
- Configuration — The configuration component allows the admin to manage the connections to the source systems via an API. The API endpoints grouped under http://<manta_admin_gui_hostname>:<manta_admin_gui_port>/manta-admin-gui/swagger-ui/index.html?urls.primaryName=%20%20%20manta-orchestration-api#/Connections provide the CRUD operations for connections, allow the enabling/disabling of specific connections, and also provide additional information such as connection “templates” for specific technologies supported by Automatic Data Lineage.
- Process management — Process manager endpoints make it possible to manage workflow definitions and their executions. The API endpoints grouped under http://<manta_admin_gui_hostname>:<manta_admin_gui_port>/manta-admin-gui/swagger-ui/index.html?urls.primaryName=%20%20%20manta-orchestration-api#/Workflow%20Definitions provide the CRUD operations for workflow definitions as well as the workflow definition “templates” that illustrate how workflow definitions can be used. Separately, for executing existing workflows, verifying the status of an execution, and acquiring the outputs of workflows (typically only workflows containing export steps), the endpoints grouped under http://<manta_admin_gui_hostname>:<manta_admin_gui_port>/manta-admin-gui/swagger-ui/index.html?urls.primaryName=%20%20%20manta-orchestration-api#/Workflow%20Executions can be used.
- License management — License management endpoints make it possible to manage the Automatic Data Lineage license and get license statistics via an API. The API endpoints grouped under http://<manta_admin_gui_hostname>:<manta_admin_gui_port>/manta-admin-gui/swagger-ui/index.html?urls.primaryName=manta-license-api provide read and update operations for the license and read operations for license statistics.
- Log management — Log management endpoints make it possible to download the logs from Automatic Data Lineage scan executions via an API. The API endpoints are grouped under http://<manta_admin_gui_hostname>:<manta_admin_gui_port>/manta-admin-gui/swagger-ui/index.html?urls.primaryName=%20%20%20manta-orchestration-api#/Logging.
Usage Examples
A detailed description of the individual endpoints is available on the Swagger page. This document instead demonstrates how to work with the API in the following example scenarios.
Execute a Workflow via Orchestration API and Provide an Input File
The examples were written using the curl command line utility but can be executed using other tools as well (after properly adjusting the parameters).
Upload a File as Part of the Execute Request
curl -X 'POST' '<mantaAdminUiUrl>/manta-admin-gui/public/process-manager/v1/executions?workflowDefinitionName=<workflowName>' \
  -H 'accept: */*' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: multipart/form-data' \
  -F 'inputs=@<zipName|zipPath>;type=application/zip'
- <zipName|zipPath> — a filename, or a filename including a path, of the input.zip file on the system from which cURL is run
- <mantaAdminUiUrl> — the actual Admin UI URL, usually something like http://localhost:8181
- <workflowName> — the name of the workflow to execute
- <token> — the authorization token obtained from login
Execute the Workflow and Reference an Existing File on Manta Server
curl -X 'POST' '<mantaAdminUiUrl>/manta-admin-gui/public/process-manager/v1/executions?workflowDefinitionName=<workflowName>&inputsPath=<pathToInputDirectory>' \
  -H 'accept: */*' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json'
- <pathToInputDirectory> — an absolute path that points to a directory containing an input folder with the same structure as the zip file
- <mantaAdminUiUrl> — the actual Admin UI URL, usually something like http://localhost:8181
- <workflowName> — the name of the workflow to execute
- <token> — the authorization token obtained from login
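As a sketch of how the two invocation styles above differ only in their query string, the execution URL can be assembled programmatically. This is an illustration, not an official client; the host and workflow names are placeholders:

```python
from urllib.parse import urlencode

def execution_url(base, workflow_name, inputs_path=None):
    """Build the process-manager execution URL.

    With inputs_path, the server reads the inputs from that directory on the
    Manta server; without it, the inputs are expected as a multipart upload
    in the request body.
    """
    params = {"workflowDefinitionName": workflow_name}
    if inputs_path is not None:
        params["inputsPath"] = inputs_path
    return (f"{base}/manta-admin-gui/public/process-manager/v1/executions?"
            + urlencode(params))

# Placeholder base URL and workflow name:
print(execution_url("http://localhost:8181", "wf1"))
print(execution_url("http://localhost:8181", "wf1", "/opt/manta/inputs"))
```

Note that urlencode percent-encodes the path, so the directory can safely contain slashes and spaces.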
General Walkthrough Example (Create a Connection and Execute It)
This is a general scenario in which the user configures a freshly installed Automatic Data Lineage instance and executes the first data lineage scans. The purpose of this scenario is to show how specific endpoints are expected to be used.
Scenario description:
- The user needs to create a new connection (called mssql_example) to an MS SQL database running on the example_hostname server, port 1443. Only the schema dwh from the database database1 should be analyzed. The analysis of transformation logic expressions should be turned on for this connection. Similarly, a new connection for Oracle (called oracle_example) should be created.
- Once the mssql_example and oracle_example connections are defined, a data lineage scan should be executed.
- Upon reviewing the data lineage generated by Automatic Data Lineage, it turns out that the data lineage between the two databases is missing. It is necessary to also create a connection for the SSIS workflows (called ssis_example) that move the data between the Oracle and MS SQL databases.
- Once the ssis_example connection is defined, a new data lineage scan should be executed. In this case, however, the user does not want to analyze the databases a second time, as it is time-consuming and unnecessary. (The databases are in the same state as before the first scan was executed; no code has been changed, making it possible to analyze the data lineage from the newly defined ETL tool on top of the database resources that have already been analyzed.)
Configuration steps:
- Create the mssql_example and oracle_example connection definitions:
  - First, the user needs to find out which properties can or should be configured for the connection. The connection definition format is technology specific. To get the connection definition template for MS SQL, the user can utilize the API endpoint [GET] http://<manta_admin_gui_hostname>:<manta_admin_gui_port>/manta-admin-gui/public/configurator/v1/technologies/{technology}/connectionTemplate. (The constants used to identify the technology can be obtained using the API endpoint [GET] http://<manta_admin_gui_hostname>:<manta_admin_gui_port>/manta-admin-gui/public/configurator/v1/technologies.)
    For the mssql_example connection, the user also needs to configure the analysis of the transformation logic. Since this is considered advanced configuration, the property is not shown in the connection definition template by default; the includeAdvancedProperties flag has to be set to true for it to be displayed. The resulting GET URL of the request would thus be http://<manta_admin_gui_hostname>:<manta_admin_gui_port>/manta-admin-gui/public/configurator/v1/technologies/MSSQL/connectionTemplate?includeAdvancedProperties=true&includeReadOnlyProperties=false.
    The response then contains all the (editable) properties; the user can choose to configure only a subset of them. In this case (mssql_example), those properties would be mssql.dictionary.id, mssql.subdialect, mssql.url, mssql.username, mssql.password, mssql.extractedDbsSchemas, mssql.expressionDescriptions.enabled.ddl, and mssql.expressionDescriptions.enabled.script.
  - Once it is clear which properties should be configured, the connection can be created via the API endpoint [POST] http://<manta_admin_gui_hostname>:<manta_admin_gui_port>/manta-admin-gui/public/configurator/v1/connections/{technology}. In this case, the technology attribute should be MSSQL, and the request body should contain the correct values for the properties listed above (and default/empty values for the rest of the required properties).
  - The same steps can be taken to create the Oracle connection oracle_example.
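As a sketch of the two configurator URLs used in this step, the template lookup and connection creation endpoints can be built as follows (Python standard library only; the base URL is a placeholder, and the endpoint paths are taken from the walkthrough above):

```python
from urllib.parse import urlencode

# Hypothetical Admin UI base URL; substitute your own host and port.
BASE = "http://localhost:8181"
CONFIGURATOR = "/manta-admin-gui/public/configurator/v1"

def connection_template_url(technology, include_advanced=False):
    """URL of the connection definition template for a given technology.

    Advanced properties (such as mssql.expressionDescriptions.enabled.*)
    are only returned when includeAdvancedProperties is true.
    """
    query = urlencode({
        "includeAdvancedProperties": str(include_advanced).lower(),
        "includeReadOnlyProperties": "false",
    })
    return f"{BASE}{CONFIGURATOR}/technologies/{technology}/connectionTemplate?{query}"

def create_connection_url(technology):
    """URL for POSTing a new connection definition for a given technology."""
    return f"{BASE}{CONFIGURATOR}/connections/{technology}"

print(connection_template_url("MSSQL", include_advanced=True))
print(create_connection_url("MSSQL"))
```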
- Execute a data lineage scan:
  - First, a new workflow defining what will be executed needs to be created. Automatic Data Lineage provides several predefined templates that illustrate what simple workflow definitions can look like. The templates can be listed using the API endpoint [GET] http://<manta_admin_gui_hostname>:<manta_admin_gui_port>/manta-admin-gui/public/process-manager/v1/workflow/templates. In this scenario, it is necessary to execute the metadata extraction and data lineage analyses for both created connections. The template called run does exactly this (and, additionally, an export is executed at the end of the workflow). The detailed definition of this template can be shown using the API endpoint [GET] http://<manta_admin_gui_hostname>:<manta_admin_gui_port>/manta-admin-gui/public/process-manager/v1/workflow/templates/{workflowDefinitionTemplateName} (where {workflowDefinitionTemplateName} is the name of the template; in this example, the value run should be used). The returned definition can be used to create the new workflow, but before that, some automatically generated attributes have to be deleted (name, created, createdBy, updated, updatedBy). Also, since the export phase is not needed, it can be dropped as well. The resulting workflow definition is shown below. A new workflow with this definition can be created via the API endpoint [POST] http://<manta_admin_gui_hostname>:<manta_admin_gui_port>/manta-admin-gui/public/process-manager/v1/workflows. (It is also necessary to specify the name of the workflow as a parameter; in this example, the name wf1 will be used.)

    {
      "description": "Custom workflow wf1.",
      "workflowDefinition": {
        "revisionType": "MAJOR",
        "phases": [
          { "phase": "EXTRACTION", "technologies": [] },
          { "phase": "ANALYSIS", "technologies": [] }
        ],
        "environmentVariables": {},
        "advanced": false,
        "maxParallelScenarios": 4
      }
    }

    Note: It is possible to define an empty list of execution sub-units at any level. For example, users may define the extraction phase without defining a list of technologies. This will create a workflow that executes all extraction scenarios for all configured connections of all technologies. It is even possible to not specify any phase, in which case all of them are run. See Manta Process Manager for more details.
  - The workflow wf1 can now be executed via the API endpoint [POST] http://<manta_admin_gui_hostname>:<manta_admin_gui_port>/manta-admin-gui/public/process-manager/v1/executions. The workflow execution ID is returned and can be used to observe the status of a running workflow via the API endpoint [GET] http://<manta_admin_gui_hostname>:<manta_admin_gui_port>/manta-admin-gui/public/process-manager/v1/executions/{executionId}/status. Once all the workflow execution steps have finished successfully, the lineage is available in Manta Flow Viewer.
- The SSIS connection ssis_example can be created by following the same instructions as for the mssql_example and oracle_example connections in the first step.
- A new workflow needs to be created for SSIS extraction and analysis. The workflow definition below restricts both of these phases to the technology SSIS and the connection name ssis_example, ensuring that only this connection will be used in the workflow. In this case, it is also desirable to create a minor revision (as opposed to the major revision created by the execution of workflow wf1) so that both the lineage from the first execution and the lineage generated during this second workflow execution will be available in Manta Flow Viewer once the workflow execution finishes.

    {
      "workflowDescription": "Custom workflow wf2 scanning only the 'ssis_example' connection and creating a MINOR revision.",
      "workflowDefinition": {
        "revisionType": "MINOR",
        "phases": [
          {
            "phase": "EXTRACTION",
            "technologies": [
              {
                "technology": "SSIS",
                "connections": [ { "connectionID": "ssis_example", "scenarios": [] } ]
              }
            ]
          },
          {
            "phase": "ANALYSIS",
            "technologies": [
              {
                "technology": "SSIS",
                "connections": [ { "connectionID": "ssis_example", "scenarios": [] } ]
              }
            ]
          }
        ]
      }
    }
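The template-to-workflow massaging described in this walkthrough (strip the server-generated attributes, drop the export phase) can be sketched as a small helper. This is an illustration assuming the template arrives as JSON like the definitions shown above, not an official client; the template values are placeholders:

```python
import copy
import json

# Attributes the walkthrough says must be deleted before POSTing the definition.
GENERATED_ATTRIBUTES = {"name", "created", "createdBy", "updated", "updatedBy"}

def template_to_definition(template, drop_phases=("EXPORT",)):
    """Turn a workflow definition template into a body for creating a workflow:
    remove the generated attributes and filter out any unwanted phases."""
    body = copy.deepcopy({k: v for k, v in template.items()
                          if k not in GENERATED_ATTRIBUTES})
    wd = body.get("workflowDefinition", {})
    wd["phases"] = [p for p in wd.get("phases", [])
                    if p.get("phase") not in drop_phases]
    return body

# Minimal stand-in for the 'run' template returned by the API
# (attribute names from the walkthrough; the values are placeholders).
template = {
    "name": "run",
    "created": "2024-01-01T00:00:00Z",
    "createdBy": "system",
    "updated": "2024-01-01T00:00:00Z",
    "updatedBy": "system",
    "description": "Custom workflow wf1.",
    "workflowDefinition": {
        "revisionType": "MAJOR",
        "phases": [
            {"phase": "EXTRACTION", "technologies": []},
            {"phase": "ANALYSIS", "technologies": []},
            {"phase": "EXPORT", "technologies": []},
        ],
        "environmentVariables": {},
        "advanced": False,
        "maxParallelScenarios": 4,
    },
}

print(json.dumps(template_to_definition(template), indent=2))
```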
(OEM) Integration Example
This scenario describes how the Orchestration API can be used to integrate Automatic Data Lineage with another tool, such as a data catalog. The data catalog can connect to the source database systems, extract their structures (database dictionaries), and perform data profiling, but it has no data lineage analysis capabilities of its own. The goal is therefore to analyze the data lineage with Automatic Data Lineage and ingest it into the data catalog in such a way that the same objects coming from the two different sources (Automatic Data Lineage and the data catalog) are mapped correctly: no new objects are created for the database objects exported by Automatic Data Lineage; they are mapped to the ones that already exist in the data catalog instead.
Scenario description:
The customer’s environment consists of an MS SQL database, an Informatica PowerCenter ETL tool, and one (1) Teradata database.
Teradata contains several databases with stored procedures, macros, and views. External BTEQ and TPT scripts move data within the tables of this database.
MS SQL contains several databases and schemas with stored procedures and views. External scripts move data within the tables of this database.
Informatica PowerCenter contains workflows that move data from the MS SQL database to Teradata. There are BTEQ scripts being called from Informatica PowerCenter workflows.
The goal is to get the exact data lineage for all three source systems.
The endpoints that will be used in this scenario are the same as those shown in the first example. The key difference is that when integrating with another system, the individual operations have to be orchestrated much more precisely; it is critical that the processes are ordered correctly. The following steps demonstrate what an OEM integration process can look like: the end user only interacts with the second system (the data catalog), and consequently, all actions in Automatic Data Lineage are executed by the data catalog behind the scenes. A scenario in which the end user interacts with both Automatic Data Lineage and the data catalog would be exactly the same, except that some of the steps performed by the data catalog below would be performed directly by the user.
Configuration steps:
- The user creates connection definitions for all the source systems in the data catalog.
- The data catalog automatically creates connection definitions in Automatic Data Lineage (providing Automatic Data Lineage with the same information the user entered into the data catalog).
- The user (or a scheduler) executes the extraction of the source systems in the data catalog.
- Immediately after any source system extraction is executed in the data catalog, the data catalog should execute the extraction of the same system in Automatic Data Lineage. That includes the creation of a new workflow that will only perform the extraction of the connection that corresponds to the source system (see step #4 in the first example) and the execution of this workflow (see step #2 in the first example). The extraction processes for different source systems are independent of each other; they can be executed in arbitrary order. The extraction of one source system in Automatic Data Lineage, however, should be bound to the extraction of the same source system by the data catalog. Both processes should be executed with a minimum time delay so that the source system contains the same object definitions (e.g., tables) when both Automatic Data Lineage and the data catalog read them. If this is not ensured, it is possible that an end user will change the source system (e.g., alter the table definitions) in the meantime, which will cause inconsistencies between Automatic Data Lineage and the data catalog metadata.
- Once (and only once) all the source systems have been extracted (by both Automatic Data Lineage and the data catalog), the data lineage analysis can be executed. It can be bound to the execution of a process in the data catalog (e.g., the execution of profiling), or it can be explicitly executable in the data catalog. Once the data catalog receives the instruction from the user to start the analysis, a new workflow for the analysis should be created (if it has not already been created). The data lineage analysis has to be done for all the source systems at once if end-to-end lineage is to be analyzed, because there are dependencies between the systems. For example, in order to properly analyze the lineage for the BTEQ scripts that are called by Informatica PowerCenter, Automatic Data Lineage first has to know the structures and internal lineage of the databases from/to which these BTEQ scripts read/write data. The Automatic Data Lineage workflow definition should thus contain an analysis of all the connections in Automatic Data Lineage that were created by the data catalog. It is also possible to include the export of the data lineage in the workflow; in this case, it will be executed right after the analysis finishes. To include the export, add the xxxOpenExportScenario scenarios to the workflow definition. An example of such a workflow is shown below. Once the workflow definition is created, the workflow can be executed.

    {
      "workflowDescription": "OEM integration analyses/export workflow.",
      "workflowDefinition": {
        "revisionType": "MAJOR",
        "phases": [
          {
            "phase": "ANALYSIS",
            "technologies": [
              { "technology": "MSSQL", "connections": [ { "connectionID": "mssql_example", "scenarios": [] } ] },
              { "technology": "Teradata", "connections": [ { "connectionID": "teradata_example", "scenarios": [] } ] },
              { "technology": "IFPC", "connections": [ { "connectionID": "ifpc_example", "scenarios": [] } ] }
            ]
          },
          {
            "phase": "EXPORT",
            "technologies": [
              { "technology": "MSSQL", "connections": [ { "connectionID": "mssql_example", "scenarios": [ { "scenarioName": "mssqlOpenExportScenario" } ] } ] },
              { "technology": "Teradata", "connections": [ { "connectionID": "teradata_example", "scenarios": [ { "scenarioName": "teradataOpenExportScenario" } ] } ] },
              { "technology": "IFPC", "connections": [ { "connectionID": "ifpc_example", "scenarios": [ { "scenarioName": "ifpcOpenExportScenario" } ] } ] }
            ]
          }
        ]
      }
    }
- Once the workflow execution finishes, the data catalog can download the exported lineage via the API endpoint (more info in API Output Download) and ingest the data lineage.
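The "wait until the execution finishes, then download" handshake implies that the integrating tool polls the status endpoint. A minimal polling sketch follows (Python standard library only; the terminal status names are illustrative assumptions, so verify them against the actual values your instance's status endpoint returns):

```python
import json
import time
import urllib.request

# Illustrative terminal states -- an assumption, not the documented vocabulary;
# check the status endpoint of your Automatic Data Lineage version.
TERMINAL_STATES = {"FINISHED", "FAILED", "CANCELLED"}

def is_terminal(status):
    """True when the reported status means the workflow has stopped running."""
    return status.upper() in TERMINAL_STATES

def wait_for_execution(base_url, execution_id, token, poll_seconds=30):
    """Poll the execution status endpoint until a terminal state is reached."""
    url = (f"{base_url}/manta-admin-gui/public/process-manager/v1/"
           f"executions/{execution_id}/status")
    while True:
        req = urllib.request.Request(
            url, headers={"Authorization": f"Bearer {token}"})
        with urllib.request.urlopen(req) as resp:
            status = json.load(resp).get("status", "")
        if is_terminal(status):
            return status
        time.sleep(poll_seconds)
```

The status field name in the JSON response is also an assumption; adjust the parsing to match the actual response body.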
SSIS Variable Overrides Setting
This scenario describes how the Orchestration API can be used to set the SSIS Variable Overrides configuration. At runtime, SSIS jobs read variables and parameters from the database. Automatic Data Lineage's lineage analysis, however, does not have access to these runtime values and uses the default values defined in the jobs instead, which can lead to incomplete or unexpected lineage results. To address this, define manual override values for the variables and parameters to be used specifically during lineage analysis. An override becomes mandatory when there is no default value available for Automatic Data Lineage to utilize.
Configuration steps:
- To accomplish this task, use the http://<manta_admin_gui_hostname>:<manta_admin_gui_port>/manta-admin-gui/public/configurator/v1/tabular/ API endpoint with {application}, {category}, and {configuration_name} set to cli/SSIS/SSIS%20Variable%20Overrides.
- Once the URL has been properly defined, configure the JSON body using the following format:

    {
      "entries": [
        {
          "values": [
            { "propertyKey": "Scope mask", "propertyValue": "Project/Package.dtsx" },
            { "propertyKey": "Qualified Name", "propertyValue": "User::AsdfSqlDbConnection" },
            { "propertyKey": "Mode (DEFAULT|OVERRIDE)", "propertyValue": "OVERRIDE" },
            { "propertyKey": "Data Type", "propertyValue": "STRING" },
            { "propertyKey": "Value", "propertyValue": "Data Source = asdf" }
          ]
        },
        {
          "values": [
            { "propertyKey": "Scope mask", "propertyValue": "Project/Package.dtsx" },
            { "propertyKey": "Qualified Name", "propertyValue": "User::AsdfSqlDbConnection" },
            { "propertyKey": "Mode (DEFAULT|OVERRIDE)", "propertyValue": "OVERRIDE" },
            { "propertyKey": "Data Type", "propertyValue": "STRING" },
            { "propertyKey": "Value", "propertyValue": "Data Source = asdf" }
          ]
        }
      ],
      "configurationType": "TABULAR"
    }
- Once the request has been submitted, a Status 200 response should be received, and the previous SSIS Variable Overrides settings (if there were any) will be replaced by the values provided in your API call.
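Because every row of the tabular body repeats the same five propertyKey names, it can be convenient to generate the body from plain row dicts. A sketch (Python; the column names are taken from the format above, the helper itself is hypothetical):

```python
import json

# Column names of the 'SSIS Variable Overrides' table, as used in the body above.
COLUMNS = ["Scope mask", "Qualified Name", "Mode (DEFAULT|OVERRIDE)",
           "Data Type", "Value"]

def tabular_payload(rows):
    """Build the TABULAR configuration body from a list of row dicts.
    Each row must provide a value for every column."""
    entries = [{"values": [{"propertyKey": col, "propertyValue": row[col]}
                           for col in COLUMNS]}
               for row in rows]
    return {"entries": entries, "configurationType": "TABULAR"}

payload = tabular_payload([{
    "Scope mask": "Project/Package.dtsx",
    "Qualified Name": "User::AsdfSqlDbConnection",
    "Mode (DEFAULT|OVERRIDE)": "OVERRIDE",
    "Data Type": "STRING",
    "Value": "Data Source = asdf",
}])
print(json.dumps(payload, indent=2))
```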
API Authentication
Token-based authentication is used. Check out our guide on API Token-Based Authentication to learn how to set it up.
cURL Requests
API endpoints can be queried using the cURL application (among other methods). This can be especially useful when executing workflows from external schedulers. For example, the following command can be used to execute a workflow.
curl -X POST "http://<manta_admin_gui_hostname>:<manta_admin_gui_port>/manta-admin-gui/public/process-manager/v1/executions?workflowDefinitionName={workflowDefinitionName}" -H "Authorization: Basic {basic_authentication_hash}"
And this command can be used to verify the status of the execution.
curl -X GET "http://<manta_admin_gui_hostname>:<manta_admin_gui_port>/manta-admin-gui/public/process-manager/v1/executions/{executionId}/status" -H "Authorization: Basic {basic_authentication_hash}"
API Output Download
There are two ways to download attachments.
- Using the endpoint [GET] http://<manta_admin_gui_hostname>:<manta_admin_gui_port>/manta-admin-gui/public/process-manager/v1/executions/{executionId}/output. This is the easiest way, but if the output data is large, it can cause performance problems (for example, the download may not succeed, or the server may crash). The endpoint returns the data as a binary list.

    curl --output output_file.zip -X GET "http://<manta_admin_gui_hostname>:<manta_admin_gui_port>/manta-admin-gui/public/process-manager/v1/executions/{executionId}/output" -H "Authorization: Basic {basic_authentication_hash}"

  Or alternatively:

    curl -X "GET" "http://<manta_admin_gui_hostname>:<manta_admin_gui_port>/manta-admin-gui/public/process-manager/v1/executions/{executionId}/output" -H "accept: */*" -H "Authorization: Bearer {Bearer_Token}" --output output.zip

  PowerShell script:

    # PowerShell version

    # Define the token endpoint URL
    $endpoint = 'http://<keycloak_gui_hostname>:<keycloak_gui_port>/auth/realms/manta/protocol/openid-connect/token'

    # Define the request parameters
    $params = @{
        'grant_type'    = 'client_credentials'
        'client_secret' = '<insert_your_client_secret_code>'
        'client_id'     = '<insert_your_client_id>'
    }

    # Get the token
    $token = Invoke-RestMethod -Uri $endpoint -Method Post -Body $params
    $accessToken = $token.access_token

    # Download the output
    $url = "http://<manta_admin_gui_hostname>:<manta_admin_gui_port>/manta-admin-gui/public/process-manager/v1/executions/{executionId}/output"
    $headers = @{
        'accept'        = '*/*'
        'Authorization' = "Bearer $accessToken"
    }
    Invoke-RestMethod -Uri $url -Method Get -Headers $headers -OutFile "output.zip"
- Using the endpoint [GET] http://<manta_admin_gui_hostname>:<manta_admin_gui_port>/manta-admin-gui/public/process-manager/v1/executions/{executionId}/output-stream. This method is suitable for downloading larger outputs because it can send data from the server to the client more efficiently. The endpoint returns the data as a zip stream.

    curl --output output_file.zip -X GET "http://<manta_admin_gui_hostname>:<manta_admin_gui_port>/manta-admin-gui/public/process-manager/v1/executions/{executionId}/output-stream" -H "Authorization: Basic {basic_authentication_hash}"

  Here is an example of how to implement a zip stream download using Java.
- An executable jar is attached in the file manta-output-downloader.jar.

  List all the files from the output zip to the console:

    java -jar manta-output-downloader.jar -host http://<manta_admin_gui_hostname>:<manta_admin_gui_port>/manta-admin-gui -executionId $executionId -password $password -username $userName

  Download the file to the /tmp folder:

    java -jar manta-output-downloader.jar -host http://<manta_admin_gui_hostname>:<manta_admin_gui_port>/manta-admin-gui -executionId $executionId -password $password -username $userName -downloadFolder /tmp

- The Maven project is attached in the file manta-output-downloader.zip.
- The Java code is attached in the file MantaOutputDownloader.java.
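Where running the attached Java downloader is not an option, the same chunked zip-stream download can be sketched with the Python standard library. This is an illustration, not a supported client; Bearer authentication is assumed, and the host, execution ID, and token are placeholders:

```python
import shutil
import urllib.request

CHUNK_SIZE = 1024 * 1024  # copy 1 MiB at a time instead of buffering the whole zip

def download_output_stream(base_url, execution_id, token, dest_path):
    """Stream the execution output zip to disk chunk by chunk."""
    url = (f"{base_url}/manta-admin-gui/public/process-manager/v1/"
           f"executions/{execution_id}/output-stream")
    req = urllib.request.Request(
        url, headers={"accept": "*/*", "Authorization": f"Bearer {token}"})
    # copyfileobj reads and writes in CHUNK_SIZE pieces, so memory use stays
    # flat even for very large outputs.
    with urllib.request.urlopen(req) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out, CHUNK_SIZE)

# Example invocation (placeholders):
# download_output_stream("http://localhost:8181", "<executionId>", "<token>", "output.zip")
```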