Manta Flow Collibra Client Configuration

Several configuration units must be set up (or, in some cases, left at their defaults) before Manta Data Lineage can export data lineage to Collibra DGC. The following sections describe these configuration units. All of them can be updated using the Manta Configurator web application.

Collibra DGC Connection

A connection to Collibra DGC must be set up. This configuration is available on the Manta Admin UI screen under Configurations > Integrations > Collibra > Collibra Export. The configuration contains the following properties.

| Property name | Description | Example |
|---|---|---|
| collibra.exportMode | Set to REST for the direct Manta-to-Collibra-DGC integration. | REST |
| collibra.uploadMetadata.protocol | Protocol used for the connection to the Collibra DGC API; the options are HTTP and HTTPS. | https |
| collibra.uploadMetadata.serverName | Host, including the domain name, or IP address of Collibra DGC. | example.collibra.com |
| collibra.uploadMetadata.portNumber | Port number on which the Collibra DGC API is available. If there's no port number in the URL, use the default port numbers (80 for HTTP, 443 for HTTPS). | |
| collibra.uploadMetadata.userName | Collibra DGC username with privileges for the REST API and the communities that the metadata upload will target. | admin |
| collibra.uploadMetadata.password | Collibra DGC password. | password |
| collibra.uploadMetadata.path | Collibra DGC API v2 location; not to be changed. | rest/2.0 |
| collibra.uploadMetadata.segmentSeparator | Character used to separate segments of qualified asset names. Space characters need to be escaped by preceding them with a backslash (`\`). If the backslash itself is part of the segment name, it should be escaped with three additional backslashes, resulting in a `\\\\` sequence. | > |
| collibra.uploadMetadata.assetStatus | Status to be set for each asset exported to Collibra; the value is not validated against Collibra DGC settings. | Accepted |
| collibra.uploadMetadata.importApi.batchSize | Batch size of requests to the import API; the batch size is the number of objects (domains, assets, complex relations) that will be loaded into Collibra DGC in one batch. The recommended value range is between 5,000 and 10,000. | 10000 |
| collibra.uploadMetadata.importApi.singleSynchId | Set to true if a single synchronization ID should be used for all communities (requires Collibra DGC 5.6.2 or newer); set to false if a specific synchronization ID should be used for each community. Setting this value to false may have a negative impact on performance because the batches of metadata sent to Collibra are split per community and therefore may not use the full batch size configured in collibra.uploadMetadata.importApi.batchSize. | false<br>true |
| collibra.uploadMetadata.importApi.exportExternalMappings | Set to true if external mappings have to be exported; otherwise, set to false. | false<br>true |
| manta.collibra.useTrustStore | Set to true to use a custom truststore with the certificate used by Collibra DGC for HTTPS connections; set to false to use Java's default truststore or when an HTTP connection is used. | true<br>false |
| manta.collibra.exportTransformationLogic | Flag whether or not to export the transformation logic (calculation formulas) for an individual field into complex relations (field mappings). | false<br>true |
| manta.collibra.exportRevisionNumber | Flag whether or not to export the revision number of the nodes in Manta Data Lineage. | false<br>true |
| manta.collibra.exportExportDate | Flag whether or not to export the date and time the export started. | false<br>true |
| manta.collibra.useIntegrationApi (valid up to and including Manta Data Lineage R42.1) | Flag whether to use Synchronization API or Integration API. If set to false (default), Synchronization API is used. Synchronization API takes care of automated deletion of assets and flows that are no longer exported or available as part of the exported revision. There are, however, performance constraints on the Collibra side that may make this export mode unusable. Integration API only imports new items. It does not handle the deletion of objects and flows that are no longer exported or available as part of the exported revision. This export mode is much faster than Synchronization API. See the section entitled Notes on Using Integration API. | false<br>true |
| manta.collibra.useIntegrationApi (updated as of Manta R42.2) | Synchronization mode for assets that no longer exist in the external system. If Change Missing Asset Status is selected, Synchronization API is used and takes care of updating the status of assets and flows that are no longer exported or available as part of the exported revision; specify the UUID of the new status, which you can find on the Collibra operating model page (the default value is the UUID of the Obsolete status: 00000000-0000-0000-0000-000000005011). If Delete is selected, Synchronization API is used and takes care of automated deletion of assets and flows that are no longer exported or available as part of the exported revision. There are, however, performance constraints on the Collibra side that may make this export mode unusable. If No Synchronization is selected, Integration API is used. Integration API only imports new items. It does not handle the deletion of objects and flows that are no longer exported or available as part of the exported revision. This export mode is much faster than Synchronization API. See the section below entitled Notes on Using Integration API. | Change Missing Asset Status<br>Delete<br>No Synchronization |
| manta.collibra.continueOnError (as of Manta Data Lineage R42.1) | Flag whether to continue on errors. If set to true (default), errors that occur during the export to Collibra are ignored and the export result is reported as successful. The results of unsuccessful operations/mappings are visible in Collibra and are also written to the log file. | false<br>true |
| manta.collibra.integrateWithCollibraEdge (as of Manta Data Lineage R42.2) | If true, the Segment separator configuration option is hidden and its value is set to >. If true, the Export external mappings configuration option is hidden and its value is set to false. If true, the suffix (column) is appended to the qualified names of all columns. | false<br>true |
| manta.collibra.addRelationAction (as of Manta Data Lineage R42.2) | Flag whether to add or replace relations during the Collibra export. The value is taken as is and used as the relationsAction parameter of all import job requests. | Add<br>Replace |
| collibra.mappingSpecification.sourceCode (as of Manta Data Lineage R42.2) | If true (default), the source code will be exported as a new attribute of the mapping specification. It will contain the exact value of the source code of the node represented by the mapping specification. Source codes of the contracted transformations are not collected. Configuration option for the attribute type in the Collibra Entity Types configuration: 00000000-0000-0000-0000-000000000249. | false<br>true |
| collibra.uploadMetadata.proxy.use | Set to true if the Collibra export API should be called through a proxy server; otherwise, set to false. | false<br>true |
| collibra.uploadMetadata.proxy.protocol | Collibra export proxy protocol (e.g., HTTP, HTTPS); only required and used if collibra.uploadMetadata.proxy.use is set to true. | http<br>https |
| collibra.uploadMetadata.proxy.serverName | Collibra export proxy host name; only required and used if collibra.uploadMetadata.proxy.use is set to true. | proxy.corp.com |
| collibra.uploadMetadata.proxy.portNumber | Collibra export proxy port; only required and used if collibra.uploadMetadata.proxy.use is set to true. | 8080 |
| collibra.uploadMetadata.proxy.userName | Collibra export proxy user name; only used when collibra.uploadMetadata.proxy.use is set to true. | |
| collibra.uploadMetadata.proxy.password | Collibra export proxy password; only used when collibra.uploadMetadata.proxy.use is set to true. | |
| manta.collibra.mantaLineageLink.baseUrl | Base URL of links to Manta Data Lineage lineage in the format `http(s)://<host>:<port>/<application context>`. Leave empty if the Manta Server URL (the property manta.repository.url) should be used. Only applicable when manta.collibra.exportMantaDirectLineageLink or manta.collibra.exportMantaIndirectLineageLink is set to true. | https://manta.mycompany.com:8443/manta-dataflow-server |
| manta.collibra.exportMantaDirectLineageLink | Flag whether or not to export a Manta Data Lineage direct lineage link as a Collibra node attribute. | true<br>false |
| manta.collibra.directLineageFilter | Only applicable when manta.collibra.exportMantaDirectLineageLink=true. Filters applied to Manta Data Lineage direct lineage referenced by the link; filter names from Manta UI should be used. | Everything<br>DB, files, and reports<br>Database objects<br>Important objects |
| manta.collibra.directLineageInitialDepth | Only applicable when manta.collibra.exportMantaDirectLineageLink=true. Initial depth of Manta direct lineage referenced by the link. | 3 |
| manta.collibra.directLineageDirection | Only applicable when manta.collibra.exportMantaDirectLineageLink=true. Direction of Manta direct lineage referenced by the link. | Both<br>Forward<br>Backward |
| manta.collibra.exportMantaIndirectLineageLink | Flag whether or not to export a Manta indirect lineage link as a Collibra node attribute. | true<br>false |
| manta.collibra.indirectLineageFilter | Only applicable when manta.collibra.exportMantaIndirectLineageLink=true. Filters applied to Manta indirect lineage referenced by the link; filter names from Manta UI should be used. | Everything<br>DB, files, and reports<br>Database objects<br>Important objects |
| manta.collibra.indirectLineageInitialDepth | Only applicable when manta.collibra.exportMantaIndirectLineageLink=true. Initial depth of Manta indirect lineage referenced by the link. | 3 |
| manta.collibra.indirectLineageDirection | Only applicable when manta.collibra.exportMantaIndirectLineageLink=true. Direction of Manta indirect lineage referenced by the link. | Both<br>Forward<br>Backward |
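
As a rough illustration of how the connection properties relate to each other, the following sketch assembles a base URL from the protocol, server name, port number, and path values. The function name `collibra_api_url` is hypothetical, and the way Manta Data Lineage combines these properties internally is an assumption; the sketch only mirrors the default-port rule described for collibra.uploadMetadata.portNumber.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Manta): builds a Collibra DGC API base URL
# from the protocol, serverName, portNumber, and path property values.
collibra_api_url() {
  local protocol server="$2" port="$3" path="$4"
  # Normalize the protocol to lower case (HTTP/HTTPS -> http/https).
  protocol="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  # Fall back to the default port when none is given: 443 for HTTPS, 80 for HTTP.
  if [ -z "$port" ]; then
    case "$protocol" in
      https) port=443 ;;
      *)     port=80 ;;
    esac
  fi
  # Strip a leading slash from the path so the URL has exactly one separator.
  printf '%s://%s:%s/%s\n' "$protocol" "$server" "$port" "${path#/}"
}

# Example values taken from the table above; the port is left empty on purpose.
collibra_api_url https example.collibra.com "" rest/2.0
```

Opening the resulting URL from the Manta host (for example in a browser) can be a quick way to confirm that the server name and port are reachable before running an export.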

Keystore Certificate for an HTTPS Connection to Collibra DGC

Collibra DGC uses a secure HTTPS connection by default. To establish a secure connection from Manta Data Lineage to DGC, a signed certificate from DGC needs to be added to the list of certificates that are trusted by Manta Data Lineage.

The following steps are written for Chrome, but any browser can be used.

  1. Log in to the DGC instance using a web browser.
  2. Click the padlock to the left of the address bar, and then select Certificate from the menu that appears.
  3. Click the Detail tab and then the Copy to File... button.
  4. Run the wizard using the default settings to export the certificate.
  5. Open a new command window.
  6. Run the following commands. Replace <path_to_certificate> with the path to the certificate that you exported earlier.

**Windows**

    cd <MANTA_DIR_HOME>/scenarios/manta-dataflow-cli/conf
    "%JRE_HOME%\bin\keytool.exe" -import -trustcacerts -keystore mantaConnectorsTruststore.pkcs12 -storepass mantaConnectorsTruststore -noprompt -alias Collibra -file <path_to_certificate>
    
**Unix**


``` bash
cd <MANTA_DIR_HOME>/scenarios/manta-dataflow-cli/conf
$JRE_HOME/bin/keytool -import -trustcacerts -keystore mantaConnectorsTruststore.pkcs12 -storepass mantaConnectorsTruststore -noprompt -alias Collibra -file <path_to_certificate>
```
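
After the import, it can be useful to confirm that the truststore actually contains the new entry. The sketch below wraps the standard `keytool -list` command in a small function; the function name `collibra_cert_listed` and the `KEYTOOL` override variable are hypothetical, while the keystore file name, store password, and alias are the ones used in the import commands above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the truststore in the given conf directory
# contains an entry stored under the alias "Collibra".
# KEYTOOL is an assumed override; by default the keytool from JRE_HOME is used.
collibra_cert_listed() {
  local conf_dir="$1"
  local keytool="${KEYTOOL:-$JRE_HOME/bin/keytool}"
  # keytool -list exits with a non-zero status when the alias does not exist.
  "$keytool" -list \
    -keystore "$conf_dir/mantaConnectorsTruststore.pkcs12" \
    -storepass mantaConnectorsTruststore \
    -alias Collibra
}

# Usage (replace <MANTA_DIR_HOME> with the actual installation directory):
#   collibra_cert_listed "<MANTA_DIR_HOME>/scenarios/manta-dataflow-cli/conf" \
#     && echo "Collibra certificate is trusted"
```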

Notes on Using Integration API

Integration API can be used for export instead of Synchronization API. It's faster, but it's missing the synchronization part, which means that obsolete assets previously loaded by Manta Data Lineage aren't automatically deleted anymore. This might, however, be desired behavior as the items that are not available anymore can still be recorded and governed. With each export, Manta Data Lineage updates the attributes Manta Exported On and Manta Revision on each exported asset. These can be used to identify whether the asset still exists in the latest Manta Data Lineage revision and build an additional workflow in Collibra (Getting started with workflows) to, for example, change the status of assets that no longer exist to a Deprecated or To be reviewed state for governance needs.

Since Integration API only uploads assets and does not delete anything, it can be used to upload metadata from Manta Data Lineage to Collibra in smaller segments (which is hard to achieve when the Synchronization API is used). It is the user's responsibility to make sure that the desired export filters are configured and used before running the export to Collibra.
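
The idea of using the Manta Revision attribute to spot assets that are no longer exported can be sketched as follows. The input format (one asset per line, with the asset name and its Manta Revision value separated by a tab) is purely hypothetical, as is the function name `stale_assets`; how such a list is obtained from Collibra is outside the scope of this sketch, and the revision values are assumed to be numeric.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads "<asset name><TAB><Manta Revision>" lines on stdin
# and prints the names of assets whose revision is older than the latest one.
stale_assets() {
  local latest="$1"
  # "+ 0" forces a numeric comparison in awk.
  awk -F '\t' -v latest="$latest" '$2 + 0 < latest + 0 { print $1 }'
}

# Example: "customers" was last exported in revision 11, the latest export is 12.
printf 'orders\t12\ncustomers\t11\n' | stale_assets 12
```

The printed names would be candidates for a status change in a Collibra workflow, as described above.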

Collibra DGC Entity Types

All domain types, asset types, characteristics, relation types, and complex relation types can be customized in Collibra DGC. This information is therefore configurable and needs to be reflected and aligned in Manta Data Lineage as well. This configuration is available on the Manta Admin UI screen under Configurations > Integrations > Collibra > Collibra Entity Types.

| Property name | Description | Example |
|---|---|---|
| Entity name | Manta's unique name for the entity type. | collibra.table |
| Entity ID | The resource/relation ID of the entity type that will be used. | 00000000-0000-0000-0000-000000031007 |

The configuration contains default values for the out-of-the-box Collibra DGC operating model.

See the following list of custom entity types. These entities (asset types and relation types) have to be created manually in Collibra DGC.

Assets:

Coroles

When you're creating new relations, Collibra requires the Corole attribute. Manta Data Lineage does not operate with this attribute, so there is no need for a specific configuration. The recommended Coroles are provided only for better readability.

Relations

Attributes

For more information about the default attribute mapping in scanners, see Collibra Mapping in Power BI.

Collibra DGC Scopes

A scope in Collibra Data Governance Center is a selection of communities and domains that you refer to when assigning an asset type. An assignment for a specific scope is also referred to as a "scoped assignment" as opposed to a global assignment. A scoped assignment only applies to assets if the assets are located in a domain or community that belongs to the scope.

A Manta Data Lineage scope should be created in your Collibra DGC instance. Make sure that you meet the following conditions:

Export Filters and Mapping

By default, Manta Data Lineage exports all assets, relations, and complex relations to the Manta Flow default community (which has to exist before the export) and default domains (for systems, databases, physical assets, mappings, reports, and logical model assets). Users should change this configuration and decide which communities and domains particular pieces of metadata will be loaded into. The configuration is available on the Manta Admin UI screen under Configurations > Integrations > Collibra > Collibra Export Filter.

| Column name | Description | Example |
|---|---|---|
| Manta qualified name | Identification of the root node for all entities in Manta Flow that will be exported into the community with the given Collibra community name. The name can consist of zero or more segments, which in the case of most database systems can include technology, server, database, schema, table, and/or column. Each of these parts can be a regular expression. If any part should contain a slash, it needs to be escaped with a backslash. | /<br>/MSSQL/WINRR0001/dwh_db |
| Export mode | The mode that determines how the export will be performed. Possible values are:<br>Full export: All the physical, logical, technology, reporting, and transformation assets are exported together with all their relations.<br>Lineage export: With the exception of mapping specifications, no assets are created. Already existing assets are used instead. The typical use case for this option would be the use of a physical data catalog generated by Collibra Catalog. When exporting lineage, the meaning of the other columns differs slightly: data lineage is loaded into the specified mappings domain and only the existing assets are used from the other specified domains.<br>Contraction: Neither data assets nor detailed lineage is exported for objects identified by the given qualified name. Any data flow going through these "contracted" objects is simplified to only one hop between the source and the target.<br>Exclusion: Neither data assets nor data lineage are exported for objects identified by the given qualified name. No data lineage going through these excluded objects is exported at all. | Full export<br>Lineage export<br>Contraction<br>Exclusion |
| Community name | The name of the target Collibra DGC Community that Manta Data Lineage will load the specified metadata into. This value can be overridden for specific domains. | Controlling |
| Technology domain name | The name of the domain containing systems and databases. The default value is "Systems & Databases" as defined in collibraEntityTypes.properties. | Systems & Databases |
| Technology domain type | The type of the domain containing systems and databases. The default value is "Technology Asset Domain" as defined in collibraEntityTypes.properties. | collibra.technologyDomain |
| Technology community name | The name of the target Collibra DGC Community into which Manta Data Lineage will load the specified metadata from the technology domain. If empty, then the community name applies. | Controlling Systems and Databases |
| Physical dictionary domain name | The name of the domain containing physical data assets. The default value is "Physical Assets". | Physical Assets |
| Physical dictionary domain type | The type of the domain containing physical data assets. The default value is "Physical Data Dictionary" as defined in collibraEntityTypes.properties. | collibra.physicalDictionaryDomain |
| Physical dictionary community name | The name of the target Collibra DGC Community into which Manta Data Lineage will load the specified metadata from the physical dictionary domain. If empty, then the community name applies. | Controlling Physical Assets |
| Mappings domain name | The name of the domain containing mappings. The default value is "Mappings". | Mappings |
| Mappings domain type | The type of the domain containing mappings. The default value is "Mapping Domain" as defined in collibraEntityTypes.properties. | collibra.mappingsDomain |
| Mappings community name | The name of the target Collibra DGC Community into which Manta Data Lineage will load the specified metadata from the mappings domain. If empty, then the community name applies. | Controlling Mappings |
| Reports domain name | The name of the domain containing reports. The default value is "Reports". | Reports |
| Reports domain type | The type of the domain containing reports. The default value is "Report Catalog" as defined in collibraEntityTypes.properties. | collibra.reportsDomain |
| Reports community name | The name of the target Collibra DGC Community into which Manta Data Lineage will load the specified metadata from the reports domain. If empty, then the community name applies. | Controlling Reports |
| Logical dictionary domain name | The name of the domain containing logical model assets. The default value is "Logical Model". | Logical Model |
| Logical dictionary domain type | The type of the domain containing logical model assets. The default value is "Logical Data Dictionary" as defined in collibraEntityTypes.properties. | collibra.logicalDictionaryDomain |
| Logical dictionary community name | The name of the target Collibra DGC Community into which Manta Data Lineage will load the specified metadata from the logical dictionary domain. If empty, then the community name applies. | Controlling Logical Model |

Example configuration

For each domain, it is possible to override the default community defined in Community Name. For example, if a specific community is defined for the Technology domain in the Technology Community Name field, all technology domain assets matching the row will be uploaded into this community. Otherwise, they will be uploaded into the default community.

| Manta qualified name | Export mode | Community name | Technology domain name | Technology domain type | Technology community name | Physical dictionary domain name | Physical dictionary domain type | Physical dictionary community name | Mappings domain name | Mappings domain type | Mappings community name | Reports domain name | Reports domain type | Reports community name | Logical dictionary domain name | Logical dictionary domain type | Logical dictionary community name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| / | Full export | Manta default community | Systems & databases | collibra.technologyDomain | | Physical assets | collibra.physicalDictionaryDomain | | Mappings | collibra.mappingsDomain | | Reports | collibra.reportsDomain | | Logical model | collibra.logicalDictionaryDomain | |
| /MSSQL/ srv1/stage.* | Contraction | | | | | | | | | | | | | | | | |
| /MSSQL/ srv1/test.* | Exclusion | | | | | | | | | | | | | | | | |
| /MSSQL/ srvr1/ db/Finance | Full export | Controlling | Systems & databases | collibra.technologyDomain | Controlling Systems and Databases | Loan MGMT | collibra.physicalDictionaryDomain | | Loan MGMT mappings | collibra.mappingsDomain | | | | | | | |
| /Teradata/ DWH | Lineage export | Controlling | Systems & databases | collibra.technologyDomain | Controlling Systems and Databases | DWH | collibra.physicalDictionaryDomain | Controlling Physical assets | DWH mappings | collibra.mappingsDomain | Controlling Mappings | | | | | | |
| /MSSQL/ srvr2/ db/Retail | Full export | Retail | Systems & databases | collibra.technologyDomain | Retail Systems and Databases | Source CRM retail | collibra.physicalDictionaryDomain | | Source CRM retail mappings | collibra.mappingsDomain | | | | | | | |
| /IFPC/ Talend/ SSIS | Full export | Retail | Systems & databases | collibra.technologyDomain | | | | | Load to DWH mappings | collibra.mappingsDomain | | | | | | | |
| /SSAS/ SSRS/ PowerBI | Full export | Retail | Systems & databases | collibra.technologyDomain | | | | | | | | PowerBI reports | Report catalog | | | | |
| /Logical Model | Full export | Retail | Systems & databases | Technology asset domain | | | | | Logical model mappings | Mapping domain | | | | | Logical model | Logical data dictionary | Retail logical model |

The regular expression .* matches exactly one segment of the qualified name. For example, the levels between /Snowflake and <column_to_remove> in the path /Snowflake/server/database/schema/table/<column_to_remove> cannot be covered by a single expression; a separate expression has to be used for each level, such as /Snowflake/.*/.*/.*/.*/<column_to_remove>. To exclude nodes on multiple levels, a separate exclusion must be configured for each level.

If more than one row of the configuration applies to a particular piece of metadata in Manta Data Lineage, the Manta Data Lineage qualified names in those rows are assumed to have different numbers of segments. Identifying the same objects more than once with qualified names that have the same number of segments is invalid. When more rows apply and the configuration is valid, the following rules decide which of the rows will be used.

If the record does not match any of the rows, the exclusion mode is used; that is, records not matching any of the rows will not be exported.
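
The segment-by-segment matching described above can be sketched in a few lines of Bash. This is only an illustration under stated assumptions: the function name `matches_root` is hypothetical, each segment of the configured name is treated as an anchored extended regular expression, the configured name is treated as a prefix (root node) of the entity's qualified name, and escaped slashes inside segments are not handled. The rules for choosing between several matching rows are not covered.

```bash
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if the configured qualified name (first argument)
# identifies a root of the entity's qualified name (second argument).
# Each configured segment is a regular expression matched against one segment only.
matches_root() {
  local pattern="${1#/}" name="${2#/}"
  local -a pats=() segs=()
  if [ -n "$pattern" ]; then IFS='/' read -r -a pats <<< "$pattern"; fi
  if [ -n "$name" ]; then IFS='/' read -r -a segs <<< "$name"; fi
  # A configured name with more segments than the entity cannot be its root.
  (( ${#pats[@]} <= ${#segs[@]} )) || return 1
  local i re
  for i in "${!pats[@]}"; do
    # Anchor the expression so that it has to match the whole segment.
    re="^(${pats[$i]})$"
    [[ "${segs[$i]}" =~ $re ]] || return 1
  done
  return 0
}

# ".*" covers exactly one level, so one expression is needed per level.
if matches_root '/Snowflake/.*/.*/.*/.*/col_x' '/Snowflake/srv/db/sch/tbl/col_x'; then
  echo "row applies"
else
  echo "no row applies: the record is not exported"
fi
```

With these assumptions, the configured name `/` (zero segments) matches every entity, which is consistent with its use as the catch-all row in the example configuration.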

Collibra Mapping Configuration

This configuration defines the mapping of the Manta Data Lineage database hierarchy to the Collibra hierarchy. It is available on the Manta Admin UI screen under Configurations > Integrations > Collibra > Collibra Mapping.

Each row of the file represents a mapping that translates the database host and database name stored in the Manta Data Lineage repository to the host and database names used in Collibra.

| Column name | Description | Example |
|---|---|---|
| Host in Manta | Server to match in the Manta repository | dwh-prod |
| Database in Manta | Database to match within the server (leave empty to match all databases in the given server) | stage |
| Collibra host | Collibra host to map this server/database to | DWH |
| Collibra database | Collibra database to map this database to (leave empty to keep the original database name; must be empty if the Database in Manta column is empty) | Stage layer |
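
The effect of a single mapping row can be sketched as follows. The function name `map_database` and the `host/database` output format are hypothetical, and exact (not pattern-based) matching of the host and database names is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: applies one mapping row to a host/database pair.
# Arguments: Host in Manta, Database in Manta, Collibra host, Collibra database,
#            then the actual host and database found in the Manta repository.
# Prints "<Collibra host>/<Collibra database>" on a match; returns 1 otherwise.
map_database() {
  local row_host="$1" row_db="$2" c_host="$3" c_db="$4" host="$5" db="$6"
  [ "$host" = "$row_host" ] || return 1
  # An empty "Database in Manta" matches all databases in the given server.
  if [ -n "$row_db" ] && [ "$db" != "$row_db" ]; then
    return 1
  fi
  # An empty "Collibra database" keeps the original database name.
  printf '%s/%s\n' "$c_host" "${c_db:-$db}"
}

# Example row from the table above applied to the database "stage" on "dwh-prod".
map_database dwh-prod stage DWH "Stage layer" dwh-prod stage
```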

Filesystem Mapping Configuration

This configuration defines the mapping of the Manta Filesystem hierarchy to the Collibra hierarchy. It is available on the Manta Admin UI screen under Configurations > Integrations > Collibra > Filesystem Mapping.

Each row of the file maps a top-level item in a given Manta Data Lineage resource to a host and path in Collibra.

| Column name | Description | Example |
|---|---|---|
| Resource name | Resource in the Manta repository this mapping applies to | Filesystem |
| Top-level directory / file | Top-level item (server, bucket, directory, or file) within the resource that this mapping applies to | dataserver |
| Host | Collibra host to map this item and all its descendants to | Data Storage |
| Path within host | Path in the Collibra host to put the top-level item in (the directory separator is "/"; leave empty to put the top-level item in the Collibra host as a top-level item) | stage/tempfiles |

For example, with the configuration as given in the example values, a file in the resource Filesystem on the path dataserver/client/emails.txt would be mapped to the Collibra host Data Storage and the path stage/tempfiles/dataserver/client/emails.txt.
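
The path computation in this example can be sketched as a small Bash function. The function name `map_filesystem_path` and the tab-separated output are hypothetical, and the resource name check is omitted for brevity; the sketch only shows how the top-level item, the Collibra host, and the path within the host combine.

```bash
#!/usr/bin/env bash
# Hypothetical helper: applies one filesystem mapping row to a path in Manta.
# Arguments: top-level item, Collibra host, path within host, path in Manta.
# Prints "<Collibra host><TAB><Collibra path>" on a match; returns 1 otherwise.
map_filesystem_path() {
  local top_level="$1" host="$2" prefix="$3" file_path="$4"
  # The row applies only to the top-level item itself and its descendants.
  case "$file_path" in
    "$top_level"|"$top_level"/*) ;;
    *) return 1 ;;
  esac
  if [ -n "$prefix" ]; then
    printf '%s\t%s/%s\n' "$host" "$prefix" "$file_path"
  else
    # An empty path places the top-level item directly under the Collibra host.
    printf '%s\t%s\n' "$host" "$file_path"
  fi
}

# Example values from the table above.
map_filesystem_path dataserver "Data Storage" stage/tempfiles dataserver/client/emails.txt
```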