Manta Flow Collibra Client Configuration
There are several essential configuration units the user has to set up (or retain the defaults in some cases) so Manta Data Lineage can export data lineage to Collibra DGC. These configuration units are described in the following sections. All these configuration units can be updated using the Manta Configurator web application.
Collibra DGC Connection
A connection to Collibra DGC must be set up. This configuration is available on the Manta Admin UI screen under Configurations > Integrations > Collibra > Collibra Export. The configuration contains the following properties.
Property name | Description | Example |
---|---|---|
collibra.exportMode | Set to REST for the direct Manta-to-Collibra-DGC integration. | REST |
collibra.uploadMetadata.protocol | A protocol used for the connection to Collibra DGC API; the options are HTTP and HTTPS. | https |
collibra.uploadMetadata.serverName | Host including the domain name or IP address of Collibra DGC. | example.collibra.com |
collibra.uploadMetadata.portNumber | Port number on which Collibra DGC API is available. If there's no port number in the URL, use the default port number (80 for HTTP, 443 for HTTPS). | 443 |
collibra.uploadMetadata.userName | Collibra DGC username with privileges for the REST API and the communities that the metadata upload will target. | admin |
collibra.uploadMetadata.password | Collibra DGC password. | password |
collibra.uploadMetadata.path | Collibra DGC API v2 location; not to be changed. | rest/2.0 |
collibra.uploadMetadata.segmentSeparator | Character used to separate segments of qualified asset names. Space characters must be escaped with a preceding backslash \ . If the backslash \ itself is part of the segment name, escape it with three additional backslashes, resulting in a \\\\ sequence. | > |
collibra.uploadMetadata.assetStatus | Status to be set for each asset exported to Collibra; the value is not validated against Collibra DGC settings. | Accepted |
collibra.uploadMetadata.importApi.batchSize | Batch size of requests to the import API; the batch size is the number of objects (domains, assets, complex relations) that will be loaded into Collibra DGC in one batch; the recommended value range is between 5,000 and 10,000. | 10000 |
collibra.uploadMetadata.importApi.singleSynchId | Set to true if a single synchronization ID should be used for all communities (requires Collibra DGC 5.6.2 or newer); set to false if a specific synchronization ID should be used for each community. Setting this value to false may degrade performance because the batches of metadata sent to Collibra are split per community and therefore may not use the full batch size configured in collibra.uploadMetadata.importApi.batchSize. | false, true |
collibra.uploadMetadata.importApi.exportExternalMappings | Set to true if external mappings have to be exported; otherwise, set to false. | false, true |
manta.collibra.useTrustStore | Set to true to use a custom truststore with the certificate used by Collibra DGC for HTTPS connections; set to false to use Java's default truststore or when an HTTP connection is used. | true, false |
manta.collibra.exportTransformationLogic | Flag whether or not to export the transformation logic (calculation formulas) for an individual field into complex relations (field mappings). | false, true |
manta.collibra.exportRevisionNumber | Flag whether or not to export the revision number of the nodes in Manta Data Lineage. | false, true |
manta.collibra.exportExportDate | Flag whether or not to export the date and time the export started. | false, true |
manta.collibra.useIntegrationApi (valid up to and including Manta Data Lineage R42.1) | Flag whether to use Synchronization API or Integration API. If set to false (default), Synchronization API is used. Synchronization API takes care of automated deletion of assets and flows that are no longer exported or available as part of the exported revision. There are, however, performance constraints on the Collibra side that may make this export mode unusable. If set to true, Integration API is used. Integration API only imports new items. It does not handle the deletion of objects and flows that are no longer exported or available as part of the exported revision. This export mode is much faster than Synchronization API. See the section Notes on Integration API. | false, true |
manta.collibra.useIntegrationApi (updated as of Manta R42.2) | Synchronization mode for assets that no longer exist in the external system. If Change Missing Asset Status is selected, Synchronization API is used and updates the status of assets and flows that are no longer exported or available as part of the exported revision; specify the UUID of the new status, which you can find on the Collibra operation-model page (the default value is the UUID of Obsolete: 00000000-0000-0000-0000-000000005011). If Delete is selected, Synchronization API is used and takes care of automated deletion of assets and flows that are no longer exported or available as part of the exported revision. There are, however, performance constraints on the Collibra side that may make this export mode unusable. If No Synchronization is selected, Integration API is used. Integration API only imports new items. It does not handle the deletion of objects and flows that are no longer exported or available as part of the exported revision. This export mode is much faster than Synchronization API. See the section Notes on Integration API. | Change Missing Asset Status, Delete, No Synchronization |
manta.collibra.continueOnError (as of Manta Data Lineage R42.1) | Flag whether to continue on errors. If set to true (default), errors that occur during the export to Collibra are ignored and the export result is reported as successful. The results of unsuccessful operations/mappings are visible in Collibra and are also written to the log file. | false, true |
manta.collibra.integrateWithCollibraEdge (as of Manta Data Lineage R42.2) | If true, the Segment separator configuration option is hidden and its value is set to > . If true, the Export external mappings configuration option is hidden and its value is set to false. If true, the suffix (column) is appended to the qualified names of all columns. | false, true |
manta.collibra.addRelationAction (as of Manta Data Lineage R42.2) | Flag whether to add or replace relations during the Collibra export. The value is passed as is in the relationsAction parameter of all import job requests. | Add, Replace |
collibra.mappingSpecification.sourceCode (as of Manta Data Lineage R42.2) | If true (default), the source code is exported as a new attribute of the mapping specification. It contains the exact source code of the node represented by the mapping specification. Source code of contracted transformations is not collected. Configuration option for the attribute type in the Collibra Entity Types configuration: 00000000-0000-0000-0000-000000000249. | false, true |
collibra.uploadMetadata.proxy.use | Set to true if the Collibra export API should be called through a proxy server; otherwise, set to false. | false, true |
collibra.uploadMetadata.proxy.protocol | Collibra export proxy protocol (e.g., HTTP, HTTPS); only required and used if collibra.uploadMetadata.proxy.use is set to true. | http, https |
collibra.uploadMetadata.proxy.serverName | Collibra export proxy host name; only required and used if collibra.uploadMetadata.proxy.use is set to true. | proxy.corp.com |
collibra.uploadMetadata.proxy.portNumber | Collibra export proxy port; only required and used if collibra.uploadMetadata.proxy.use is set to true. | 8080 |
collibra.uploadMetadata.proxy.userName | Collibra export proxy user name; only used when collibra.uploadMetadata.proxy.use is set to true. | |
collibra.uploadMetadata.proxy.password | Collibra export proxy password; only used when collibra.uploadMetadata.proxy.use is set to true. | |
manta.collibra.mantaLineageLink.baseUrl | Base URL of links to lineage in Manta Data Lineage, in the format http(s)://<host>:<port>/<application context>. Leave empty if the Manta Server URL (the property manta.repository.url) should be used. Only applicable when manta.collibra.exportMantaDirectLineageLink or manta.collibra.exportMantaIndirectLineageLink is set to true. | https://manta.mycompany.com:8443/manta-dataflow-server |
manta.collibra.exportMantaDirectLineageLink | Flag whether or not to export a Manta Data Lineage direct lineage link as a Collibra node attribute. | true, false |
manta.collibra.directLineageFilter | Only applicable when manta.collibra.exportMantaDirectLineageLink=true. Filters applied to Manta Data Lineage direct lineage referenced by the link; filter names from Manta UI should be used. | Everything; DB, files, and reports; Database objects; Important objects |
manta.collibra.directLineageInitialDepth | Only applicable when manta.collibra.exportMantaDirectLineageLink=true. Initial depth of Manta direct lineage referenced by the link. | 3 |
manta.collibra.directLineageDirection | Only applicable when manta.collibra.exportMantaDirectLineageLink=true. Direction of Manta direct lineage referenced by the link. | Both, Forward, Backward |
manta.collibra.exportMantaIndirectLineageLink | Flag whether or not to export a Manta indirect lineage link as a Collibra node attribute. | true, false |
manta.collibra.indirectLineageFilter | Only applicable when manta.collibra.exportMantaIndirectLineageLink=true. Filters applied to Manta indirect lineage referenced by the link; filter names from Manta UI should be used. | Everything; DB, files, and reports; Database objects; Important objects |
manta.collibra.indirectLineageInitialDepth | Only applicable when manta.collibra.exportMantaIndirectLineageLink=true. Initial depth of Manta indirect lineage referenced by the link. | 3 |
manta.collibra.indirectLineageDirection | Only applicable when manta.collibra.exportMantaIndirectLineageLink=true. Direction of Manta indirect lineage referenced by the link. | Both, Forward, Backward |
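The table does not spell out how the connection properties combine into the address that is called. As a rough illustration only, the following Python sketch assumes they are joined as protocol://serverName:portNumber/path. The function name collibra_base_url and the joining rule are assumptions made for this example, not a description of Manta's implementation.

```python
def collibra_base_url(props: dict) -> str:
    """Assemble a base URL from the connection properties (assumed layout)."""
    protocol = props["collibra.uploadMetadata.protocol"].lower()
    host = props["collibra.uploadMetadata.serverName"]
    # Fall back to the protocol default when no port number is configured.
    default_port = {"http": 80, "https": 443}[protocol]
    port = int(props.get("collibra.uploadMetadata.portNumber") or default_port)
    path = props.get("collibra.uploadMetadata.path", "rest/2.0").strip("/")
    return f"{protocol}://{host}:{port}/{path}"


# Example values taken from the table above.
example = {
    "collibra.uploadMetadata.protocol": "https",
    "collibra.uploadMetadata.serverName": "example.collibra.com",
    "collibra.uploadMetadata.path": "rest/2.0",
}
print(collibra_base_url(example))
```

A quick check like this can help confirm that the protocol, host, and port values match the URL you use to reach Collibra DGC in a browser.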
Keystore Certificate for an HTTPS Connection to Collibra DGC
Collibra DGC uses a secure HTTPS connection by default. To establish the secure connection from Manta Data Lineage to DGC, a signed certificate from DGC needs to be added to the list of certificates that are trusted by Manta Data Lineage.
The following steps are written for Chrome, but any browser can be used.
- Log in to the DGC instance using a web browser.
- Click the padlock to the left of the address bar, and then select Certificate from the menu that appears.
- Click the Details tab and then the Copy to File... button.
- Run the wizard using the default settings to export the certificate.
- Open a new command window.
- Run the following commands. Replace <path_to_certificate> with the path to the certificate that you exported earlier.

**Windows**
``` bat
cd <MANTA_DIR_HOME>/scenarios/manta-dataflow-cli/conf
"%JRE_HOME%\bin\keytool.exe" -import -trustcacerts -keystore mantaConnectorsTruststore.pkcs12 -storepass mantaConnectorsTruststore -noprompt -alias Collibra -file <path_to_certificate>
```
**Unix**
``` bash
cd <MANTA_DIR_HOME>/scenarios/manta-dataflow-cli/conf
$JRE_HOME/bin/keytool -import -trustcacerts -keystore mantaConnectorsTruststore.pkcs12 -storepass mantaConnectorsTruststore -noprompt -alias Collibra -file <path_to_certificate>
```
Notes on Integration API
Integration API can be used for export instead of Synchronization API. It's faster, but it's missing the synchronization part, which means that obsolete assets previously loaded by Manta Data Lineage aren't automatically deleted anymore.
This might, however, be desired behavior as the items that are not available anymore can still be recorded and governed. With each export, Manta Data Lineage updates the attributes Manta Exported On and Manta Revision on each exported asset. These can be used to identify whether the asset still exists in the latest Manta Data Lineage revision and to build an additional workflow in Collibra (see Getting started with workflows) to, for example, change the status of assets that no longer exist to a Deprecated or To be reviewed state for governance needs.
Since Integration API only uploads assets and does not delete anything, it can be used to upload metadata from Manta Data Lineage to Collibra in smaller segments (which is hard to achieve when the Synchronization API is used). It is the user's responsibility to make sure that the desired export filters are configured and used before running the export to Collibra.
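The idea of using the Manta Revision attribute to find assets that were not part of the latest export can be sketched in Python as follows. The asset records are plain dictionaries with a hypothetical shape; how they are retrieved from Collibra is not shown, and the function name stale_assets is an assumption made for this example.

```python
def stale_assets(assets, latest_revision):
    """Return the names of assets whose "Manta Revision" is older than the latest export."""
    return [
        asset["name"]
        for asset in assets
        if asset.get("Manta Revision", 0) < latest_revision
    ]


# Hypothetical asset records; only the attribute name comes from the text above.
assets = [
    {"name": "dwh_db/dbo/customer", "Manta Revision": 12},
    {"name": "dwh_db/dbo/old_orders", "Manta Revision": 9},
]
print(stale_assets(assets, latest_revision=12))
```

When metadata is exported in smaller segments, an older revision number may only mean that the asset was outside the most recent segment, so a check like this is best treated as a candidate list for review rather than a definitive result.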
Collibra DGC Entity Types
All domain types, asset types, characteristics, relation types, and complex relation types can be customized in Collibra DGC. This information is therefore configurable and needs to be reflected and aligned in Manta Data Lineage as well. This configuration is available on the Manta Admin UI screen under Configurations > Integrations > Collibra > Collibra Entity Types.
Property name | Description | Example |
---|---|---|
Entity name | Manta's unique name for the entity type. | collibra.table |
Entity ID | The resource/relation ID of the entity type that will be used. | 00000000-0000-0000-0000-000000031007 |
The configuration contains default values for the out-of-the-box Collibra DGC operating model.
See the following list of custom entity types. These entities (asset types and relation types) have to be created manually in Collibra DGC.
Assets:
- Cube (expected parent type is Data Structure).
- Analytical model (expected parent type is Data Structure).
- Dimension (expected parent type is Data Structure).
- Analytical table (expected parent type is Data Structure).
- Analytical attribute (expected parent type is Data Element).
Coroles
When you're creating new relations, Collibra requires the Corole attribute. Manta Data Lineage does not operate with this attribute, so there is no need for a specific configuration. The recommended Coroles are provided only for better readability.
Relations
- Report implemented in a Technology Asset (recommended Corole: implements).
- File source of a Mapping Specification (recommended Corole: source).
- File target of a Mapping Specification (recommended Corole: target).
- Data attribute model for a Data Element (recommended Corole: models).
- Data entity uses a Data Entity (recommended Corole: used in).
- Dimension is part of a Cube (recommended Corole: contains).
- Analytical table is part of an Analytical Model (recommended Corole: contains).
- Analytical attribute is part of a Dimension (recommended Corole: contains).
- Analytical attribute is part of an Analytical Table (recommended Corole: contains).
- Data element source of a Data Element (recommended Corole: target of).
Attributes
- Type (attribute type Text) (add to data element asset).
- Manta Link (attribute type Text) (add to column, table, and mapping specification assets).
- Manta Exported On (attribute type Text) (add to Business Asset, Data Asset, Technology Asset, and then any other child node that should have these attributes visible).
- Manta Revision (attribute type Number) (add to Business Asset, Data Asset, Technology Asset, and then any other child node that should have these attributes visible).
- Author (attribute type Text).
- Expression (attribute type Text).
- Generic metadata 1-4 (attribute type Text); the generic attributes are used to export various attributes provided by the respective Manta scanners.
- Label (attribute type Text).
- Path (attribute type Text).
For more information about the default attribute mapping in scanners, see Collibra Mapping in Power BI.
Collibra DGC Scopes
A scope in Collibra Data Governance Center is a selection of communities and domains that you refer to when assigning an asset type. An assignment for a specific scope is also referred to as a "scoped assignment" as opposed to a global assignment. A scoped assignment only applies to assets if the assets are located in a domain or community that belongs to the scope.
A Manta Data Lineage scope should be created in your Collibra DGC instance. Make sure that you meet the following conditions:
- All communities used are assigned to a Manta Data Lineage scope.
- All custom relation types are assigned to corresponding assets in a Manta Data Lineage scope.
- All asset attributes are assigned to corresponding assets in a Manta Data Lineage scope.
- Mapping assignments are generated for the Manta Data Lineage scope for all asset types.
Export Filters and Mapping
By default, Manta Data Lineage exports all assets, relations, and complex relations to the Manta Flow default community (which has to exist before the export) and default domains (for systems, databases, physical assets, mappings, reports, and logical model assets). Users should change this configuration and decide which communities and domains particular pieces of metadata will be loaded into. The configuration is available on the Manta Admin UI screen under Configurations > Integrations > Collibra > Collibra Export Filter.
Column name | Description | Example |
---|---|---|
Manta qualified name | Identification of the root node for all entities in Manta Flow that will be exported into the community with the given Collibra community name. The name can consist of zero or more segments, which in the case of most database systems can include technology, server, database, schema, table, and/or column. Each of these parts can be a regular expression. If any part should contain a slash, it needs to be escaped with a backslash. | / or /MSSQL/WINRR0001/dwh_db |
Export mode | The mode that determines how the export will be performed. Possible values are as follows. Full export: all the physical, logical, technology, reporting, and transformation assets are exported together with all their relations. Lineage export: with the exception of mapping specifications, no assets are created; already existing assets are used instead. The typical use case for this option would be the use of a physical data catalog generated by Collibra Catalog. When exporting lineage, the meaning of the other columns differs slightly: data lineage is loaded into the specified mappings domain and only the existing assets are used from the other specified domains. Contraction: neither data assets nor detailed lineage is exported for objects identified by the given qualified name. Any data flow going through these "contracted" objects is simplified to only one hop between the source and the target. Exclusion: neither data assets nor data lineage are exported for objects identified by the given qualified name. No data lineage going through these excluded objects is exported at all. | Full export, Lineage export, Contraction, Exclusion |
Community name | The name of the target Collibra DGC Community that Manta Data Lineage will load the specified metadata into. This value can be overridden for specific domains. | Controlling |
Technology domain name | The name of the domain containing systems and databases. The default value is "Systems & Databases" as defined in collibraEntityTypes.properties. | Systems & Databases |
Technology domain type | The type of the domain containing systems and databases. The default value is "Technology Asset Domain" as defined in collibraEntityTypes.properties. | collibra.technologyDomain |
Technology community name | The name of the target Collibra DGC Community into which Manta Data Lineage will load the specified metadata from the technology domain. If empty, then the community name applies. | Controlling Systems and Databases |
Physical dictionary domain name | The name of the domain containing physical data assets. The default value is "Physical Assets". | Physical Assets |
Physical dictionary domain type | The type of the domain containing physical data assets. The default value is "Physical Data Dictionary" as defined in collibraEntityTypes.properties. | collibra.physicalDictionaryDomain |
Physical dictionary community name | The name of the target Collibra DGC Community into which Manta Data Lineage will load the specified metadata from the physical dictionary domain. If empty, then the community name applies. | Controlling Physical Assets |
Mappings domain name | The name of the domain containing mappings. The default value is "Mappings". | Mappings |
Mappings domain type | The type of the domain containing mappings. The default value is "Mapping Domain" as defined in collibraEntityTypes.properties. | collibra.mappingsDomain |
Mappings community name | The name of the target Collibra DGC Community into which Manta Data Lineage will load the specified metadata from the mappings domain. If empty, then the community name applies. | Controlling Mappings |
Reports domain name | The name of the domain containing reports. The default value is "Reports". | Reports |
Reports domain type | The type of the domain containing reports. The default value is "Report Catalog" as defined in collibraEntityTypes.properties. | collibra.reportsDomain |
Reports community name | The name of the target Collibra DGC Community into which Manta Data Lineage will load the specified metadata from the reports domain. If empty, then the community name applies. | Controlling Reports |
Logical dictionary domain name | The name of the domain containing logical model assets. The default value is "Logical Model". | Logical Model |
Logical dictionary domain type | The type of the domain containing logical model assets. The default value is "Logical Data Dictionary" as defined in collibraEntityTypes.properties. | collibra.logicalDictionaryDomain |
Logical dictionary community name | The name of the target Collibra DGC Community into which Manta Data Lineage will load the specified metadata from the logical dictionary domain. If empty, then the community name applies. | Controlling Logical Model |
Example configuration
For each domain, it is possible to override the default community defined in Community name. For example, if a specific community is defined for the Technology domain in the Technology community name field, all assets matching the row will be uploaded into this community. Otherwise, they will be uploaded into the default community.
| Manta qualified name | Export mode | Community name | Technology domain name | Technology domain type | Technology community name | Physical dictionary domain name | Physical dictionary domain type | Physical dictionary community name | Mappings domain name | Mappings domain type | Mappings community name | Reports domain name | Reports domain type | Reports community name | Logical dictionary domain name | Logical dictionary domain type | Logical dictionary community name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| / | Full export | Manta default community | Systems & databases | collibra.technologyDomain | | Physical assets | collibra.physicalDictionaryDomain | | Mappings | collibra.mappingsDomain | | Reports | collibra.reportsDomain | | Logical model | collibra.logicalDictionaryDomain | |
| | Contraction | | | | | | | | | | | | | | | | |
| | Exclusion | | | | | | | | | | | | | | | | |
| | Full export | Controlling | Systems & databases | collibra.technologyDomain | Controlling Systems and Databases | Loan MGMT | collibra.physicalDictionaryDomain | | Loan MGMT mappings | collibra.mappingsDomain | | | | | | | |
| | Lineage export | Controlling | Systems & databases | collibra.technologyDomain | Controlling Systems and Databases | DWH | collibra.physicalDictionaryDomain | Controlling Physical assets | DWH mappings | collibra.mappingsDomain | Controlling Mappings | | | | | | |
| | Full export | Retail | Systems & databases | collibra.technologyDomain | Retail Systems and Databases | Source CRM retail | collibra.physicalDictionaryDomain | | Source CRM retail mappings | collibra.mappingsDomain | | | | | | | |
| | Full export | Retail | Systems & databases | collibra.technologyDomain | | | | | Load to DWH mappings | collibra.mappingsDomain | | | | | | | |
| | Full export | Retail | Systems & databases | collibra.technologyDomain | | | | | | | | PowerBI reports | Report catalog | | | | |
| | Full export | Retail | Systems & databases | Technology asset domain | | | | | Logical model mappings | Mapping domain | | | | | Logical model | Logical data dictionary | Retail logical model |
The regular expression .* stands for exactly one segment. For example, the path /Snowflake/server/database/schema/table/<column_to_remove> cannot be matched by a single expression; instead, a separate expression should be used for each level, such as /Snowflake/.*/.*/.*/.*/<column_to_remove>. To exclude anything from nodes on multiple levels, multiple exclusions must be configured separately.
If more than one row of the configuration applies to a particular piece of metadata in Manta Data Lineage, the qualified names in those rows are expected to have different numbers of segments. Identifying the same objects more than once with qualified names that have the same number of segments is invalid. When more rows apply and the configuration is valid, the following rules decide which of the rows will be used.
- If the export modes are different for the rows, then the row with the most restrictive export mode applies. The priority is Exclusion > Contraction > Lineage Export > Full Export.
- Between two or more rows with the same export mode, the row with the most specific Manta Data Lineage qualified name has the highest priority. The more segments the qualified name has, the more specific it is.
If the record does not match any of the rows, the exclusion mode is used; that is, records not matching any of the rows will not be exported.
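The selection rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not Manta's implementation: each row is assumed to match a node when every segment of its qualified name matches the corresponding leading segment of the node, with each segment treated as a regular expression. The function names and the column name tmp_col are hypothetical.

```python
import re

# Higher number = more restrictive, following the priority stated above.
MODE_PRIORITY = {"Full export": 0, "Lineage export": 1, "Contraction": 2, "Exclusion": 3}


def split_segments(qualified_name):
    """Split a qualified name on slashes that are not escaped with a backslash."""
    return [s for s in re.split(r"(?<!\\)/", qualified_name) if s]


def row_matches(pattern, node_segments):
    """Assumed rule: each row segment must fully match the node's leading segments."""
    parts = split_segments(pattern)
    if len(parts) > len(node_segments):
        return False
    return all(re.fullmatch(p, s) for p, s in zip(parts, node_segments))


def effective_row(rows, node_segments):
    """Return the (qualified name, export mode) row that applies to a node."""
    matching = [row for row in rows if row_matches(row[0], node_segments)]
    if not matching:
        # Records that match no row are not exported.
        return (None, "Exclusion")
    # Most restrictive mode first, then the qualified name with the most segments.
    return max(matching, key=lambda r: (MODE_PRIORITY[r[1]], len(split_segments(r[0]))))


rows = [
    ("/", "Full export"),
    ("/MSSQL/WINRR0001/dwh_db", "Lineage export"),
    ("/Snowflake/.*/.*/.*/.*/tmp_col", "Exclusion"),
]
print(effective_row(rows, ["MSSQL", "WINRR0001", "dwh_db", "dbo", "customer"]))
print(effective_row(rows, ["Snowflake", "srv", "db", "sch", "tbl", "tmp_col"]))
```

Passing the node as a list of segments keeps the sketch independent of how slashes inside node names are escaped.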
Collibra Mapping Configuration
This configuration defines the mapping of the Manta Data Lineage database hierarchy to the Collibra hierarchy. It is available on the Manta Admin UI screen under Configurations > Integrations > Collibra > Collibra Mapping.
Each row of the file represents a mapping that translates the database host and database name stored in the Manta Data Lineage repository to the host and database names used in Collibra.
Column name | Description | Example |
---|---|---|
Host in Manta | Server to match in the Manta repository | dwh-prod |
Database in Manta | Database to match within the server (leave empty to match all databases in the given server) | stage |
Collibra host | Collibra host to map this server/database to | DWH |
Collibra database | Collibra database to map this database to (leave empty to keep the original database name; must be empty if the Database in Manta column is empty) | Stage layer |
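The effect of a mapping row can be illustrated with a short Python sketch. It assumes that rows are checked in order, that the first matching row wins, and that unmatched host and database names are kept unchanged; these rules and the function name map_database are assumptions made for this example, not documented behavior.

```python
def map_database(rows, host, database):
    """Translate a Manta host/database pair to Collibra names (assumed rules)."""
    for manta_host, manta_db, collibra_host, collibra_db in rows:
        # An empty "Database in Manta" value matches all databases on the host.
        if manta_host == host and manta_db in ("", database):
            # An empty "Collibra database" value keeps the original database name.
            return collibra_host, (collibra_db or database)
    return host, database


# First row uses the example values from the table; the second is hypothetical.
rows = [
    ("dwh-prod", "stage", "DWH", "Stage layer"),
    ("dwh-prod", "", "DWH", ""),
]
print(map_database(rows, "dwh-prod", "stage"))
print(map_database(rows, "dwh-prod", "core"))
```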
Filesystem Mapping Configuration
This configuration defines the mapping of the Manta Filesystem hierarchy to the Collibra hierarchy. It is available on the Manta Admin UI screen under Configurations > Integrations > Collibra > Filesystem Mapping.
Each row of the file maps a top-level item in a given Manta Data Lineage resource to a host and path in Collibra.
Column name | Description | Example |
---|---|---|
Resource name | Resource in the Manta repository this mapping applies to | Filesystem |
Top-level directory / file | Top-level item (server, bucket, directory, or file) within the resource that this mapping applies to | dataserver |
Host | Collibra host to map this item and all its descendants to | Data Storage |
Path within host | Path in the Collibra host to put the top-level item in (the directory separator is "/"; leave empty to put the top-level item in the Collibra host as a top-level item) | stage/tempfiles |
For example, with the configuration as given in the example values, a file in the resource Filesystem on the path dataserver/client/emails.txt would be mapped to the Collibra host Data Storage and the path stage/tempfiles/dataserver/client/emails.txt.
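The filesystem mapping can be expressed as a small Python sketch that reproduces the example with the values from the table. The function name map_path and the choice to return None for unmapped items are assumptions made for illustration only.

```python
def map_path(rows, resource, path):
    """Map a file path in a Manta resource to a (Collibra host, path) pair."""
    top_level = path.split("/", 1)[0]
    for resource_name, top_item, collibra_host, path_in_host in rows:
        if resource_name == resource and top_item == top_level:
            prefix = path_in_host.strip("/")
            # An empty path places the top-level item directly in the Collibra host.
            return collibra_host, (f"{prefix}/{path}" if prefix else path)
    return None  # No mapping row applies (assumed behavior for this sketch).


rows = [("Filesystem", "dataserver", "Data Storage", "stage/tempfiles")]
print(map_path(rows, "Filesystem", "dataserver/client/emails.txt"))
```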