Manta Flow Alation: Integration Architecture

Components

The following diagram shows which components are used for the integration and what the relations between them are.

[Diagram: components used for the integration and the relations between them]

Integration Process

The integration process is divided into three phases.

Metadata Extraction and Lineage Analysis

During the metadata extraction and lineage analysis phase, only the IBM Automatic Data Lineage components (and source systems) are utilized. At the beginning of this phase, Automatic Data Lineage extracts the metadata necessary for the lineage analysis of the source systems (databases as well as ETL, analytical, and reporting tools) and produces the lineage as a product of the metadata analysis. At the end of this phase, data lineage is available in Automatic Data Lineage.

Important: The administrator should carefully orchestrate the Automatic Data Lineage and Alation metadata extractions. Because Automatic Data Lineage stitches the lineage to the Alation database objects (see the other phases), the two extraction processes should run at nearly the same time, or at least within a window in which no changes are made to the database between the two extractions. This ensures that Automatic Data Lineage and Alation extract the same database objects (schemas, tables, columns, etc.) and that the lineage can be stitched to the database structures provided by Alation without problems. If the processes run at entirely different points in time, the database may be in a different state (a different database structure) in each extraction, which could cause issues when uploading the lineage to Alation.
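The timing requirement above can be expressed as a simple check. The following is a minimal sketch, not part of either product: it assumes the administrator can obtain the two extraction timestamps and a list of schema change timestamps from their own scheduling or audit tooling. The function name and parameters are hypothetical.

```python
from datetime import datetime
from typing import Iterable


def extractions_consistent(
    lineage_extracted_at: datetime,
    catalog_extracted_at: datetime,
    schema_change_times: Iterable[datetime],
) -> bool:
    """Return True if no recorded schema change falls between the two extractions."""
    start, end = sorted((lineage_extracted_at, catalog_extracted_at))
    # Conservative choice (an assumption): a change at either boundary counts as "between".
    return not any(start <= changed <= end for changed in schema_change_times)


# Illustrative timestamps only.
changes = [datetime(2024, 1, 1, 9, 0)]
ok = extractions_consistent(
    datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 30), changes
)
stale = extractions_consistent(
    datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 1, 10, 30), changes
)
print(ok, stale)
```

In this sketch, the first pair of extractions both ran after the schema change, so they see the same structure. The second pair straddles the change, which is the situation the note warns about.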

Export

During the export phase, Automatic Data Lineage exports the lineage from the Automatic Data Lineage Repository to JSON files, which can then be uploaded to Alation. The component utilized the most during this phase is therefore Manta Flow Server. This is also the first phase in which Automatic Data Lineage interacts with Alation, provided that any analytical tools, reporting tools, or custom databases are exported.

The reason for this is that the output files containing lineage for analytical/reporting tools have to contain the IDs of the Alation BI servers.

In the case of custom databases, the files have to contain the IDs of the Alation Virtual Datasource (VDS) servers.

Thus, at the beginning of the export of an analytical/reporting tool or custom database, Automatic Data Lineage connects to Alation and:

  1. Lists all the BI/VDS servers.

  2. If an existing BI/VDS server should be used (based on the mappings documented in the Manta Flow Alation: Manta Configuration article), fetches the ID of that BI/VDS server.

  3. If a new BI/VDS server should be created, creates the BI/VDS server and fetches its ID.

The connection to Alation is not needed for the rest of the export process. As a result of the process, JSON files containing the exported lineage and analytical/reporting assets are created in the Automatic Data Lineage output folder.
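The server lookup performed at the start of the export (list, reuse if mapped, otherwise create) can be sketched as follows. This is an illustration of the decision logic only: `FakeCatalog` is a hypothetical in-memory stand-in, not the Alation API, and the mapping format is an assumption rather than the actual configuration format.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FakeCatalog:
    """Hypothetical in-memory stand-in for the catalog's BI/VDS server registry."""

    servers: Dict[str, int] = field(default_factory=dict)  # server name -> ID
    next_id: int = 1

    def list_servers(self) -> Dict[str, int]:
        return dict(self.servers)

    def create_server(self, name: str) -> int:
        server_id = self.next_id
        self.servers[name] = server_id
        self.next_id += 1
        return server_id


def resolve_server_id(
    catalog: FakeCatalog, source_name: str, mappings: Dict[str, str]
) -> int:
    """Return the ID of the server that lineage for source_name should reference."""
    existing = catalog.list_servers()  # step 1: list all servers
    mapped_name = mappings.get(source_name)
    if mapped_name is not None:
        # Step 2: a mapping points at an existing server, so reuse its ID.
        if mapped_name not in existing:
            raise LookupError(f"mapped server {mapped_name!r} does not exist")
        return existing[mapped_name]
    # Step 3: no mapping, so create a new server and use its ID.
    return catalog.create_server(source_name)


catalog = FakeCatalog(servers={"BI Production": 7}, next_id=8)
mappings = {"reports_prod": "BI Production"}  # assumed shape: source -> server name
reused_id = resolve_server_id(catalog, "reports_prod", mappings)
created_id = resolve_server_id(catalog, "dashboards_dev", mappings)
print(reused_id, created_id)
```

Because the IDs are resolved once up front, the rest of the export in this sketch needs no further connection to the catalog, which mirrors the behavior described above.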

Upload

During this phase, all the exported files containing the data lineage and analytical/reporting assets are uploaded to Alation.

The upload is done in batches according to the configuration described in Manta Flow Alation: Manta Configuration.

Once the upload phase is finished, up-to-date lineage is available in Alation. Only update and insert operations are performed during the upload phase; none of the objects previously ingested by Automatic Data Lineage are deleted.
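The batching and the update-or-insert behavior can be illustrated with a small sketch. It assumes the target is modeled as a dictionary keyed by an object identifier. The `key` field, the function names, and the batch size are hypothetical and do not reflect the real file format or upload API.

```python
from typing import Dict, Iterator, List


def batches(items: List[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def upload(target: Dict[str, dict], objects: List[dict], batch_size: int) -> int:
    """Upsert objects into target batch by batch; return the number of batches sent."""
    sent = 0
    for batch in batches(objects, batch_size):
        for obj in batch:
            # Insert new objects or overwrite existing ones; nothing is deleted.
            target[obj["key"]] = obj
        sent += 1
    return sent


# "stale" stands for an object ingested by an earlier run that is absent from this export.
target = {
    "a": {"key": "a", "version": 1},
    "stale": {"key": "stale", "version": 1},
}
exported = [
    {"key": "a", "version": 2},
    {"key": "b", "version": 1},
    {"key": "c", "version": 1},
]
batch_count = upload(target, exported, batch_size=2)
print(batch_count, sorted(target))
```

After the run, object `a` is updated, `b` and `c` are inserted, and `stale` remains in the target. Objects that no longer exist in the source therefore have to be cleaned up separately.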

The following diagram illustrates all the interactions between Automatic Data Lineage and Alation in greater detail.

[Diagram: interactions between Automatic Data Lineage and Alation]