Databricks Integration Requirements

The following are the prerequisites necessary for IBM Automatic Data Lineage to connect to this third-party system, which you may choose to do at your sole discretion. Note that while these are usually sufficient to connect to this third-party system, we cannot guarantee the success of the connection or integration since we have no control, liability, or responsibility for third-party products or services, including their performance.

The Manta Databricks scanner uses the Databricks API to connect to the Databricks instance, so the Automatic Data Lineage instance must have network access to the Databricks API (hosted by Databricks). To access the Databricks API, you must provide a personal access token (PAT). The token can be obtained through the Databricks UI and is used to authenticate the Manta Databricks scanner to the scanned Databricks instance. To extract all assets reliably, the user that the PAT belongs to should be a metastore admin. Otherwise, the following privileges are needed to extract individual entities.
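To illustrate the PAT-based authentication described above, the following Python sketch builds (but does not send) a request to the Databricks REST API with the token passed as a bearer token in the Authorization header. The workspace URL and token are placeholders, and the Unity Catalog endpoint /api/2.1/unity-catalog/catalogs is used only as an example of a Databricks API call. This is not how the Manta Databricks scanner is implemented internally; it only shows how a PAT is typically presented to the API.

```python
import urllib.request


def build_databricks_request(workspace_url: str, token: str, path: str) -> urllib.request.Request:
    """Build a GET request to the Databricks REST API authenticated with a PAT.

    The personal access token is sent as a bearer token in the
    Authorization header. The request is only constructed here, not sent.
    """
    # Normalize the base URL and path so they join with exactly one slash.
    url = workspace_url.rstrip("/") + "/" + path.lstrip("/")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Bearer {token}")
    return request


# Placeholder values (hypothetical): replace with your workspace URL and PAT.
req = build_databricks_request(
    "https://example-workspace.cloud.databricks.com/",
    "dapiXXXXXXXX",
    "/api/2.1/unity-catalog/catalogs",
)
print(req.full_url)

# Actually sending the request requires network access to the Databricks API:
# with urllib.request.urlopen(req) as response:
#     print(response.status)
```

Keeping request construction separate from sending makes it easy to check the URL and headers before any network access is attempted.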

Hive metastore assets are also accessed with the personal access token. The SELECT privilege is needed on all assets to be extracted (for example, schemas and tables).
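As one possible way to grant the SELECT privilege mentioned above, an administrator could issue Databricks SQL GRANT statements at schema level. The Python sketch below only formats such statements for a list of Hive metastore schemas; the function name hive_select_grants, the schema names, and the principal are hypothetical. Whether schema-level grants are sufficient, or whether further privileges are required, depends on how table access control is configured in your workspace, so review the generated statements against the Databricks GRANT documentation before running them.

```python
def hive_select_grants(schemas: list[str], principal: str) -> list[str]:
    """Return GRANT statements giving `principal` SELECT on each Hive metastore schema.

    Hypothetical helper: it only formats SQL text and does not execute anything.
    """
    statements = []
    for schema in schemas:
        # Backticks quote identifiers that may contain characters such as '@' or '-'.
        statements.append(
            f"GRANT SELECT ON SCHEMA hive_metastore.`{schema}` TO `{principal}`"
        )
    return statements


# Example usage with hypothetical schema names and principal.
for statement in hive_select_grants(["sales", "finance"], "lineage-user@example.com"):
    print(statement)
```

Granting per table instead of per schema is also possible if only selected tables should be extracted.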

Important: To enable extraction of data from Hive Metastore, it is currently necessary to copy a JAR file with the Databricks JDBC driver (https://www.databricks.com/spark/jdbc-drivers-download) to the <MANTA_AGENT_DIR_HOME>/manta-flow-agent-dir/lib-ext folder. Otherwise, the extraction from Hive Metastore will not be performed. For more information, go to IBM Support. As of Automatic Data Lineage R42, if the driver is not provided, the extraction always produces an error reminding the user about the missing driver. If the driver was intentionally not provided (for example, because nothing should be extracted from Hive Metastore), add the hive_metastore catalog to the list of excluded catalogs in the connection configuration.
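The driver installation step above amounts to copying one file into a known folder. The following Python sketch shows that step plus a simple presence check; the helper names install_jdbc_driver and has_databricks_driver are hypothetical, agent_home stands for <MANTA_AGENT_DIR_HOME>, and the check only looks for a JAR whose file name contains "databricks", which is a heuristic assumption rather than how Automatic Data Lineage detects the driver.

```python
import shutil
from pathlib import Path


def lib_ext_dir(agent_home: Path) -> Path:
    """Return <MANTA_AGENT_DIR_HOME>/manta-flow-agent-dir/lib-ext."""
    return agent_home / "manta-flow-agent-dir" / "lib-ext"


def install_jdbc_driver(jar_path: Path, agent_home: Path) -> Path:
    """Copy the Databricks JDBC driver JAR into the agent's lib-ext folder (hypothetical helper)."""
    if jar_path.suffix.lower() != ".jar":
        raise ValueError(f"expected a .jar file, got {jar_path.name}")
    target_dir = lib_ext_dir(agent_home)
    # Refuse to create the folder: a missing lib-ext usually means a wrong agent_home.
    if not target_dir.is_dir():
        raise FileNotFoundError(f"lib-ext folder not found: {target_dir}")
    # copy2 also preserves file metadata such as modification time.
    return Path(shutil.copy2(jar_path, target_dir / jar_path.name))


def has_databricks_driver(agent_home: Path) -> bool:
    """Heuristic: is any JAR with 'databricks' in its name present in lib-ext?"""
    return any(
        "databricks" in jar.name.lower() for jar in lib_ext_dir(agent_home).glob("*.jar")
    )
```

If has_databricks_driver returns False and Hive Metastore extraction is not wanted, the text above recommends excluding the hive_metastore catalog in the connection configuration instead of installing the driver.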

Requirements to extract Unity Catalog Lineage

Supported Extraction Features

Supported Data Flow Analysis Features

Supported SQL Features

Known Unsupported Features

Automatic Data Lineage does not support the following Databricks features. This list includes all of the features that IBM is aware are unsupported, but it might not be comprehensive.