Databricks Scanner Guide

Databricks is a cloud-based platform that handles data tasks such as data ingestion, generating dashboards and visualizations, and data discovery, annotation, and exploration. IBM Automatic Data Lineage is a data lineage platform that supports Databricks lineage through Unity Catalog. Databricks Unity Catalog captures runtime lineage in Databricks, including lineage for all languages and column-level lineage. Automatic Data Lineage then extends that lineage by connecting it to lineage outside the Databricks environment.

Follow these steps to configure a connection to Databricks.

Step 1: Create a Databricks Account

  1. Get started by creating a Databricks account and setting up a workspace.

  2. Set up Unity Catalog. The following articles provide instructions on how to enable your Databricks account to use Unity Catalog.
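
Once Unity Catalog is set up, one way to confirm that it is reachable before configuring the connection is to list the catalogs through the Databricks REST API. The following is a minimal sketch and not part of Automatic Data Lineage. It assumes the `GET /api/2.1/unity-catalog/catalogs` endpoint and a personal access token sent as a bearer token. The helper names (`build_catalogs_request`, `catalog_names`, `list_catalogs`) are hypothetical and exist only for this illustration.

```python
import json
import urllib.request


def build_catalogs_request(workspace_url: str, token: str) -> urllib.request.Request:
    """Build a GET request for the Unity Catalog 'list catalogs' endpoint."""
    # Assumption: workspace_url is the base URL of your Databricks workspace.
    url = workspace_url.rstrip("/") + "/api/2.1/unity-catalog/catalogs"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def catalog_names(payload: dict) -> list[str]:
    """Extract catalog names from the JSON body returned by the endpoint."""
    return [catalog["name"] for catalog in payload.get("catalogs", [])]


def list_catalogs(workspace_url: str, token: str, timeout: float = 10.0) -> list[str]:
    """Call the workspace and return the visible catalog names (needs network access)."""
    request = build_catalogs_request(workspace_url, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return catalog_names(json.load(response))
```

Calling `list_catalogs` with your own workspace URL and token should return at least one catalog name if Unity Catalog is enabled and the token has permission to see it. An HTTP error or an empty list suggests that the metastore assignment or the token's privileges need to be checked first.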

Step 2: Configure the Connection

Create a new connection in the Admin UI (http://localhost:8281/manta-admin-gui/app/index.html?#/platform/connections/) to enable automated extraction from Databricks by Automatic Data Lineage. The connection requirements are listed in Databricks Integration Requirements. Note that the Databricks scanner uses an Agent. Read Manta Flow Agent Configuration for Extraction for more details.

Properties That Must Be Configured