Databricks
Databricks is a cloud-based data management platform. It offers a user-friendly interface, scalability, and support for a variety of data formats and data sources, along with data visualization tools and a wide range of integrations.
The IBM Automatic Data Lineage Databricks scanner connects to the Databricks API to download table and notebook data, which is then analyzed in cooperation with other scanners (such as Databricks SQL). If the embedded scanner fails to execute for any reason, the lineage information provided by the Unity Catalog API is used instead. The result is displayed in the context of the entire Databricks environment.
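The fallback behavior described above can be illustrated with a minimal Python sketch. This is not the scanner's actual implementation: the function `resolve_lineage`, the two callables passed to it, and the shape of the returned dictionary are all hypothetical names chosen for illustration, and no real Databricks or Unity Catalog API call is made.

```python
from typing import Callable, Dict, List

# A lineage result is modeled here as a list of source/target edges.
# This shape is an assumption made for the example only.
Lineage = List[Dict[str, str]]


def resolve_lineage(
    embedded_scanner: Callable[[], Lineage],
    unity_catalog_lineage: Callable[[], Lineage],
) -> Dict[str, object]:
    """Try the embedded scanner first; on any failure, use Unity Catalog lineage."""
    try:
        return {"source": "embedded_scanner", "edges": embedded_scanner()}
    except Exception as exc:
        # The embedded scanner failed, so fall back to the lineage
        # that the Unity Catalog API would provide (stubbed here).
        return {
            "source": "unity_catalog",
            "edges": unity_catalog_lineage(),
            "fallback_reason": str(exc),
        }


# Hypothetical stand-ins for the two lineage providers.
def failing_scanner() -> Lineage:
    raise RuntimeError("notebook could not be analyzed")


def working_scanner() -> Lineage:
    return [{"from": "raw.orders", "to": "staging.orders"}]


def catalog_lineage() -> Lineage:
    return [{"from": "raw.orders", "to": "curated.orders"}]


fallback_result = resolve_lineage(failing_scanner, catalog_lineage)
primary_result = resolve_lineage(working_scanner, catalog_lineage)
print(fallback_result["source"])
print(primary_result["source"])
```

Recording the reason for the fallback alongside the result is a design choice of this sketch; it makes it easier to tell afterwards which lineage source produced a given edge.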
Extraction and Analysis Phase Scenarios
Extraction Phase
The extraction phase for Databricks has only one scenario.
- Databricks ingestion scenario: pulls inputs from Git (see Manta Flow Agent Configuration for Extraction:Git Source) or from a remote agent filesystem location (see Manta Flow Agent Configuration for Extraction:Agent Source)