Unity Catalog
The Unity Catalog destination writes data to a Databricks Unity Catalog table. Use the destination only in Databricks pipelines.
The destination can write data to a new or existing managed table or external table. For an external table, the destination can write to any external system supported by Databricks Unity Catalog.
When you configure the destination, you specify the table type. When writing to an external table, you specify the table location and file type. You can also specify additional file options to use.
You define the catalog, schema, and table name to write to as well as the write mode to use. With some write modes, you can configure the destination to update or overwrite the existing schema, and to use partition columns.
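As a rough illustration of how these settings fit together, the sketch below models them in plain Python. The class name `UnityCatalogTarget`, its field names, and the sample values are hypothetical and are not part of the destination or of any Databricks API. The sketch assumes only what is described above: a table is addressed by catalog, schema, and table name, and an external table also needs a location and a file type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UnityCatalogTarget:
    """Hypothetical model of the destination's table settings."""

    catalog: str
    schema: str
    table: str
    table_type: str = "managed"      # "managed" or "external"
    location: Optional[str] = None   # only meaningful for external tables
    file_type: Optional[str] = None  # only meaningful for external tables

    def __post_init__(self) -> None:
        if self.table_type not in ("managed", "external"):
            raise ValueError(f"unknown table type: {self.table_type!r}")
        # An external table must say where its files live and in what format.
        if self.table_type == "external" and not (self.location and self.file_type):
            raise ValueError("an external table needs a location and a file type")

    def qualified_name(self) -> str:
        """Return the three-level name: catalog.schema.table."""
        return ".".join((self.catalog, self.schema, self.table))


# Illustrative values only.
target = UnityCatalogTarget("main", "sales", "orders")
print(target.qualified_name())  # main.sales.orders
```

Validating the external-table fields at construction time mirrors the idea that the location and file type are required only when the table type is external.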
Table Creation
The Unity Catalog destination can create a managed or external Unity Catalog table, as needed. If you configure the destination to write to a table that does not exist, the destination creates a table of that name in the specified location.
If you use the Overwrite Data write mode and specify partition columns, the destination partitions the table by those columns when creating it.
Partitioning
- New table
- When the Unity Catalog destination writes to a new table and partition columns are not defined in stage properties, the destination uses the same number of partitions that Spark uses to process the upstream pipeline stages. The destination randomly redistributes the data to balance the data across the partitions, and then writes one output file for each partition to the specified table path. For example, if Spark splits the pipeline data into 20 partitions, the destination writes 20 output files to the specified table path.
When the destination writes to a new table and partition columns are defined in stage properties, the destination redistributes the data by the specified columns, placing records with the same values for those columns in the same partition. The destination creates a single file for each partition, writing each file to a subfolder within the table path.
- Existing table
- When the Unity Catalog destination writes to an existing table and partition columns are not defined in stage properties, the destination automatically uses the same partitioning as the existing table.
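The new-table rules above can be sketched in plain Python. This is a simplified simulation, not the destination's implementation: the function name `plan_output_files`, the `part-NNNNN` file names, and the `column=value` subfolder naming are assumptions made for illustration, and the sketch handles a single partition column.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional

Record = Dict[str, object]


def plan_output_files(
    records: List[Record],
    spark_partitions: int,
    partition_column: Optional[str] = None,
    seed: int = 0,
) -> Dict[str, List[Record]]:
    """Map each output file path to the records it would contain."""
    files: Dict[str, List[Record]] = defaultdict(list)
    if partition_column is None:
        # No partition columns: shuffle, then deal records out evenly so
        # each Spark partition produces one balanced output file.
        shuffled = records[:]
        random.Random(seed).shuffle(shuffled)
        for i, record in enumerate(shuffled):
            files[f"part-{i % spark_partitions:05d}"].append(record)
    else:
        # Partition column defined: records with the same value share one
        # partition, written as one file in a subfolder of the table path.
        for record in records:
            value = record[partition_column]
            files[f"{partition_column}={value}/part-00000"].append(record)
    return dict(files)


records = [{"id": i, "region": ["eu", "us", "apac"][i % 3]} for i in range(60)]
print(len(plan_output_files(records, 20)))        # 20
print(sorted(plan_output_files(records, 20, "region")))
```

With 60 records and 20 Spark partitions, the first call yields 20 files of three records each. The second call yields one file per distinct `region` value, regardless of the number of Spark partitions.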
Write Mode
- Overwrite data
- The destination drops and recreates the table with each batch of data, using any specified partition columns. To avoid overwriting data unintentionally, use this write mode only with batch execution mode pipelines.
- Append data
- Appends data to existing data in the table.
- Error if exists
- Generates an error that stops the pipeline if the table exists.
- Ignore
- Ignores data in the pipeline if the table exists, writing no data to the table.
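To make the differences between the write modes concrete, the following sketch simulates them against an in-memory dictionary of tables. The `WriteMode` enum and the `write_batch` function are hypothetical names used only for illustration. The mapping to Spark save mode strings is an assumption about how these options might correspond to Spark's standard save modes and is not taken from the destination's documentation.

```python
from enum import Enum
from typing import Dict, List


class WriteMode(Enum):
    # Values are Spark's standard save mode strings (assumed correspondence).
    OVERWRITE_DATA = "overwrite"
    APPEND_DATA = "append"
    ERROR_IF_EXISTS = "errorifexists"
    IGNORE = "ignore"


def write_batch(tables: Dict[str, List], name: str, batch: List, mode: WriteMode) -> None:
    """Apply one batch to an in-memory table store according to the write mode."""
    exists = name in tables
    if mode is WriteMode.OVERWRITE_DATA:
        tables[name] = list(batch)                  # drop and recreate the table
    elif mode is WriteMode.APPEND_DATA:
        tables.setdefault(name, []).extend(batch)   # create if missing, then append
    elif mode is WriteMode.ERROR_IF_EXISTS:
        if exists:
            raise RuntimeError(f"table {name!r} already exists")
        tables[name] = list(batch)
    elif mode is WriteMode.IGNORE:
        if not exists:
            tables[name] = list(batch)              # existing table: write nothing
    else:
        raise ValueError(f"unsupported write mode: {mode!r}")


tables: Dict[str, List] = {}
write_batch(tables, "orders", [1, 2], WriteMode.APPEND_DATA)
write_batch(tables, "orders", [3], WriteMode.APPEND_DATA)
write_batch(tables, "orders", [9], WriteMode.IGNORE)          # table exists, no change
print(tables["orders"])  # [1, 2, 3]
write_batch(tables, "orders", [7], WriteMode.OVERWRITE_DATA)
print(tables["orders"])  # [7]
```

The overwrite case shows why that mode is risky in a pipeline that processes multiple batches: each batch replaces everything written by the previous one.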
Configuring a Unity Catalog Destination
Configure a Unity Catalog destination to write to a Databricks Unity Catalog table. Use the destination only in Databricks pipelines.