Unity Catalog
The Unity Catalog origin reads data from a Databricks Unity Catalog managed table. Use the origin only in Databricks pipelines.
The origin can perform a full or incremental read. By default, the origin performs a full read of all available data.
When you configure the origin, you specify the catalog, schema, and table to read from. If you configure the origin to perform an incremental read, you specify the offset column and initial offset to use.
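The settings above can be modeled as a small configuration object. This is a minimal sketch, not the StreamSets API: the class and field names are hypothetical. It assumes only two documented facts. Unity Catalog addresses a table by a three-level name, `catalog.schema.table`. An incremental read needs both an offset column and an initial offset.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class UnityCatalogOriginConfig:
    # Hypothetical field names that mirror the origin properties; not a real API.
    catalog: str
    schema: str
    table: str
    incremental: bool = False            # the origin performs a full read by default
    offset_column: Optional[str] = None  # used only for incremental reads
    initial_offset: Optional[Any] = None # used only for incremental reads

    def __post_init__(self) -> None:
        # An incremental read needs both an offset column and an initial offset.
        if self.incremental and (self.offset_column is None or self.initial_offset is None):
            raise ValueError("incremental read requires offset_column and initial_offset")

    @property
    def full_name(self) -> str:
        # Unity Catalog uses a three-level namespace: catalog.schema.table.
        return f"{self.catalog}.{self.schema}.{self.table}"


full = UnityCatalogOriginConfig("main", "sales", "orders")
print(full.full_name)  # main.sales.orders

incremental = UnityCatalogOriginConfig(
    "main", "sales", "orders", incremental=True, offset_column="order_id", initial_offset=0
)
print(incremental.offset_column)  # order_id
```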
Full or Incremental Read
The Unity Catalog origin can perform a full read or an incremental read each time you run the pipeline. By default, the origin performs a full read of the specified table.
When the origin performs a full read, the origin processes all data available in the table each time that the pipeline runs.
When the origin performs an incremental read, the first pipeline run is the same as a full read. When the pipeline stops, the origin stores the offset where it stopped processing. For subsequent pipeline runs, the origin reads the table starting from the last-saved offset, unless you reset the pipeline offsets.
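The difference between full and incremental reads can be simulated in plain Python. This is an illustrative model only. It assumes that each incremental run reads the rows whose offset value is greater than the stored offset. It also assumes that the first run, or the first run after a reset, starts from the configured initial offset. With an initial offset lower than every value in the table, that first run matches a full read. `IncrementalReader` and `full_read` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def full_read(table: List[Row]) -> List[Row]:
    # A full read processes every row in the table on every run.
    return list(table)


@dataclass
class IncrementalReader:
    """Hypothetical model of incremental-read behavior, not the origin's implementation."""
    offset_column: str
    initial_offset: Any
    saved_offset: Optional[Any] = None  # None until a run has stored an offset

    def run(self, table: List[Row]) -> List[Row]:
        # Start from the last-saved offset, or from the initial offset on the first run.
        start = self.initial_offset if self.saved_offset is None else self.saved_offset
        # Read only rows past the starting offset, in offset-column order.
        batch = sorted(
            (row for row in table if row[self.offset_column] > start),
            key=lambda row: row[self.offset_column],
        )
        if batch:
            # Store the offset where processing stopped, for the next run.
            self.saved_offset = batch[-1][self.offset_column]
        return batch

    def reset(self) -> None:
        # Resetting pipeline offsets makes the next run start from the initial offset again.
        self.saved_offset = None


table = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
reader = IncrementalReader(offset_column="id", initial_offset=0)
print([r["id"] for r in reader.run(table)])  # first run: [1, 2]
table.append({"id": 3, "name": "c"})
print([r["id"] for r in reader.run(table)])  # next run: [3]
reader.reset()
print([r["id"] for r in reader.run(table)])  # after reset: [1, 2, 3]
print(len(full_read(table)))                 # a full read always returns every row: 3
```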
When you configure the origin to perform an incremental read, you specify the offset column and initial offset to use. As a best practice, use an offset column whose values increase incrementally, are unique, and are never null. An index on this column is strongly recommended because the underlying query uses an ORDER BY clause and inequality operators on this column.
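The following sketch shows why the offset column should be unique and free of nulls. It uses an in-memory SQLite database from the Python standard library as a stand-in for Databricks. The query shape, an inequality filter plus ORDER BY on the offset column, is an assumption based on the description above, not the exact query the origin issues. `build_incremental_query` and `read_since` are hypothetical helpers.

```python
import sqlite3


def build_incremental_query(table: str, offset_column: str) -> str:
    # Assumed query shape: rows past the last offset, returned in offset order.
    return f"SELECT * FROM {table} WHERE {offset_column} > ? ORDER BY {offset_column}"


def read_since(conn: sqlite3.Connection, table: str, offset_column: str, last_offset):
    # Run the incremental query with the last-saved offset as a bound parameter.
    return conn.execute(build_incremental_query(table, offset_column), (last_offset,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "a"), (2, "b"), (2, "c"), (None, "d")],  # duplicate offset 2 and a NULL offset
)
# In SQLite, an index lets the engine serve the filter and ORDER BY on the offset column.
conn.execute("CREATE INDEX idx_orders_order_id ON orders (order_id)")

first = read_since(conn, "orders", "order_id", 0)
print(sorted(item for _, item in first))  # ['a', 'b', 'c']: NULL > 0 is never true, so 'd' is never read
print(read_since(conn, "orders", "order_id", 2))  # []
```

Ties in the offset column also cause gaps. Suppose a run stopped after processing row 'b' and saved offset 2. The next run filters on `order_id > 2`, so it skips row 'c', which shares that offset value.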
Configuring a Unity Catalog Origin
Configure a Unity Catalog origin to read from a Databricks Unity Catalog managed table. Use the origin only in Databricks pipelines.