Querying Databricks Iceberg tables using Presto engine
You can query remote Databricks Iceberg tables using the watsonx.data Presto engine by registering Databricks as a custom data source for zero-copy data federation.
- Presto connects to Databricks through the Iceberg REST Catalog API interface, not Unity Catalog API directly.
- Presto does not currently support vended-credentials for Databricks integration.
- You must configure explicit storage credentials (AWS S3 and Azure Data Lake Storage Gen2) to access the underlying data files.
- Presto supports Iceberg tables only; Delta Lake tables are not supported.
- For general information about Databricks Unity Catalog integration, see Integrating Databricks Unity Catalog in watsonx.data.
Before you begin
Complete the prerequisites outlined in Integrating Databricks Unity Catalog in watsonx.data, including:
- Databricks workspace with Unity Catalog enabled
- Iceberg tables created in Databricks Unity Catalog
- Personal Access Token with
unity-catalogAPI scope - Workspace URL:
https://<workspace-instance>.cloud.databricks.com - Unity Catalog permissions configured
- Provisioned Presto engine in watsonx.data
- Iceberg REST Catalog endpoint:
https://<workspace-instance>.cloud.databricks.com/api/2.1/unity-catalog/iceberg-rest - Catalog name: Name of the Unity Catalog containing Iceberg tables
- Access credentials for the external storage location (AWS S3 or Azure)
Note: Use of vended credentials is currently not supported. Hence, Presto requires explicit storage credentials.
Procedure
Results
You can now query Iceberg tables from Databricks Unity Catalog using Presto. The queries execute directly on the data in the external storage location without copying data into watsonx.data.
Example queries and outputs
List schemas in a catalog:
SHOW SCHEMAS IN databricks_catalog;
Output:
Schema
--------------------------
default
delta_share_demo
feb14schema
information_schema
mrmadira_external_schema
tpcdsdbiceberg_10tb_partitioned_uc
List tables in a schema:
SHOW TABLES IN databricks_catalog.feb14schema;
Output:
Table
-----------------
avengers
cims_test_result
iceberg_orders
mrmadira_csv_table
Describe table structure:
DESCRIBE databricks_catalog.feb14schema.iceberg_orders;
Output:
Column | Type | Extra | Comment
-------------|-----------------------|-------|--------
order_id | bigint | |
customer_id | bigint | |
order_ts | timestamp(6) | |
total_amt | decimal(12,2) | |
Query a table:
SELECT * FROM databricks_catalog.feb14schema.iceberg_orders LIMIT 10;
Output:
order_id | customer_id | order_ts | total_amt
---------|-------------|---------------------|----------
22222 | 22222 | 2022-01-01 00:00:00 | 100.00
What to do next
Related information