Syncing external Iceberg data into watsonx.data

The data objects on an external object store (bucket) can differ from those registered in the watsonx.data storage catalog. You can sync the metadata of the external bucket with the watsonx.data catalog without moving the data manually. Syncing the metadata lets you fetch up-to-date data from the external bucket and aligns the watsonx.data Apache Iceberg catalog with the remote bucket. An Apache Iceberg catalog must be attached to the storage to use this feature.

watsonx.data Developer edition

watsonx.data on IBM Software Hub

Procedure

  1. On the Infrastructure manager page, click Add component.
  2. Select the storage from the Storage section.
  3. Enter the storage details.
  4. Select Catalog type as Apache Iceberg.
  5. Enter the catalog name.
  6. Click Create to create the storage.
  7. If the data in the storage bucket changes and you want to pull those changes into watsonx.data, go to the Infrastructure manager page, hover over the Apache Iceberg catalog, and click Sync metadata. Three options are available for Mode, each shown with its corresponding possibility of metadata loss.
  8. Select one of the following three sync options:
    1. Register new objects only: Schemas, tables, and metadata created by external applications since the last sync operation are added to this catalog. Existing schemas and tables in this catalog are not modified.
    2. Update existing objects only: Schemas, tables, and metadata already present in this catalog are updated to match the current state found in the associated bucket, but are not deleted. Any other schemas, tables, and metadata in the associated bucket are ignored.
    3. Sync all objects: All schemas, tables, and metadata are synchronized, including updates to existing tables that were promoted earlier. Removal of objects is not synced.
  9. After the synchronization is complete, go to the Data manager. You can see the catalog that you created and the tables that are pulled from the bucket.
  10. Go to the Query workspace to run SELECT queries against these tables and to insert data into the existing tables.
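
The difference between the three sync modes in step 8 can be illustrated with a small Python model. This is only a conceptual sketch, not the watsonx.data implementation or API: the `SyncMode` enum, the `sync_metadata` function, and the table names are hypothetical, and each catalog or bucket is reduced to a dictionary that maps a table name to a metadata version.

```python
from enum import Enum


class SyncMode(Enum):
    """Hypothetical labels for the three sync options in the UI."""
    REGISTER_NEW = "Register new objects only"
    UPDATE_EXISTING = "Update existing objects only"
    SYNC_ALL = "Sync all objects"


def sync_metadata(catalog: dict, bucket: dict, mode: SyncMode) -> dict:
    """Return the catalog state after a sync (illustrative model only).

    Objects are never removed from the catalog, because removal of
    objects is not synced in any mode.
    """
    result = dict(catalog)  # start from the current catalog, keep everything
    for table, metadata in bucket.items():
        is_new = table not in catalog
        if is_new and mode in (SyncMode.REGISTER_NEW, SyncMode.SYNC_ALL):
            result[table] = metadata  # register an object created externally
        elif not is_new and mode in (SyncMode.UPDATE_EXISTING, SyncMode.SYNC_ALL):
            result[table] = metadata  # refresh an object the catalog already has
    return result


# Assumed example state: "sales.returns" exists only in the catalog,
# "sales.customers" exists only in the bucket, "sales.orders" has changed.
catalog = {"sales.orders": "v1", "sales.returns": "v1"}
bucket = {"sales.orders": "v2", "sales.customers": "v1"}

for mode in SyncMode:
    print(mode.value, "->", sync_metadata(catalog, bucket, mode))
```

In this model, every mode leaves `sales.returns` in the catalog even though the bucket no longer contains it, which mirrors the statement that removal of objects is not synced.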

    Related API: For information on related API, see