Synchronizing reference data sets to an external database

You can synchronize reference data sets to an external database and then use them as connected data assets across the platform and externally.

When you externalize a reference data set, its values are synchronized to a supported data source. When the reference data set is synchronized, it can be used as a connected data asset - it can be profiled, queried via SQL, joined with other data sets, and reused consistently across the platform.

Reference data sets are synchronized in the following ways:

  • Automatically - the process is started automatically every time a new version or revision of the reference data set is published;
  • Manually - when you click Run sync in the External synchronization tab of the selected reference data set.

Each reference data set is synchronized to one table, where each reference data value is a separate row. Note that reference data values and any custom columns are synchronized, but the relationships aren't.

The synchronization is one way only, from watsonx.data intelligence to the external database. That means that any edits on the database side can't be synced back to watsonx.data intelligence. It is recommended not to modify the database table to avoid sync issues.

When you delete the reference data set from watsonx.data intelligence, the database table must be deleted manually.

Prerequisites

To enable synchronization for reference data sets, you must have a connection to PostgreSQL version 12 or later.

Configuring synchronization

Required permissions
You must have the Manage glossary permission to set up the connection and enable synchronization globally.

To enable synchronization for reference data sets globally:

  1. From the main menu, go to Administration > Configurations and settings.
  2. Select Asset and artifact definitions.
  3. Click Reference data sets on the left panel.
  4. Switch the Enable external sync for reference data sets toggle.
  5. Select a connection.
  6. Provide a name for the schema to store reference data sets.
  7. Save settings.

Synchronizing reference data sets

When the database connection is set up and synchronization is enabled globally, you can start synchronizing individual reference data sets.

Required permissions

  • To configure synchronization for a specific reference data set: Editor category collaborator or a custom role with edit permission for reference data sets.
  • To trigger manual synchronization (resync): Viewer category collaborator or a custom role with view permission for reference data sets.

To synchronize the reference data set:

  1. From the main menu, go to Governance > All artifacts.
  2. Select Reference data and click the set that you want to synchronize.
  3. Open the External synchronization tab and switch the Enable external sync for reference data set toggle.

Synchronization is now enabled and it will start automatically when the new version or revision of the reference data set is published. You can also start the sync process manually by clicking Run sync.

The reference data set is stored in the system-generated table, as specified on the screen.

You can review the synchronization status in the RDS synchronization status section.

Reference data sets that are externalized to a PostgreSQL database can be registered in the catalog as a connected asset with governance policies applied. You can profile externalized reference data and enforce multi-column validation rules. SQL queries can join reference data sets with operational data in pipelines and analytics workflows.