Governing virtual data in Watson Query

Watson Query can integrate with Watson Knowledge Catalog to govern the virtual data that you publish to governed catalogs. Data governance involves applying business context, data policies, and data protection rules to your virtual data.

Before you begin, review the end-to-end process.

  1. Understand catalogs. See Catalogs.
  2. Choose a method for publishing your assets to a catalog. See Methods for publishing assets to the catalog.
  3. Understand the ownership of an asset that is published to a catalog. See Virtual object owner and data asset owner.
  4. Decide whether to use business terms, data protection rules, or both. See Applying business terms and data protection rules.

Catalogs

Watson Knowledge Catalog is a secure enterprise data catalog management platform. With Watson Knowledge Catalog, you use catalogs to easily find and share your data assets. A catalog is a way to organize, label, and search for data assets. An asset in a catalog consists of metadata about a data asset. Data protection rules are enforced only on data that is published or added to a catalog. For more information about catalogs and data assets, see Catalogs.

If you have the Watson Query Admin or Engineer role and you virtualize data by using the web client, your virtual data can be published to a governed catalog automatically if this option is selected in your service settings. Users with any role can publish virtual data to the catalog manually. Watson Query Admins can also enable enforced automatic publishing to a preselected catalog in the Watson Query service settings. For more information, see Methods for publishing assets to the catalog and Publishing virtual data to the catalog.

Additionally, you can add virtual tables and views from the Watson Query connector to a catalog. See Create Watson Query connection in Watson Knowledge Catalog and Adding a data asset from a connection.

Methods for publishing assets to the catalog

In Watson Query, you can use two methods to publish assets to a catalog. You can either enforce publishing of all assets that are created with the user interface to a primary catalog, or allow users to publish to any catalog in which they have the Admin or Editor role.

Enforced publishing method

To enforce publishing to a primary catalog, a Watson Query Admin must enable Enforce publishing to a governed catalog in Service settings > Governance and select a primary catalog. When this setting is enabled, all virtual objects that are created with the user interface are published to the primary catalog automatically, and users cannot choose a different catalog when they virtualize data.

To change the primary catalog, a Watson Query Admin must meet both of the following requirements:

  • They must be an Admin on the current primary catalog.
  • They must be an Admin on the newly selected primary catalog.
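
The requirements for changing the primary catalog amount to a simple permission check. The following is a minimal sketch, assuming a hypothetical in-memory map from catalog names to the user's role in each catalog; the function and parameter names are illustrative and are not part of any Watson Query or Watson Knowledge Catalog API.

```python
def can_change_primary_catalog(
    is_watson_query_admin: bool,
    catalog_roles: dict[str, str],
    current_primary: str,
    new_primary: str,
) -> bool:
    """Check the documented requirements for changing the primary catalog.

    catalog_roles is a hypothetical map from catalog name to the user's role
    in that catalog, for example {"finance": "Admin", "sales": "Editor"}.
    """
    if not is_watson_query_admin:
        # The change must be made by a Watson Query Admin.
        return False
    # The Admin role is required on BOTH the current and the new primary catalog.
    return (
        catalog_roles.get(current_primary) == "Admin"
        and catalog_roles.get(new_primary) == "Admin"
    )


roles = {"finance": "Admin", "sales": "Editor", "hr": "Admin"}
print(can_change_primary_catalog(True, roles, "finance", "hr"))     # True
print(can_change_primary_catalog(True, roles, "finance", "sales"))  # False: only Editor on "sales"
```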

Standard publishing method

If publishing to a primary catalog is not enforced, users can publish to any catalog in which they have the Admin or Editor role. They select the catalog from the drop-down list on the Virtualize page.
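
The difference between the enforced and standard methods can be summarized as a small decision function. This minimal sketch assumes hypothetical inputs: an enforcement flag, the primary catalog, the catalog that the user picked on the Virtualize page (or None if the user chose not to publish), a map of the user's catalog roles, and whether the object was created with the user interface. All names are illustrative, not a product API.

```python
from typing import Optional

# Roles that allow publishing to a catalog under the standard method.
PUBLISHING_ROLES = {"Admin", "Editor"}


def resolve_target_catalog(
    enforce_publishing: bool,
    primary_catalog: Optional[str],
    chosen_catalog: Optional[str],
    catalog_roles: dict[str, str],
    created_with_ui: bool = True,
) -> Optional[str]:
    """Return the catalog a newly virtualized object is published to, or None."""
    if not created_with_ui:
        # Objects created another way (for example, by running SQL to create a
        # view) are not published automatically; they can be published manually later.
        return None
    if enforce_publishing:
        # Enforced method: the user cannot choose; the primary catalog is always used.
        return primary_catalog
    if chosen_catalog is None:
        # Standard method: the user chose not to publish the object.
        return None
    if catalog_roles.get(chosen_catalog) not in PUBLISHING_ROLES:
        raise PermissionError(
            f"Admin or Editor role required on catalog {chosen_catalog!r}"
        )
    return chosen_catalog


roles = {"sales": "Editor", "hr": "Viewer"}
print(resolve_target_catalog(True, "governed", "sales", roles))   # governed
print(resolve_target_catalog(False, "governed", "sales", roles))  # sales
print(resolve_target_catalog(False, "governed", None, roles))     # None
```

Raising an error for a catalog without the Admin or Editor role mirrors the rule above; in the product itself, such catalogs would presumably not be offered in the drop-down list at all.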

For more information, see Publishing virtual data to a catalog with Watson Query.

Virtual object owner and data asset owner

When a virtual object is published to a catalog, the object is represented by a data asset in Watson Knowledge Catalog. The virtual object owner and the data asset owner are not necessarily the same user:
Virtual object owner
The user who created the virtual object in Watson Query.
Data asset owner
The user who owns the data asset that represents a virtual object in a catalog. Typically, when a virtual object is published to the default catalog automatically, the user who created the virtual object is also the asset owner. However, this is not always the case:
  • A user might choose not to publish a Watson Query object when it is virtualized, or the object might be created by a method that does not attempt to publish it automatically, such as running SQL to create a view. If the object is then shared with other users and one of them publishes it, that user becomes the asset owner instead of the original object creator.
  • The asset owner might be changed in the catalog after the asset is published.
Asset owners are exempt from Watson Knowledge Catalog data protection rules and policies.
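
The ownership scenario above can be modeled in a few lines. This is a minimal sketch that assumes hypothetical VirtualObject and CatalogAsset classes; it encodes only two facts from this page: the user who publishes an object becomes the data asset owner, and asset owners are exempt from data protection rules and policies.

```python
from dataclasses import dataclass


@dataclass
class VirtualObject:
    name: str
    creator: str  # the virtual object owner


@dataclass
class CatalogAsset:
    object_name: str
    owner: str  # the data asset owner


def publish(obj: VirtualObject, publisher: str) -> CatalogAsset:
    # The publishing user becomes the asset owner, even if someone else
    # created the virtual object.
    return CatalogAsset(object_name=obj.name, owner=publisher)


def rules_apply_to(asset: CatalogAsset, user: str) -> bool:
    # Asset owners are exempt from data protection rules and policies.
    return user != asset.owner


# Alice creates a view with SQL (not published automatically) and shares it;
# Bob publishes it, so Bob becomes the asset owner.
view = VirtualObject(name="SALES_VIEW", creator="alice")
asset = publish(view, publisher="bob")
print(asset.owner)                     # bob
print(rules_apply_to(asset, "alice"))  # True: Alice is not the asset owner
print(rules_apply_to(asset, "bob"))    # False

# The asset owner can later be changed in the catalog.
asset.owner = "alice"
print(rules_apply_to(asset, "alice"))  # False
```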

Applying business terms and data protection rules

You can create virtual tables in Watson Query from existing Watson Knowledge Catalog data assets that have business term assignments. When these tables are virtualized, Watson Query can use the business terms that are assigned to them in the catalog to rename the tables and their columns.
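
The renaming step can be illustrated with a small mapping function. This sketch is hypothetical: it assumes that the business term assignments have already been retrieved as plain strings, and its conversion of term names into identifiers (runs of non-alphanumeric characters replaced by underscores, then uppercased) is an illustrative choice, not the naming rule that Watson Query actually applies.

```python
import re
from typing import Optional


def term_name_or(original: str, term: Optional[str]) -> str:
    """Return an identifier derived from the business term, or the original name."""
    if not term:
        # No business term assigned: keep the original name.
        return original
    # Illustrative normalization only: "_" for runs of other characters, uppercase.
    ident = re.sub(r"[^0-9A-Za-z]+", "_", term.strip()).strip("_").upper()
    return ident or original


def rename_with_terms(
    table: str,
    columns: list[str],
    table_term: Optional[str],
    column_terms: dict[str, str],
) -> tuple[str, dict[str, str]]:
    """Return the new table name and a map of old to new column names."""
    new_table = term_name_or(table, table_term)
    new_columns = {col: term_name_or(col, column_terms.get(col)) for col in columns}
    return new_table, new_columns


name, cols = rename_with_terms(
    "CUST_T1",
    ["C_ID", "C_NM", "REGION"],
    table_term="Customer",
    column_terms={"C_ID": "Customer Identifier", "C_NM": "Customer Name"},
)
print(name)  # CUSTOMER
print(cols)  # {'C_ID': 'CUSTOMER_IDENTIFIER', 'C_NM': 'CUSTOMER_NAME', 'REGION': 'REGION'}
```

A fuller version would also need to handle two columns whose terms normalize to the same identifier.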

Note: Some data sources have limitations that might affect business term assignment. See Supported data sources in Watson Query.

A catalog data asset contains a set of properties that includes business terms and tags. After your virtual data is in a catalog, you can:

  • Assign business terms, data classes, and tags that are authored in Watson Knowledge Catalog to tables and columns to give your virtual data a logical structure.

    For more information, see Virtualizing data with business terms.

  • Use data protection rules to deny access to your virtual data or mask it. These data protection rules can be based on the assigned tags and business terms. For more information, see Data protection rules.
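
How deny and mask rules keyed on business terms and tags affect what a governed user sees can be sketched as follows. This is a simplified, hypothetical model, not the Watson Knowledge Catalog rule engine: the Rule class, the fixed mask string, and the assumptions that a deny rule blocks the whole asset and takes precedence over masking are all illustrative.

```python
from dataclasses import dataclass

MASK = "XXXXXXXX"  # illustrative masking style only


@dataclass(frozen=True)
class Rule:
    action: str  # "deny" or "mask"
    label: str   # business term or tag that triggers the rule


def read_row(
    row: dict[str, str],
    column_labels: dict[str, set[str]],
    rules: list[Rule],
) -> dict[str, str]:
    """Return the row as a user who is subject to the rules would see it."""
    all_labels = set().union(*column_labels.values())
    for rule in rules:
        # Assumption: a deny rule blocks access to the whole asset when any
        # column carries its label, and it takes precedence over masking.
        if rule.action == "deny" and rule.label in all_labels:
            raise PermissionError(f"Access denied by rule on {rule.label!r}")
    mask_labels = {r.label for r in rules if r.action == "mask"}
    # Mask every column that carries at least one label targeted by a mask rule.
    return {
        col: MASK if column_labels.get(col, set()) & mask_labels else value
        for col, value in row.items()
    }


row = {"NAME": "Ana", "SSN": "123-45-6789", "REGION": "EU"}
labels = {"NAME": {"PII"}, "SSN": {"PII", "Social Security Number"}}
print(read_row(row, labels, [Rule("mask", "Social Security Number")]))
# {'NAME': 'Ana', 'SSN': 'XXXXXXXX', 'REGION': 'EU'}
```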