Identical data assets

To consistently govern connected data assets that represent the same physical resource (identical data assets) across multiple governed catalogs and select projects, you must have an accurate and consistent view of the specific set of asset properties (shared properties) that such assets reference. When shared properties and their values are updated, the changes are immediately visible on all identical data assets across the specified workspaces. Depending on your asset role, you might be able to edit or view the shared properties.

Data assets in projects and data products are identified as identical data assets if they were added to the project from a governed catalog.

Limitations

  • Only connected data assets in governed catalogs that enforce the usage of shared properties across workspaces might be recognized as identical data assets and reference the same shared properties record.
  • For existing governed catalogs, all connected data assets with the same resource or identity key automatically reference a shared properties record.
  • For new governed catalogs, you must specify that you want to enforce shared properties across workspaces when you create a new catalog. You can't change this setting after the catalog is created.

Shared properties aren't available for the following assets:

  • Data assets in ungoverned catalogs.
  • Data assets created from locally uploaded files.
  • SQL Query assets (Data assets with the Query subtype).

If you're working with Git-based projects, you must first check out a branch in the Git project before you can publish assets from a catalog to that project.

Data assets and unique identifiers

You can add connected data assets that represent the same physical asset residing in a remote data source to one or more catalogs, projects, and deployment spaces. As a result, the same physical asset is represented in different workspaces by multiple connected data assets. Such assets are called identical data assets. They reference the same shared properties and are assigned the same unique identifier, either an identity key or a resource key.

If a data source definition is present for the connections or data sources, identity keys are used to identify identical data assets across workspaces. If data source definitions aren't defined in the system, resource keys are used instead.
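The rule above can be sketched as a small grouping routine. This is only an illustration, not the product's implementation: the `ConnectedAsset` class, its field names, and the convention that `identity_key` is set only when a data source definition exists are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConnectedAsset:
    # Hypothetical model of a connected data asset in one workspace.
    name: str
    workspace: str
    resource_key: str
    # Assumption: populated only when a data source definition exists.
    identity_key: Optional[str] = None


def unique_identifier(asset):
    """Return the (key type, key value) pair used to match identical assets."""
    if asset.identity_key is not None:
        return ("identity_key", asset.identity_key)
    return ("resource_key", asset.resource_key)


def group_identical(assets):
    """Group assets that carry the same unique identifier."""
    groups = {}
    for asset in assets:
        groups.setdefault(unique_identifier(asset), []).append(asset)
    return groups


assets = [
    ConnectedAsset("CUSTOMERS", "Sales catalog", resource_key="rk-1"),
    ConnectedAsset("Customer table", "Marketing project", resource_key="rk-1"),
    ConnectedAsset("ORDERS", "Sales catalog", resource_key="rk-2"),
]
groups = group_identical(assets)
```

In this sketch, the first two assets land in the same group even though their names and workspaces differ, because only the key decides whether assets are identical.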

Shared and non-shared properties

If multiple connected data assets share the same identity key or resource key, they're considered identical and, as a result, have a set of shared properties, such as assigned business terms, data classes, and classifications. Asset properties that aren't shared (non-shared properties) specify other metadata, such as the asset name, asset relationships, or connection information, that applies to the data asset only in the context of the workspace that the connected data asset was added to. You can configure custom properties as shared or non-shared properties.

Shared properties
Shared properties are stored in a single record. All identical data assets in governed catalogs reference it. Identical data assets that were added to projects from governed catalogs and aren't being edited in the project also reference the record. If the identical data assets are published in one or more governed catalogs, the metadata information stays the same.

The following is the list of shared properties.

  • Asset membership: asset owners, editors, viewers in catalog assets
  • Description
  • Display name and description
  • Tags
  • Category (User or System)
  • Origin country
  • Assigned business terms
  • Assigned classifications
  • Quality score
  • Column metadata
  • Assigned data classes on columns
  • Custom properties that were created as shared properties
  • Data profile stats

Non-shared properties
Non-shared properties aren't stored in a shared properties record and identical data assets in governed catalogs don't reference the record. Across workspaces, identical data assets might have different non-shared properties.

The following is the list of non-shared properties.

  • Asset name
  • Privacy setting in catalog assets (public, private, or hidden)
  • Created by, created at
  • Modified by, modified at
  • Timestamp
  • User information
  • Connection
  • Revisions
  • Asset relationships
    • Asset to Asset
    • Asset to Column
    • Asset to Artifact
    • Column to Asset
    • Column to Column
    • Column to Artifact
  • Attachments

See Asset properties.
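One way to picture the split between shared and non-shared properties is a single record that several assets point to, with workspace-specific fields stored on each asset. The following is a minimal sketch under that reading; `SharedRecord`, `CatalogAsset`, and all field names are hypothetical and don't correspond to a documented API.

```python
from dataclasses import dataclass, field


@dataclass
class SharedRecord:
    # Hypothetical: one record per identity key or resource key.
    key: str
    properties: dict = field(default_factory=dict)


@dataclass
class CatalogAsset:
    # Non-shared properties are stored on the asset itself.
    name: str
    catalog: str
    privacy: str
    # Shared properties are referenced, not copied.
    shared: SharedRecord


record = SharedRecord(key="rk-customers", properties={"tags": ["pii"]})
sales = CatalogAsset("CUSTOMERS", "Sales catalog", "public", record)
finance = CatalogAsset("Customer table", "Finance catalog", "private", record)

# Editing a shared property through one asset changes the single record,
# so the other identical asset sees the new value.
sales.shared.properties["business_terms"] = ["Customer"]

# Non-shared properties stay independent in each workspace.
finance.name = "Customer master"
```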

Shared properties in governed catalogs, projects, and deployment spaces

When you add a connected data asset to a governed catalog and it's identified as an identical data asset because it already exists in another governed catalog, you're not assigned the asset owner or asset editor role. You're marked as the asset creator and have the asset viewer role. As a result, you can't update the shared properties and might not be able to complete some create, import, and publish jobs. You might need to change your asset user role first.

When an asset owner or an asset editor edits the shared properties, the following behavior is expected.

Shared properties in catalogs

When a shared property is updated for an identical data asset in a catalog:

  • The update is immediately visible for all identical data assets that are published in different catalogs.
  • The update isn't automatically visible for connected data assets that are already in deployment spaces.
  • The update is visible for connected data assets in projects only if the asset wasn't edited in the project (that is, it isn't in the Draft state).

Shared properties in projects

When a shared property is updated for an identical data asset in a project:

  • The status of the asset changes to Draft. The data asset stays in the Draft state until it is published to a catalog.
  • The update is visible only to the project members. If an identical data asset is in the Draft state and its shared properties are updated in a catalog, the updates aren't visible for the identical data asset in the Draft state.
  • When the Draft asset is published to a catalog, the update becomes visible for all other identical data assets that aren't in the Draft state, both those published in other catalogs and those in projects (data assets that were cloned from a catalog to a project).
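
The Draft behavior in the list above can be modeled as a project asset that switches from reading the shared record to reading a private snapshot as soon as it's edited. This is a simplified sketch, not the actual mechanism: the `ProjectAsset` class and its methods are hypothetical, and the choice to write back only the edited properties on publish is an assumption, because this topic doesn't describe conflict handling.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProjectAsset:
    shared_record: dict               # reference to the shared properties record
    snapshot: Optional[dict] = None   # private copy, present only in the Draft state
    edits: dict = field(default_factory=dict)

    def is_draft(self):
        return self.snapshot is not None

    def visible_properties(self):
        # A Draft asset shows its snapshot plus local edits, so later
        # catalog updates aren't visible until the draft is published.
        if self.snapshot is None:
            return self.shared_record
        return {**self.snapshot, **self.edits}

    def edit(self, name, value):
        # The first edit moves the asset into the Draft state.
        if self.snapshot is None:
            self.snapshot = copy.deepcopy(self.shared_record)
        self.edits[name] = value

    def publish(self):
        # Assumption: only the edited properties are written to the record.
        self.shared_record.update(self.edits)
        self.snapshot = None
        self.edits = {}


record = {"tags": ["pii"]}
edited = ProjectAsset(record)      # asset that a project member edits
untouched = ProjectAsset(record)   # identical asset in another project

edited.edit("description", "Customer master data")
record["tags"] = ["pii", "gdpr"]   # shared property updated in a catalog
tags_seen_by_draft = edited.visible_properties()["tags"]
tags_seen_by_untouched = untouched.visible_properties()["tags"]

edited.publish()
```

Before `publish()` is called, the description exists only in the draft and the draft still shows the old tags. After publishing, the description is part of the shared record and the asset reads from that record again.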

Shared properties in deployment spaces

Data assets in deployment spaces don't reference shared properties, even if they're recognized as identical data assets.

If shared properties are updated in catalogs, the updates aren't available for identical data assets in deployment spaces.