Creating a data source definition

Data source definitions are assigned to connections and their associated connected data assets based on endpoints. The endpoints identify a data source by a set of properties in the connection (or connected data asset), such as the hostname or IP address, the port number, and the database name or instance identifier. The method that you choose to create a data source definition depends on whether the data source endpoints are already specified in connections.

The endpoints are not specified in connections yet

You already know which data source you want to create a data source definition for. After you create the data source definition, it is assigned to any connections on the platform that match the data source definition’s endpoints. For the procedure, see Creating a data source definition from the Data source definition list.
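
The matching that is described above can be pictured with a minimal Python sketch. It is illustrative only and does not reflect the platform's implementation: the `Endpoint` and `DataSourceDefinition` classes, the `assign` helper, the host names, and the exact-match comparison are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical endpoint: the properties that identify a data source."""
    host: str       # hostname or IP address
    port: int       # port number
    database: str   # database name or instance identifier


@dataclass
class DataSourceDefinition:
    """Hypothetical stand-in for a data source definition."""
    name: str
    endpoints: set = field(default_factory=set)

    def matches(self, connection: Endpoint) -> bool:
        # Assumption: a connection matches when its properties are
        # exactly equal to one of the definition's endpoints.
        return connection in self.endpoints


def assign(definitions, connections):
    """Map each connection name to the first matching definition name, or None."""
    result = {}
    for conn_name, endpoint in connections.items():
        result[conn_name] = next(
            (d.name for d in definitions if d.matches(endpoint)), None
        )
    return result


# Made-up definition and connections for illustration.
sales_dsd = DataSourceDefinition(
    name="sales-db",
    endpoints={Endpoint("db.example.com", 50000, "SALES")},
)
connections = {
    "conn-a": Endpoint("db.example.com", 50000, "SALES"),
    "conn-b": Endpoint("other.example.com", 50000, "SALES"),
}

assignments = assign([sales_dsd], connections)
print(assignments)  # {'conn-a': 'sales-db', 'conn-b': None}
```

In this sketch, only `conn-a` receives the definition because its host, port, and database all equal an endpoint of `sales-db`.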

The endpoints are already specified in connections

If you are unsure whether the data source endpoints are already specified in a connection, you can view the connections and their associated endpoints on the platform. Then, you can either add the connection endpoints to an existing data source definition or create a new data source definition based on those endpoints. For the procedure, see Adding endpoints to a new or existing data source definition.

Prerequisite and requirements for data source definitions

The following prerequisite and requirements apply to data source definitions.

Prerequisite

Cloud Pak for Data common core services must be installed. The common core services are automatically installed by services that rely on them. You access the Connectivity page from the navigation menu at Data > Connectivity. If you don't see the Connectivity page, none of the services that are installed in your environment rely on the common core services.

Requirements

Connection credentials

Deep enforcement is performed based on the credentials that the user provides for the connection, so the connection's credentials must match the user who is logged in to the platform. For this reason, use personal credentials instead of shared credentials when you create a connection that will be associated with a data source definition. For information about personal and shared credentials, see Adding connections to data sources in a project.

Data source definitions with identical data assets

If a data source definition (DSD) is present for the connections or data sources, identity keys are used to identify identical data assets across workspaces. If DSDs aren't defined in the system, resource keys are used instead.

For databases with multiple IP addresses, define DSDs for connections instead of relying on resource keys. With DSDs, multiple hostnames or IP addresses are associated with the same connection so that physical tables from that connection are recognized as identical data assets and not as separate assets.

If you don't use DSDs, the same physical tables might not be recognized as identical data assets. To resolve this issue, you can assign DSDs later so that DSD identity keys are assigned. Assets that are then detected as duplicates of each other are consolidated.
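
To illustrate why a DSD helps with databases that have multiple IP addresses, the following Python sketch compares the two kinds of keys. The key formats are not documented here, so everything in the sketch is an assumption: the `resource_key`, `identity_key`, and `asset_key` functions, the hashing scheme, the IP addresses, and the table name are hypothetical and only show the idea.

```python
import hashlib


def _digest(*parts: str) -> str:
    """Short, stable hash of the given parts (illustrative only)."""
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()[:12]


def resource_key(host, port, database, table):
    # Assumption: a resource key depends on the connection's own host,
    # so two IP addresses for one database give two different keys.
    return _digest(host, str(port), database, table)


def identity_key(dsd_name, table):
    # Assumption: an identity key depends on the DSD, not on a single host.
    return _digest(dsd_name, table)


# Hypothetical DSD that lists two IP addresses for one physical database.
dsd_endpoints = {
    ("10.0.0.1", 50000, "SALES"): "sales-db",
    ("10.0.0.2", 50000, "SALES"): "sales-db",
}


def asset_key(host, port, database, table):
    """Use the identity key when a DSD covers the endpoint, else the resource key."""
    dsd_name = dsd_endpoints.get((host, port, database))
    if dsd_name is not None:
        return identity_key(dsd_name, table)
    return resource_key(host, port, database, table)


# The same physical table reached through two different IP addresses.
asset_a = ("10.0.0.1", 50000, "SALES", "PUBLIC.ORDERS")
asset_b = ("10.0.0.2", 50000, "SALES", "PUBLIC.ORDERS")

same_without_dsd = resource_key(*asset_a) == resource_key(*asset_b)
same_with_dsd = asset_key(*asset_a) == asset_key(*asset_b)
print(same_without_dsd, same_with_dsd)  # False True
```

Under these assumptions, the two assets look different when only resource keys are compared, and identical when the DSD supplies a shared identity key.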
