Configure data source connections

Data source connections describe the source data systems for which IBM Data Cataloging indexes metadata.

Creating a data source connection identifies a source storage system that IBM Data Cataloging is to index.

For some data source types, you can optionally create a network connection to allow automated scanning and indexing of the source system metadata. IBM Data Cataloging does not index data from unknown sources, so creating a data source connection is the first step toward cataloging any source storage system.

Remember: Depending on when the scan is stopped, stopping a running scan might result in an inconsistent database state for the connection.

You can add data connections to the source storage systems from the IBM Data Cataloging graphical user interface and REST API. For more information about configuring data source connections offline, see Configuring data source connections offline in the IBM Data Cataloging: Concepts, Planning, and Deployment Guide.

IBM Data Cataloging discards any data that comes in from an unknown connection. Therefore, connections must be established before data ingestion. To see the list of defined connections, use the Connections tab under the Data source management window of the GUI.

Remember: If you use a Mac, you might have to adjust the scroll bar settings in System Settings to see all available connection types. For example, activate the Show scroll bars: Always option.

Typically, a data source is equivalent to a single file system, object vault, or bucket. A data source connection is an alias for the combination of a cluster name and a data source within the cluster. This allows multiple file systems, buckets, or vaults with the same name to be indexed by IBM Data Cataloging when they are in separate clusters.
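
The alias relationship can be pictured with a minimal Python sketch. This is an illustration only, not the product's implementation or API: the DataSource class, the connections registry, the add_connection and is_known helpers, and the cluster and alias names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    cluster: str  # cluster that hosts the data source
    name: str     # file system, bucket, or vault name within that cluster


# Hypothetical in-memory registry: connection alias -> (cluster, data source).
connections = {}


def add_connection(alias: str, cluster: str, datasource: str) -> None:
    """Register an alias for a (cluster, data source) pair; aliases are unique."""
    if alias in connections:
        raise ValueError(f"connection {alias!r} already exists")
    connections[alias] = DataSource(cluster, datasource)


def is_known(alias: str) -> bool:
    """Return True only if the alias was registered beforehand."""
    return alias in connections


# Two file systems share the name "fs1" but live in different clusters,
# so each one gets its own connection alias.
add_connection("cluster-a-fs1", "cluster-a", "fs1")
add_connection("cluster-b-fs1", "cluster-b", "fs1")
```

Because the alias resolves to the pair rather than to the data source name alone, the two "fs1" entries stay distinct, and a lookup such as is_known("unregistered") fails for anything that was never defined.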

Remember:

IBM Data Cataloging does not support file or file path names that use characters that are not part of the UTF-8 character set.
After you do a manual scan, run a Metadata summarization database refresh:
1. In Data source management > Metadata summarization database, go to Data connections > Discover information.
2. Click refresh and wait for the process to complete successfully. The time to completion depends on the number of records to process; for example, it might take 30 minutes.
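
Regarding the UTF-8 restriction noted above, it can help to check a source file system for problem names before you scan it. The following Python sketch is one possible approach, not an IBM Data Cataloging tool. It assumes a POSIX-style file system where names are raw bytes, and it treats a name as unsupported when its bytes do not decode as UTF-8. The function names is_valid_utf8 and find_non_utf8_paths are hypothetical.

```python
import os


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def find_non_utf8_paths(root: bytes):
    """Yield every path under root whose bytes are not valid UTF-8.

    Passing root as bytes makes os.walk return bytes names, so invalid
    sequences are seen exactly as they are stored on disk.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full_path = os.path.join(dirpath, name)
            if not is_valid_utf8(full_path):
                yield full_path
```

For example, list(find_non_utf8_paths(b"/mnt/source")) collects the offending paths under a hypothetical mount point so that they can be renamed before the scan.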