Configure data source connections

Data source connections describe the source data systems for which Data Cataloging indexes metadata.

Creating a data source connection identifies a source storage system that Data Cataloging is to index.

For some data source types, you can optionally create a network connection so that the source system metadata is scanned and indexed automatically. Data Cataloging does not index data from unknown sources, so creating a data source connection is the first step towards cataloging any source storage system.

Remember: Depending on when the scan is stopped, stopping a running scan might result in an inconsistent database state for the connection.

You can add data source connections to the source storage systems from the Data Cataloging graphical user interface or REST API. For more information on configuring data source connections offline, see Configuring data source connections offline in the Data Cataloging: Concepts, Planning, and Deployment Guide.

Data Cataloging discards any data that comes in from an unknown connection. Therefore, connections must be established before data ingestion. To see the list of defined connections, use the Connections tab under the Data source management window of the GUI.
Remember: If you use a Mac, you might have to adjust the scroll bar settings in System Settings to see all available connection types. For example, activate the Show scroll bars: Always option.

Typically, a data source is equivalent to a single file system, object vault, or bucket. A data source connection is an alias for the combination of a cluster name and a data source within the cluster. This allows multiple file systems, buckets, or vaults with the same name to be indexed by Data Cataloging when they are in separate clusters.
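
The alias relationship can be illustrated with a small Python sketch. This is a conceptual model only, not the Data Cataloging implementation or its REST API: the DataSource and ConnectionRegistry names, the alias strings, and the record fields are all hypothetical. It shows two data sources that are both named fs1 being registered under different aliases because they are in separate clusters, and data from an undefined connection being discarded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    cluster: str  # cluster that hosts the data source
    name: str     # file system, bucket, or vault name within that cluster


class ConnectionRegistry:
    """Hypothetical in-memory model of connection aliases (illustration only)."""

    def __init__(self):
        self._by_alias = {}

    def add(self, alias, cluster, name):
        # Each alias maps to exactly one (cluster, data source) pair.
        if alias in self._by_alias:
            raise ValueError(f"connection alias already defined: {alias}")
        self._by_alias[alias] = DataSource(cluster, name)

    def accept(self, alias, record):
        """Return the record tagged with its source, or None for an unknown alias."""
        source = self._by_alias.get(alias)
        if source is None:
            return None  # data from an unknown connection is discarded
        return {"cluster": source.cluster, "datasource": source.name, **record}


registry = ConnectionRegistry()
# Same data source name in two clusters, distinguished by the alias.
registry.add("prod-fs1", cluster="cluster-a", name="fs1")
registry.add("dr-fs1", cluster="cluster-b", name="fs1")
```

In this model, a record that arrives with the alias prod-fs1 is tagged with cluster-a and fs1, while a record that arrives with an alias that was never added is dropped.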

Remember: Data Cataloging does not support file names or file path names that contain characters outside the UTF-8 character set.
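
Before a source system is scanned, it can be useful to find names that would fall under this restriction. The following Python sketch is one possible pre-check, assuming that the names are available as raw bytes (for example, from a file listing); the function names are hypothetical and are not part of Data Cataloging.

```python
def is_valid_utf8(raw_name: bytes) -> bool:
    """Return True if the raw file or path name decodes cleanly as UTF-8."""
    try:
        raw_name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def split_by_encoding(raw_names):
    """Partition raw names into (valid UTF-8, not valid UTF-8) lists."""
    valid, invalid = [], []
    for raw in raw_names:
        (valid if is_valid_utf8(raw) else invalid).append(raw)
    return valid, invalid
```

For example, the name b"caf\xc3\xa9.txt" is valid UTF-8, while b"caf\xe9.txt" (the same name encoded as Latin-1) is not and would be reported in the second list.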