Data connectors
Data connectors manage the libraries and configurations that hosts require to connect to various data sources. A data connector specifies the data source type, the URI, the authentication method, and all of the libraries required to access the data source.
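The attributes listed above can be sketched as a simple record. This is a hypothetical model for illustration only, not an IBM Spectrum Conductor API; the field names and values are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class DataConnector:
    # Hypothetical model of the attributes a data connector contains;
    # not an actual IBM Spectrum Conductor data structure.
    name: str                      # label shown in the cluster management console
    type: str                      # e.g. "HDFS", "IBM Cloud Object Storage"
    uri: str                       # location of the data source
    auth_method: str               # e.g. "standard", "kerberos"
    libraries: list = field(default_factory=list)  # client libraries needed

# Example instance (all values are illustrative):
hdfs = DataConnector(
    name="prod-hdfs",
    type="HDFS",
    uri="hdfs://namenode.example.com:8020",
    auth_method="standard",
    libraries=["hadoop-client"],
)
```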
Data connectors simplify data source administration and configuration, and separate credential management and data usage from applications. You can configure multiple data connectors for an instance group and switch between data sources on demand from the cluster management console. When data connectors are configured and deployed with your instance group, you can use them to connect to data sources when you create notebook services and submit Spark batch applications.
There are five data connector types built into IBM® Spectrum Conductor. The following table lists each data connector type and its configuration requirements.
- By design, data connectors for an HDFS file system that uses standard authentication cannot be used in the same application as data connectors for a Kerberos-secured HDFS file system; this restriction prevents conflicting authentication configurations.
- All data is shared among users in an instance group when using data connectors with notebooks.
- If an instance group is configured to use a data connector for the fs.defaultFS parameter, using the Hive context in Zeppelin notebooks is not supported.
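The restriction on mixing standard-authentication and Kerberos-secured HDFS connectors in one application can be sketched as a validation check. This is a hypothetical helper written for illustration, not part of the product:

```python
def validate_connectors(connectors):
    """Reject applications that mix standard and Kerberos HDFS connectors.

    `connectors` is a list of (connector_type, auth_method) tuples.
    Hypothetical helper; IBM Spectrum Conductor enforces this by design.
    """
    # Collect the authentication methods used by HDFS-based connectors.
    hdfs_auth = {auth for ctype, auth in connectors if "HDFS" in ctype.upper()}
    if "standard" in hdfs_auth and any(a.startswith("kerberos") for a in hdfs_auth):
        raise ValueError(
            "Standard-authentication and Kerberos-secured HDFS connectors "
            "cannot be used in the same application"
        )
    return True
```

For example, `validate_connectors([("HDFS", "standard"), ("HDFS", "kerberos")])` raises an error, while an application that uses only one HDFS authentication method passes.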
For more information about adding data connectors to an instance group, see Adding data connectors.
Data connector type | Configuration requirements |
---|---|
IBM Cloud Object Storage | |
IBM Spectrum Scale (HDFS Transparency) | |
HDFS | |
Kerberos secured HDFS | |
Kerberos TGT secured HDFS | |