Optionally, add data connectors to your instance group. Data connectors manage the
libraries and configurations that are required for hosts to connect to various data sources.
Before you begin
You must be a cluster administrator or consumer administrator, or have the
SPARK_INSTANCEGROUP_CONFIGURE and SPARK_DATACONNECTOR_VIEW permissions, to configure data connectors for an instance group.
About this task
Add data connectors to an instance group and enable them for all notebooks in the instance group
so that you can connect to data sources when you create notebook services and submit Spark batch
applications.
IBM® Spectrum Conductor provides five types of built-in data connectors. You can also build your
own data connector (for example, to connect to MapR XD). For more information, including
limitations when data connectors are configured for an instance group, see Data connectors.
Procedure
- Click the Data Connectors tab.
- Optional: Enable concurrent user access to run Spark SQL by using Derby for the embedded
metastore. When this option is enabled, a default data connector is added to provide that access.
When it is not enabled, the Derby embedded metastore works only for the first execution user.
Note: This option is enabled by default. It is not supported for Spark versions 1.5.2 and 1.6.1.
- Click Add to create a data connector.
- Enter the name of the data connector. The data connector name must start with a letter
and can contain letters, numbers, and dashes. The maximum length is 100 characters.
- Select the type of data connector.
If you selected the IBM Cloud Object Storage data connector, you must set the
fs.s3d.data_connector_name.access.key and fs.s3d.data_connector_name.secret.key properties within
your application code, where data_connector_name is the name of your data connector. If Cloud
Object Storage is set as the default file system, the following properties must be set early in
the application when you run notebook applications:
  - fs.s3d.data_connector_name.proxy.host
  - fs.s3d.data_connector_name.proxy.port
  - fs.s3d.data_connector_name.access.key
  - fs.s3d.data_connector_name.secret.key
If your environment requires a proxy, define the http_proxy or https_proxy environment variable
on all hosts in the instance group. Alternatively, set the fs.s3d.data_connector_name.proxy.host
and fs.s3d.data_connector_name.proxy.port properties within your application code.
- Enter the required configuration settings for the data connector. These settings are different
for each data connector type.
For configuration details about each data connector type, see Data connectors.
- Click Save.
- To enable a data connector for all notebooks in the instance group, select its check box.
The check box includes the name and type of the data connector. For example, the check box
might say dc1 and HDFS if you added an HDFS data connector with the name dc1 to the instance
group. Enabled data connectors are selected by default when you submit a batch application and
enable the data connector for the application from the cluster management console. You can
override the data connector selection at application submission time.
- From the drop-down menu, select the data connector that specifies the fs.defaultFS parameter
in the Hadoop configuration file. The selected data connector also applies to all notebooks that
are configured for the instance group.
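
The naming rule and the IBM Cloud Object Storage property names from the procedure can be sketched in code. The following is a minimal illustration, not part of the product: the function names `is_valid_connector_name` and `cos_connector_properties` are hypothetical, the connector name and credential values are placeholders, and the sketch assumes that "letters" means ASCII letters.

```python
import re
from typing import Dict, Optional

# Naming rule from the procedure: starts with a letter, contains only
# letters, numbers, and dashes, and is at most 100 characters long.
# Assumption: "letters" means ASCII letters; the product may accept more.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9-]{0,99}")


def is_valid_connector_name(name: str) -> bool:
    """Return True if name satisfies the data connector naming rule."""
    return _NAME_PATTERN.fullmatch(name) is not None


def cos_connector_properties(
    connector_name: str,
    access_key: str,
    secret_key: str,
    proxy_host: Optional[str] = None,
    proxy_port: Optional[int] = None,
) -> Dict[str, str]:
    """Build the fs.s3d.<data_connector_name>.* properties as a dict.

    Hypothetical helper: it only assembles property names and values.
    Applying them to a Spark application is left to the caller.
    """
    if not is_valid_connector_name(connector_name):
        raise ValueError(f"invalid data connector name: {connector_name!r}")
    prefix = f"fs.s3d.{connector_name}"
    props = {
        f"{prefix}.access.key": access_key,
        f"{prefix}.secret.key": secret_key,
    }
    # Proxy settings are only needed when the environment requires a proxy
    # and the http_proxy or https_proxy variable is not defined on the hosts.
    if proxy_host is not None:
        props[f"{prefix}.proxy.host"] = proxy_host
    if proxy_port is not None:
        props[f"{prefix}.proxy.port"] = str(proxy_port)
    return props
```

For a data connector named `cos1`, calling `cos_connector_properties("cos1", access_key, secret_key)` returns a dictionary with the keys `fs.s3d.cos1.access.key` and `fs.s3d.cos1.secret.key`. In a PySpark application, one possible way to apply the pairs early is to pass each one, with Spark's `spark.hadoop.` prefix, to `SparkSession.builder.config(key, value)` before the session is created. Whether the data connector reads the properties that way is an assumption to confirm against Data connectors.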
Results
The data connector is added to the list of data connectors in the instance group. If you enabled the data connector, it is available to all
notebooks in the instance group.
What to do next
- Create and deploy the instance group.
  - Click Create and Deploy Instance Group to create the instance group and deploy its packages
    simultaneously. In this case, the new instance group appears on the Instance Groups page in
    the Ready state. Verify your deployment and then start the instance group.
  - Click Create Only to create the instance group but manually deploy its packages later. In
    this case, the new instance group appears on the Instance Groups page in the Registered
    state. When you are ready to deploy packages, deploy the instance group and verify the
    deployment. Then, start the instance group.
- Manage or modify data connectors in the instance group configuration.
  See Modifying instance groups.