Adding data connectors

Optionally, add data connectors to your instance group. Data connectors manage the libraries and configurations that are required for hosts to connect to various data sources.

Before you begin

To configure data connectors for an instance group, you must be a cluster administrator or consumer administrator, or have the SPARK_INSTANCEGROUP_CONFIGURE and SPARK_DATACONNECTOR_VIEW permissions.

About this task

Add data connectors to an instance group and enable them for all notebooks in the instance group. Enabled data connectors are used to connect to data sources when you create notebook services and submit Spark batch applications.

IBM® Spectrum Conductor provides five types of built-in data connectors. You can also build your own data connector (for example, to connect to MapR XD). For more information, including limitations when data connectors are configured for an instance group, see Data connectors.

Procedure

  1. Click the Data Connectors tab.
  2. Optional: Enable concurrent user access to run Spark SQL by using Derby for the embedded metastore. When this option is enabled, a default data connector is added for this purpose. When it is not enabled, the Derby embedded metastore works only for the first execution user.
    Note: This option is enabled by default. It is not supported for Spark versions 1.5.2 and 1.6.1.
  3. Click Add to create a data connector.
  4. Enter the name of the data connector. The data connector name must start with a letter and can contain letters, numbers, and dashes. The maximum length is 100 characters.
  5. Select the type of data connector.
    If you selected the IBM Cloud Object Storage data connector, you must set the fs.s3d.data_connector_name.access.key and fs.s3d.data_connector_name.secret.key properties within your application code, where data_connector_name is the name of the data connector. If Cloud Object Storage is set as the default file system, set the following properties early in the application when you run a notebook application:
    • fs.s3d.data_connector_name.proxy.host
    • fs.s3d.data_connector_name.proxy.port
    • fs.s3d.data_connector_name.access.key
    • fs.s3d.data_connector_name.secret.key

    If your environment requires a proxy, define the http_proxy or https_proxy environment variable on all hosts in the instance group. Alternatively, set the fs.s3d.data_connector_name.proxy.host and fs.s3d.data_connector_name.proxy.port properties within your application code.

  6. Enter the required configuration settings for the data connector. These settings are different for each data connector type.

    For configuration details about each data connector type, see Data connectors.

  7. Click Save.
  8. To enable a data connector for all notebooks in the instance group, select its check box.

    The check box label shows the name and type of the data connector. For example, the label might read dc1 and HDFS if you added an HDFS data connector with the name dc1 to the instance group. When you submit a batch application from the cluster management console and enable data connectors for the application, the enabled data connectors are selected by default. You can override this selection at application submission time.

  9. From the drop-down menu, select the data connector that specifies the fs.defaultFS parameter in the Hadoop configuration file. The selected data connector also applies to all notebooks that are configured for the instance group.
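
The naming rule in step 4 can be checked before you enter a name. The following Python code is a minimal sketch of that rule, not part of IBM Spectrum Conductor. The function name is hypothetical, and the sketch assumes that "letters" means the ASCII letters A-Z and a-z, which the rule does not state explicitly.

```python
import re

# Step 4 rule: the name starts with a letter, contains only letters,
# numbers, and dashes, and is at most 100 characters long.
# Assumption: "letters" is limited to ASCII A-Z and a-z.
_CONNECTOR_NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]{0,99}")


def is_valid_connector_name(name: str) -> bool:
    """Return True if name satisfies the step 4 naming rule (hypothetical helper)."""
    return _CONNECTOR_NAME.fullmatch(name) is not None
```

With this sketch, a name such as dc1 passes, while names that start with a number, contain an underscore, or exceed 100 characters are rejected.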

Results

The data connector is added to the list of data connectors in the instance group. If you enabled the data connector, it is available to all notebooks in the instance group.
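
For an IBM Cloud Object Storage data connector, the properties that are named in step 5 must still be set in application code. The following Python code is a minimal sketch that only builds the property names and values for a given data connector name. The function name and the sample values are hypothetical. How the properties are then applied (for example, to the Hadoop configuration of the Spark application) is an assumption that depends on your application and is not shown.

```python
from typing import Dict, Optional


def cos_connector_properties(
    connector_name: str,
    access_key: str,
    secret_key: str,
    proxy_host: Optional[str] = None,
    proxy_port: Optional[int] = None,
) -> Dict[str, str]:
    """Build the fs.s3d.* properties from step 5 (hypothetical helper)."""
    prefix = "fs.s3d.{}".format(connector_name)
    props = {}  # type: Dict[str, str]
    # The proxy properties are needed only if the environment requires a
    # proxy and the http_proxy or https_proxy variable is not defined.
    if proxy_host is not None and proxy_port is not None:
        props[prefix + ".proxy.host"] = proxy_host
        props[prefix + ".proxy.port"] = str(proxy_port)
    props[prefix + ".access.key"] = access_key
    props[prefix + ".secret.key"] = secret_key
    return props


# Sample values only; read real credentials from a secure source.
example = cos_connector_properties("dc1", "EXAMPLE_ACCESS", "EXAMPLE_SECRET")
```

The application would set these properties early, before it first accesses the Cloud Object Storage data source.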

What to do next

  1. Create and deploy the instance group.
    • Click Create and Deploy Instance Group to create the instance group and deploy its packages simultaneously. In this case, the new instance group appears on the Instance Groups page in the Ready state. Verify your deployment and then start the instance group.
    • Click Create Only to create the instance group but manually deploy its packages later. In this case, the new instance group appears on the Instance Groups page in the Registered state. When you are ready to deploy packages, deploy the instance group and verify the deployment. Then, start the instance group.
  2. Manage or modify data connectors in the instance group configuration. See Modifying instance groups.