Installing the service on Spectrum Conductor clusters

Before a project administrator can install Execution Engine for Apache Hadoop on the Spectrum Conductor cluster, the service must first be installed on Cloud Pak for Data. Before you install the service, confirm that you meet the following requirements and review the supported Spectrum Conductor versions and platforms:

Requirements

System requirements for installing Execution Engine for Apache Hadoop

Installation steps for non-root users

If you plan to install the Execution Engine for Apache Hadoop service as a non-root user, first use the visudo command to grant that user the permissions listed in the VISUDO template in this section, and then complete the installation steps.

Steps for DSXHI non-root installation:

  1. Apply the visudo rules for the non-root user
  2. su <non-root_user>
  3. sudo yum install <rpm>
  4. sudo chown <non-root_user>:<non-root_user> -R /opt/ibm/dsxhi/
  5. Edit or generate /opt/ibm/dsxhi/conf/dsxhi_install.conf
  6. cd /opt/ibm/dsxhi/bin
  7. sudo python /opt/ibm/dsxhi/bin/install.py

VISUDO template:

## DSXHI
<non-root_user> ALL=(root) NOPASSWD: /usr/bin/yum install <path-to-rpm/rpm>, /usr/bin/yum erase dsxhi*, /usr/bin/chown * /opt/ibm/dsxhi/, /usr/bin/python /opt/ibm/dsxhi/*
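The template can be filled in by hand, or rendered with a small helper so that the user name and RPM path are substituted consistently. The following is a minimal sketch: the function name render_dsxhi_sudoers, the user dsxhiuser, and the path /tmp/dsxhi.rpm are hypothetical and not part of the product.

```shell
# Print a sudoers rule for the DSXHI non-root install, following the template above.
# $1 = non-root user name, $2 = absolute path to the RPM (both supplied by you).
render_dsxhi_sudoers() {
    user=$1
    rpm_path=$2
    printf '## DSXHI\n'
    printf '%s ALL=(root) NOPASSWD: /usr/bin/yum install %s, /usr/bin/yum erase dsxhi*, /usr/bin/chown * /opt/ibm/dsxhi/, /usr/bin/python /opt/ibm/dsxhi/*\n' \
        "$user" "$rpm_path"
}

# Example with hypothetical values: write the rule to a scratch file, then let
# visudo check the syntax (as root) before the rule is added to the sudoers policy.
#   render_dsxhi_sudoers dsxhiuser /tmp/dsxhi.rpm > /tmp/dsxhi.sudoers
#   visudo -cf /tmp/dsxhi.sudoers
```

Checking the file with visudo -cf before installing it avoids locking yourself out of sudo with a malformed rule.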

Watson Studio interacts with a Spectrum Conductor cluster through the following services:

Service                            Purpose
Spectrum Conductor REST services   Retrieve Anaconda instances, environment names, and instance group information.
Jupyter Enterprise Gateway         Submit jobs through Jupyter Enterprise Gateway to Spectrum Spark.

Watson Studio user

Every user that is connecting from Watson Studio must be a valid user on the Spectrum Conductor cluster. The recommended way to achieve this is by integrating Watson Studio and the Spectrum Conductor cluster with the same LDAP.
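One way to confirm this requirement is to check, on a Spectrum Conductor host, that each Watson Studio user name resolves there. The sketch below relies on the standard id command, which consults the host's configured name services (including LDAP when the host is set up for it). The function name check_cluster_users is hypothetical.

```shell
# Return 0 if every user name given as an argument resolves on this host,
# 1 otherwise. Names that do not resolve are reported on stderr.
check_cluster_users() {
    missing=0
    for name in "$@"; do
        if ! id -u "$name" >/dev/null 2>&1; then
            echo "not a valid user on this host: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (replace with your own Watson Studio user names):
#   check_cluster_users alice bob
```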

Installing the service

  1. Run the RPM installer. The RPM is installed in /opt/ibm/dsxhi.
  2. If you’re running the install as the service user, run sudo chown <serviceuser> -R /opt/ibm/dsxhi.
  3. Create a /opt/ibm/dsxhi/conf/dsxhi_install.conf file, using the /opt/ibm/dsxhi/conf/dsxhi_install.conf.template.SPECTRUM file as a reference.
  4. Fill in dsxhi_install.conf based on your Spectrum Conductor configuration. The template describes what is needed for each field. If you need to use your own custom certificates, see Configuring custom certificates.

  5. Optional: If you need to set additional properties to control the location of Java, use a shared truststore, or pass additional Java options, update the /opt/ibm/dsxhi/conf/dsxhi_env.sh script to include the appropriate values for the environment variables:

      export JAVA="/usr/jdk64/jdk1.8.0_112/bin/java"
    
      export JAVA_CACERTS=/etc/pki/java/cacerts
    
      export DSXHI_JAVA_OPTS="-Djavax.net.ssl.trustStore=$JAVA_CACERTS"
    
  6. In /opt/ibm/dsxhi/bin, run the ./install.py script to install the service. The script prompts for input on the following options (alternatively, you can specify the options as flags):

    • Accept the license terms (Hadoop registration uses the same license as Watson Studio). You can also accept the license through the dsxhi_license_acceptance property in dsxhi_install.conf.

    • Enter the password for the Spectrum Conductor REST endpoints. You can also pass the value through the --password or -p flag.

    • Enter the master secret for the gateway service. You can also pass the value through the --dsxhi_gateway_master_password or -g flag.

    • Enter the password for the Java cacerts truststore. You can also pass the value through the --dsxhi_gateway_cacerts_password or -c flag.

    • Optional: If the custom_jks property is set in dsxhi_install.conf, enter the password for that keystore file. You can also pass the value through the --custom_jks_password or -d flag.
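For a scripted installation, the flags described in step 6 can be assembled from environment variables instead of being typed at the prompts. The following is a minimal sketch under stated assumptions: the variable names (SC_REST_PASSWORD, GATEWAY_MASTER_PASSWORD, CACERTS_PASSWORD, CUSTOM_JKS_PASSWORD), the DRY_RUN switch, and the function name are hypothetical; each flag is assumed to take its value as the next argument; and the license is assumed to be accepted through the dsxhi_license_acceptance property in dsxhi_install.conf.

```shell
# Run install.py with the passwords taken from environment variables.
# With DRY_RUN=1 the command is only printed, with the values masked.
run_dsxhi_install() {
    if [ -z "${SC_REST_PASSWORD:-}" ] || [ -z "${GATEWAY_MASTER_PASSWORD:-}" ] \
        || [ -z "${CACERTS_PASSWORD:-}" ]; then
        echo "set SC_REST_PASSWORD, GATEWAY_MASTER_PASSWORD and CACERTS_PASSWORD" >&2
        return 1
    fi

    # Real arguments (in "$@") and a masked copy that is safe to print.
    set -- --password "$SC_REST_PASSWORD" \
        --dsxhi_gateway_master_password "$GATEWAY_MASTER_PASSWORD" \
        --dsxhi_gateway_cacerts_password "$CACERTS_PASSWORD"
    shown="--password **** --dsxhi_gateway_master_password **** --dsxhi_gateway_cacerts_password ****"

    # Only needed when the custom_jks property is set in dsxhi_install.conf.
    if [ -n "${CUSTOM_JKS_PASSWORD:-}" ]; then
        set -- "$@" --custom_jks_password "$CUSTOM_JKS_PASSWORD"
        shown="$shown --custom_jks_password ****"
    fi

    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "cd /opt/ibm/dsxhi/bin && ./install.py $shown"
        return 0
    fi
    cd /opt/ibm/dsxhi/bin && ./install.py "$@"
}
```

Command-line arguments are visible to other users on the host through the process list while the installer runs, so the interactive prompts remain the more private option.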

After the service is installed, the necessary components, such as the gateway service, DSXHI integration services, and Jupyter Enterprise Gateway services, are started.

Configuring custom certificates

You can use your existing certificates without modifying the system truststore. The following configuration properties customize how DSXHI handles certificates:

custom_jks
DSXHI typically generates a Keystore, converts it to a .crt, and adds the .crt to the Java Truststore. However, with this configuration, DSXHI allows you to provide a custom Keystore that can be used to generate the required .crt.
dsxhi_cacert
DSXHI previously detected the appropriate truststore to use as part of the installation. With the dsxhi_cacert property, DSXHI allows you to provide any custom truststore (CACERTS), where DSXHI certs are added.
add_certs_to_truststore
This property controls whether DSXHI adds the host certificate to the truststore or you add it yourself. If you set the property to False, DSXHI makes no changes to the truststore, and users must add the host certificate to the truststore themselves. If you set the property to True, DSXHI retains its default behavior: it adds the host certificate to the Java truststore and on the detected datanodes for the gateway and web services.
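When add_certs_to_truststore is set to False, adding the host certificate yourself typically comes down to a keytool import. The following is a minimal sketch, not the product's own procedure: the function name add_host_cert, the DRY_RUN switch, the certificate path, and the alias are hypothetical, and the truststore path is the example value from dsxhi_env.sh. Because -storepass is not given, keytool prompts for the truststore password.

```shell
# Import a certificate file into a Java truststore with keytool.
# $1 = certificate file (.crt), $2 = truststore path, $3 = alias for the new entry.
# With DRY_RUN=1 the keytool command is printed instead of being run.
add_host_cert() {
    cert_file=$1
    truststore=$2
    cert_alias=$3
    if [ ! -r "$cert_file" ]; then
        echo "cannot read certificate: $cert_file" >&2
        return 1
    fi
    set -- keytool -importcert -alias "$cert_alias" -file "$cert_file" -keystore "$truststore"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
        return 0
    fi
    "$@"
}

# Example with hypothetical values:
#   add_host_cert /tmp/dsxhi-host.crt /etc/pki/java/cacerts dsxhi-host
```

Afterward, keytool -list -keystore <truststore> -alias <alias> can be used to confirm that the entry is present.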

Learn more

See Uninstalling the service on a Spectrum Conductor cluster for information on uninstalling the service.