Creating started tasks for the Spark cluster

The SQL Data Insights (SQL DI) application is powered by an embedded Spark cluster. After you have successfully installed SQL DI, consider managing the cluster by creating and running z/OS® started tasks. You can quickly create the started tasks for the Spark master and worker by customizing the SQLDSPKM and SQLDSPKW sample JCL procedures.

Before you begin

Make sure that SQL DI is successfully installed on the z/OS system where you plan to run the Spark started tasks.

Procedure

  1. Navigate to the $SQLDI_INSTALL_DIR/templates/started-task-samples directory on the z/OS system where SQL DI runs.
  2. Copy the SQLDSPKM and SQLDSPKW sample JCL files into a data set in your PROCLIB concatenation, such as SYS1.PROCLIB.
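
    For example, from a z/OS UNIX shell, you can copy the files directly into PDS members by using the cp command with MVS data set notation. The following commands are a sketch that assumes the sample members keep their original names and that SYS1.PROCLIB is your target library:
    cd $SQLDI_INSTALL_DIR/templates/started-task-samples
    cp SQLDSPKM "//'SYS1.PROCLIB(SQLDSPKM)'"
    cp SQLDSPKW "//'SYS1.PROCLIB(SQLDSPKW)'"
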
  3. Follow the instructions in the sample procedures to customize the environment variables based on your system environment.

    For example, set SPARK_CONF_DIR to the SQLDI_HOME/spark/conf directory.
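
    For instance, if your SQL DI home directory is /u/sqldi (a hypothetical path), the setting would be:
    export SPARK_CONF_DIR=/u/sqldi/spark/conf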

  4. Copy the spark-zos-started-tasks.sh.template file to the SQLDI_HOME/spark/conf directory by issuing the following command:
    cp $SQLDI_INSTALL_DIR/templates/started-task-samples/spark-zos-started-tasks.sh.template \
      SQLDI_HOME/spark/conf/spark-zos-started-tasks.sh
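
    After the copy, you can verify that the script is in place and, if necessary, adjust its permissions according to your site's standards. For example, the following commands list the file and make it readable by its owner and group (the permission bits shown are only an illustration):
    ls -l SQLDI_HOME/spark/conf/spark-zos-started-tasks.sh
    chmod 640 SQLDI_HOME/spark/conf/spark-zos-started-tasks.sh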
    
  5. Update the spark-zos-started-tasks.sh script in the SQLDI_HOME/spark/conf directory as shown in the following example:
    # Java environment variable - REQUIRED
    # Default: /usr/lpp/java/J8.0_64
    export JAVA_HOME=<PATH_TO_JAVA_HOME>
    
    # SQL DI installation directory - REQUIRED
    # Default: /usr/lpp/IBM/db2sqldi/
    export SQLDI_INSTALL_DIR=<PATH_TO_SQLDI_INSTALL_DIR>
    
    # OpenBLAS installation directory - REQUIRED
    # Default: /usr/lpp/cbclib
    export BLAS_INSTALL_DIR=<PATH_TO_BLAS_INSTALL_DIR>
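
    For example, if your system uses the default locations that are shown in the comments, the completed script contains the following lines:
    export JAVA_HOME=/usr/lpp/java/J8.0_64
    export SQLDI_INSTALL_DIR=/usr/lpp/IBM/db2sqldi/
    export BLAS_INSTALL_DIR=/usr/lpp/cbclib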
    
  6. Define RACF® profiles for the new SQLDSPKM and SQLDSPKW started tasks and assign <sqldi_setup_userid> as the owner by issuing the following commands:
    RDEFINE STARTED SQLDSPKM.* STDATA(USER(<sqldi_setup_userid>) GROUP(SQLDIGRP))
    
    RDEFINE STARTED SQLDSPKW.* STDATA(USER(<sqldi_setup_userid>) GROUP(SQLDIGRP))
    
    SETROPTS RACLIST(STARTED) REFRESH
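
    You can verify the new profiles by issuing the RLIST command, for example:
    RLIST STARTED SQLDSPKM.* STDATA
    RLIST STARTED SQLDSPKW.* STDATA
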
  7. Start the SQLDSPKM and SQLDSPKW started tasks by issuing the following MVS commands without any parameters:
    start SQLDSPKM
    start SQLDSPKW
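
    To confirm that both tasks are active, you can issue the MVS DISPLAY command, for example:
    d a,sqldspkm
    d a,sqldspkw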
    

    To run the Spark started tasks manually, make sure that you start SQLDSPKM before SQLDSPKW. If you automate the startup, you can start the two tasks in parallel; the processes that SQLDSPKW triggers start right after those that SQLDSPKM triggers.

  8. If necessary, stop the SQLDSPKM and SQLDSPKW started tasks by issuing the following MVS commands without any parameters:
    stop SQLDSPKM
    stop SQLDSPKW
    

    See Stopping z/OS started tasks for more information about stopping Spark started tasks.