Creating started tasks for the Spark cluster

The SQL Data Insights (SQL DI) application is powered by an embedded Spark cluster. After you have successfully installed SQL DI, consider managing the cluster by creating and running z/OS® started tasks. You can quickly create the started tasks for the Spark master and worker by customizing the SQLDSPKM and SQLDSPKW sample JCL procedures.

Before you begin

Make sure that SQL DI is successfully installed on the z/OS system where you plan to run the Spark started tasks.

Procedure

  1. Navigate to the $SQLDI_INSTALL_DIR/templates/started-task-samples directory on the z/OS system where SQL DI runs.
  2. Copy the SQLDSPKM and SQLDSPKW sample JCL files into a data set in your PROCLIB concatenation, such as SYS1.PROCLIB.
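
    For example, from a z/OS UNIX shell, you can copy the files directly into PDS members by using the cp command with MVS data set notation. The following commands are a sketch that assumes the sample members keep their original names and that SYS1.PROCLIB is your target library:
    cd $SQLDI_INSTALL_DIR/templates/started-task-samples
    cp SQLDSPKM "//'SYS1.PROCLIB(SQLDSPKM)'"
    cp SQLDSPKW "//'SYS1.PROCLIB(SQLDSPKW)'"
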
  3. Follow the instructions in the sample procedures to customize the environment variables based on your system environment.

    For example, set SPARK_CONF_DIR to the SQLDI_HOME/spark/conf directory.
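
    For instance, if your SQL DI home directory is /u/sqldi (a hypothetical path), the setting would be:
    export SPARK_CONF_DIR=/u/sqldi/spark/conf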

  4. Copy the spark-zos-started-tasks.sh.template file to the SQLDI_HOME/spark/conf directory by issuing the following command:
    cp $SQLDI_INSTALL_DIR/templates/started-task-samples/spark-zos-started-tasks.sh.template \
      SQLDI_HOME/spark/conf/spark-zos-started-tasks.sh
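
    After the copy, you can verify that the script is in place and, if necessary, adjust its permissions according to your site's standards. For example, the following commands list the file and make it readable by its owner and group (the permission bits shown are only an illustration):
    ls -l SQLDI_HOME/spark/conf/spark-zos-started-tasks.sh
    chmod 640 SQLDI_HOME/spark/conf/spark-zos-started-tasks.sh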
    
  5. Update the spark-zos-started-tasks.sh script in the SQLDI_HOME/spark/conf directory as shown in the following example:
    # Java environment variable - REQUIRED
    # Default: /usr/lpp/java/J8.0_64
    export JAVA_HOME=<PATH_TO_JAVA_HOME>
    
    # SQL DI installation directory - REQUIRED
    # Default: /usr/lpp/IBM/db2sqldi/
    export SQLDI_INSTALL_DIR=<PATH_TO_SQLDI_INSTALL_DIR>
    
    # OpenBLAS installation directory - REQUIRED
    # Default: /usr/lpp/cbclib
    export BLAS_INSTALL_DIR=<PATH_TO_BLAS_INSTALL_DIR>
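
    For example, if your system uses the default locations that are shown in the comments, the completed script contains the following lines:
    export JAVA_HOME=/usr/lpp/java/J8.0_64
    export SQLDI_INSTALL_DIR=/usr/lpp/IBM/db2sqldi/
    export BLAS_INSTALL_DIR=/usr/lpp/cbclib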
    
  6. Define RACF® profiles for the new SQLDSPKM and SQLDSPKW started tasks and assign <sqldi_setup_userid> as the owner by issuing the following commands:
    RDEFINE STARTED SQLDSPKM.* STDATA(USER(<sqldi_setup_userid>) GROUP(SQLDIGRP))
    
    RDEFINE STARTED SQLDSPKW.* STDATA(USER(<sqldi_setup_userid>) GROUP(SQLDIGRP))
    
    SETROPTS RACLIST(STARTED) REFRESH
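
    You can verify the new profiles by issuing the RLIST command, for example:
    RLIST STARTED SQLDSPKM.* STDATA
    RLIST STARTED SQLDSPKW.* STDATA
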
  7. Start the SQLDSPKM and SQLDSPKW started tasks by issuing the following MVS commands without any parameters:
    start SQLDSPKM
    start SQLDSPKW
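
    To confirm that both tasks are active, you can issue the MVS DISPLAY command, for example:
    d a,sqldspkm
    d a,sqldspkw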
    

    To run the Spark started tasks manually, make sure that you start SQLDSPKM before SQLDSPKW. If you automate the startup, you can start the two tasks in parallel; the processes that SQLDSPKW triggers start right after those that SQLDSPKM triggers.

  8. If necessary, stop the SQLDSPKM and SQLDSPKW started tasks by issuing the following MVS commands without any parameters:
    stop SQLDSPKM
    stop SQLDSPKW
    

    See Stopping z/OS started tasks for more information about stopping Spark started tasks.