Creating jobs to start and stop Spark processes

You can use BPXBATCH to start and stop Spark processes, such as the master and worker.

Note: We recommend that you use started tasks rather than BPXBATCH (see Setting up started tasks to start and stop Spark processes). However, if you prefer, you can use BPXBATCH, as described in this section.
The examples of jobs are based on the following assumptions:
  • The user ID that starts and stops the Spark cluster is SPARKID.
  • The default shell program for SPARKID is bash.
  • Spark is installed in /usr/lpp/IBM/izoda/spark/sparknnn, where nnn is the Spark version (for instance, /usr/lpp/IBM/izoda/spark/spark23x for Spark 2.4.8).
Important: Be sure that all of the required environment variables, such as JAVA_HOME, are set in the environment that is started by BPXBATCH. You can accomplish this in one of the following ways (sketches of both approaches follow the list):
  • Export the environment variables in one of the bash startup files.

    Invoke the bash command with the -l option in BPXBATCH. The -l option instructs bash to run a login shell, in which bash first reads and runs commands from the file /etc/profile, if that file exists. Bash then looks for ~/.bash_profile, ~/.bash_login, and ~/.profile, in that order, and reads and runs commands from the first of these files that exists and is readable. You can export your environment variables in ~/.bash_profile, for example, so that they are added to the environment.

  • Use the STDENV DD statement to pass environment variables to BPXBATCH. You can specify a file that defines the environment variables or specify them directly in the JCL. For more information about using the STDENV statement, see Passing environment variables to BPXBATCH in z/OS UNIX System Services User's Guide.
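
For example, the following sketch shows exports that you might add to ~/.bash_profile for SPARKID. The Java and Spark paths are assumptions; substitute the paths for your own installation:

  # Example paths only; adjust JAVA_HOME and SPARK_HOME for your installation
  export JAVA_HOME=/usr/lpp/java/J8.0_64
  export SPARK_HOME=/usr/lpp/IBM/izoda/spark/spark23x
  export PATH=$JAVA_HOME/bin:$SPARK_HOME/bin:$PATH

Alternatively, the following sketch shows environment variables coded directly in the JCL on an instream STDENV DD statement (again, the values are examples only):

  //* Example values; adjust for your installation
  //STDENV DD *
  JAVA_HOME=/usr/lpp/java/J8.0_64
  SPARK_HOME=/usr/lpp/IBM/izoda/spark/spark23x
  /*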

Sample job to start the master and worker

Figure 1 shows an example of a BPXBATCH job to start the master and worker.

The -l option on the bash command instructs bash to run a login shell, and the -c option instructs it to run the entire shell command sequence given between the single quotation marks ('). The semicolons (;) separate the individual shell commands in the sequence.

Figure 1. Sample job to start the master and worker
//SPARKMST JOB 'SPARK START',CLASS=K,MSGCLASS=A,
// NOTIFY=&SYSUID,SYSTEM=HOST,USER=SPARKID 
//PRTDS EXEC PGM=BPXBATCH,REGION=0M 
//STDPARM DD * 
SH /bin/bash -l -c 'cd /usr/lpp/IBM/izoda/spark/spark23x/sbin;start-master.sh;
sleep 5;start-slave.sh spark://hostname.yourcompany.com:7077' 
//SYSOUT DD SYSOUT=* 
//STDIN DD DUMMY 
//STDOUT DD SYSOUT=* 
//STDERR DD SYSOUT=* 
//
The bash command in the sample start job issues the following sequence of commands:
  1. Change to the sbin directory of the Spark installation, where the administration commands are located.
    cd /usr/lpp/IBM/izoda/spark/spark23x/sbin
  2. Start the master.
    start-master.sh
  3. Sleep for 5 seconds to allow the master time to start.
    sleep 5
  4. Start the worker.
    start-slave.sh spark://hostname.yourcompany.com:7077
    where hostname.yourcompany.com is the name of the host where the master is listening on port 7077.

You can also issue these commands directly from the z/OS® UNIX shell as a quick test of your Spark configuration.
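
For example, a quick test from the shell might look like the following sketch, which assumes the same installation directory and master host as the sample jobs:

  # Run from the sbin directory of the Spark installation
  cd /usr/lpp/IBM/izoda/spark/spark23x/sbin
  ./start-master.sh
  # Give the master a few seconds to initialize
  sleep 5
  # The master URL is a placeholder; use your own host name and port
  ./start-slave.sh spark://hostname.yourcompany.com:7077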

Sample job to stop the master and worker

Figure 2 shows an example of a BPXBATCH job to stop the master and worker. Its logic is similar to that of the start job, except that it stops the worker first and then stops the master.

Figure 2. Sample job to stop the master and worker
//SPARKSTP JOB 'SPARK STOP',CLASS=K,MSGCLASS=A,
// NOTIFY=&SYSUID,SYSTEM=HOST,USER=SPARKID 
//PRTDS EXEC PGM=BPXBATCH,REGION=0M 
//STDPARM DD * 
SH /bin/bash -l -c 'cd /usr/lpp/IBM/izoda/spark/spark23x/sbin;
stop-slave.sh;sleep 5;stop-master.sh'
//SYSOUT DD SYSOUT=* 
//STDIN DD DUMMY 
//STDOUT DD SYSOUT=* 
//STDERR DD SYSOUT=* 
//

What to do next

Test your start and stop jobs to ensure that your setup is correct. Then, continue with Configuring memory and CPU options.
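
As a quick check of the start job, for example, you can verify from the z/OS UNIX shell that the master and worker daemons are running and inspect their logs. The following sketch assumes the default log directory, $SPARK_HOME/logs:

  # List the running Spark daemons started by SPARKID
  ps -ef | grep -i spark | grep -v grep
  # List the most recent master and worker logs (default log location)
  ls -t /usr/lpp/IBM/izoda/spark/spark23x/logs | head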