Procedures for each Spark cluster

To create a procedure (master, worker, history server, shuffle service) for a Spark cluster, copy and edit the applicable sample procedure that is included with IBM® Open Data Analytics for z/OS®. Place the procedure in a data set in your PROCLIB concatenation, such as SYS1.PROCLIB.

There are four sample procedures. The high-level qualifier (hlq) depends on your installation. In this document, the default high-level qualifier is AZK.

  • hlq.SAZKSAMP(AZKMSTR) - Master
  • hlq.SAZKSAMP(AZKWRKR) - Worker
  • hlq.SAZKSAMP(AZKHIST) - History Server
  • hlq.SAZKSAMP(AZKSHUF) - Shuffle service
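
For example, you might copy the sample members into your PROCLIB with an IEBCOPY job similar to the following sketch. The JOB card, the default hlq AZK, and the target SYS1.PROCLIB are placeholders; substitute your installation's values.

//COPYPROC JOB (ACCT),'COPY SPARK PROCS'
//COPY     EXEC PGM=IEBCOPY
//SYSPRINT DD SYSOUT=*
//IN       DD DISP=SHR,DSN=AZK.SAZKSAMP
//OUT      DD DISP=SHR,DSN=SYS1.PROCLIB
//SYSIN    DD *
  COPY OUTDD=OUT,INDD=((IN,R))
  SELECT MEMBER=(AZKMSTR,AZKWRKR,AZKHIST,AZKSHUF)
/*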

Follow the instructions in the sample procedure. For example, SPARK_CONF_DIR must be set and exported in the procedure; it does not default to $SPARK_HOME/conf.
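
For example, the environment setup that the procedure runs might include lines like the following sketch, where /u/sparkid/conf is a placeholder for your site's configuration directory:

  # Set and export SPARK_CONF_DIR explicitly; it does not default
  # to $SPARK_HOME/conf when the procedure runs.
  SPARK_CONF_DIR=/u/sparkid/conf
  export SPARK_CONF_DIR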

Note that there are instances where the procedure for the Shuffle service cannot be used. The Shuffle service can be started in two ways:
  • Externally, by invoking sbin/start-shuffle-service.sh or by starting the Shuffle service started task
  • Internally, by the worker, when spark.shuffle.service.enabled=true is set. In this case, the Shuffle service starts and runs under the Spark Worker process.
Do not use the started task (or shell script) in the following cases (see the configuration sketch after the note below):
  1. When starting the Shuffle service inside the worker process by setting spark.shuffle.service.enabled=true (a prerequisite for dynamic allocation).
  2. When enabling dynamic allocation by setting spark.dynamicAllocation.enabled=true.
If you use one or both of these features and also start the Shuffle service externally, whichever service starts second will fail with a port binding error, because both services attempt to bind the Shuffle service port (spark.shuffle.service.port, default 7337).
Note: spark.shuffle.service.enabled has no effect on the Shuffle service when it is started by the shell script or the started task; it is a property that is used only by the worker.
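
For example, a spark-defaults.conf that runs the Shuffle service inside the worker and enables dynamic allocation might look like the following sketch. With these properties set, do not also start the Shuffle service started task or sbin/start-shuffle-service.sh:

  # Run the Shuffle service inside the worker (required for dynamic allocation)
  spark.shuffle.service.enabled    true
  # Allow Spark to scale executors up and down
  spark.dynamicAllocation.enabled  true
  # Optional: the port that both an internal and an external service would bind
  spark.shuffle.service.port       7337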

When you start the Spark started tasks manually, ensure that the Master (and the optional History server and Shuffle service) have initialized before you start the Worker. When you start them through automation, they can be started in parallel, because the Worker waits a limited amount of time for the Master to initialize. Spark users might encounter errors if Spark jobs are submitted before all started tasks have initialized.
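
For example, assuming the procedures keep their sample member names, a manual start sequence from the operator console might be (the parenthetical notes are commentary, not part of the commands):

  S AZKMSTR    (Master; wait for it to finish initializing)
  S AZKHIST    (optional History server)
  S AZKSHUF    (optional standalone Shuffle service; omit when the worker starts it internally)
  S AZKWRKR    (Worker, started after the Master is up)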