Procedures for each Spark cluster
To create a procedure (master, worker, history server, shuffle service) for a Spark cluster, copy and edit the applicable sample procedure that is included in IBM® Open Data Analytics for z/OS®. Place the procedure in a data set in your PROCLIB concatenation, such as SYS1.PROCLIB.
There are four sample procedures. The high-level qualifier (hlq) depends on your installation. In this document, the default high-level qualifier is AZK.
- hlq.SAZKSAMP(AZKMSTR) - Master
- hlq.SAZKSAMP(AZKWRKR) - Worker
- hlq.SAZKSAMP(AZKHIST) - History Server
- hlq.SAZKSAMP(AZKSHUF) - Shuffle service
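As an illustrative sketch, a sample procedure could be copied into a PROCLIB data set from z/OS UNIX System Services. The data-set names below are assumptions; substitute your installation's high-level qualifier and PROCLIB data set.

```
# Illustrative only: copy the sample Master procedure into SYS1.PROCLIB
# for editing, using the z/OS UNIX cp command's //'dataset' notation.
cp "//'AZK.SAZKSAMP(AZKMSTR)'" "//'SYS1.PROCLIB(AZKMSTR)'"
```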
Follow the instructions in the sample procedure. For example, SPARK_CONF_DIR must be set and exported in the procedure; it does not default to $SPARK_HOME/conf.
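For illustration, the environment setup in the procedure might include lines like the following. The paths are assumptions; use the Spark installation and configuration directories for your site.

```shell
# Illustrative only: export SPARK_CONF_DIR explicitly, because the started
# task does not default it to $SPARK_HOME/conf.
export SPARK_HOME=/usr/lpp/IBM/izoda/spark/sparkdir   # assumed install path
export SPARK_CONF_DIR=/u/sparkid/conf                 # assumed config path
echo "Using Spark configuration in $SPARK_CONF_DIR"
```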
The Spark Shuffle service can be started in either of two ways:
- Externally, by invoking sbin/start-shuffle-service.sh or by starting the new started task.
- Internally, by the Worker, when the spark.shuffle.service.enabled=true configuration is set. By default, the Spark Shuffle service starts and runs under the Spark Worker when spark.shuffle.service.enabled=true. A running Shuffle service is a prerequisite for dynamic allocation, which is enabled with the spark.dynamicAllocation.enabled=true configuration.
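For example, a spark-defaults.conf that runs the Shuffle service under the Worker and enables dynamic allocation might contain the following properties (only the relevant settings are shown):

```
# Run the Shuffle service (started under the Worker by default)
spark.shuffle.service.enabled    true
# Dynamic allocation requires the Shuffle service to be enabled
spark.dynamicAllocation.enabled  true
```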
When starting the Spark started tasks manually, ensure that the Master (and the optional History server and Shuffle service) have initialized before starting the Worker. When starting through automation, the tasks can be started in parallel, because the Worker waits a limited amount of time for the Master to initialize. Spark users might encounter errors if Spark jobs are submitted before all started tasks have initialized.
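For a manual start from the operator console, the ordering described above might look like the following sketch. The procedure names assume the unmodified sample member names; use the names you chose when copying the procedures.

```
S AZKMSTR    Start the Spark Master first
S AZKHIST    Optional: start the History server
S AZKSHUF    Optional: start the external Shuffle service
S AZKWRKR    Start the Worker after the Master has initialized
```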