Configuring z/OS Spark for WML for z/OS

WML for z/OS leverages the high-performance, general execution engine technology of z/OS Spark, a core component of IBM® Open Data Analytics for z/OS®. Built on Apache Spark, z/OS Spark is capable of large-scale data processing and in-memory computing. You must install and configure z/OS Spark for WML for z/OS.

Before you begin

  • Verify that <mlz_setup_userid> is created and configured and the environment is properly set up for the user on your z/OS system as described in Configuring user ID and z/OS environment for WML for z/OS. Make sure that $IML_HOME/spark is mounted to a zFS file system with at least 4 GB storage available.
  • Verify that z/OS Spark and required service updates are installed on your z/OS system as described in Installing prerequisite hardware and software.

    WML for z/OS works with z/OS Spark 2.2.0 and 2.3.0. Choose the version of Spark to use and point $SPARK_HOME to its installation directory.

  • Verify that z/OS Spark is configured to be available to the UNIX shell environment.
    1. Sign into a UNIX shell session as <mlz_setup_userid> and change to the $IML_HOME directory.
    2. Execute $SPARK_HOME/bin/spark-shell to launch the Spark shell session.
    3. At the Scala prompt, enter a few simple commands, such as sc.parallelize(1 to 1000).count() and :help, and then exit with :quit. You can also run $SPARK_HOME/bin/run-example SparkPi from the UNIX shell to further test the availability of z/OS Spark, as shown in the sketch after this list.
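
For reference, the whole verification pass might look like the following minimal sketch. The $SPARK_HOME path shown is a hypothetical example; substitute the installation path of the Spark version you chose.

    # Point $SPARK_HOME at the chosen Spark level (path shown is a hypothetical example)
    export SPARK_HOME=/usr/lpp/IBM/izoda/spark/spark23x
    cd $IML_HOME
    $SPARK_HOME/bin/spark-shell            # launch the interactive Spark shell
    # At the scala> prompt:
    #   sc.parallelize(1 to 1000).count()  // expect res0: Long = 1000
    #   :help                              // list available shell commands
    #   :quit                              // exit back to the UNIX shell
    $SPARK_HOME/bin/run-example SparkPi    # run a bundled example as a further check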

The procedure below requires that you run the create-spark-runtime.sh script. The script is included in the SMP/E image for WML for z/OS. You can complete the procedure to configure and start Spark before or during the installation of WML for z/OS. Follow the instructions in Installing and configuring WML for z/OS base on z/OS to run the SMP/E program and extract the create-spark-runtime.sh script.

Procedure

  1. Locate the create-spark-runtime.sh script in the <install_dir_zos>/imlpython/bin directory.
  2. Run the create-spark-runtime.sh script as shown below:
    ./create-spark-runtime.sh

    The script gathers the needed information and then performs a sequence of tasks to create and configure a Spark runtime environment for WML for z/OS. If a fatal error occurs, the script stops; fix the error and rerun the script.
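
    For reference, steps 1 and 2 together look like the following minimal sketch, where <install_dir_zos> is your WML for z/OS installation directory:

      cd <install_dir_zos>/imlpython/bin
      ./create-spark-runtime.sh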

  3. When prompted, respond by either making selections or entering requested information:
    1. Before configuring z/OS Spark, the script checks whether the Spark installation is at the required PTF (build) level: 2.2.0.12 or 2.3.3. If the required level is not met, the script stops and the configuration process fails. In this case, apply the required Spark PTF and rerun the script.
    2. Confirm or decline to enable Spark client authentication.
      Important: z/OS Spark includes a client authentication option for securing all connections to the Spark master and REST ports. You can enable the option (the default) or disable it. If you keep the default setting and enable client authentication, you must also enable security for the Jupyter kernel gateway and update the setup of certain system resources. See Customizing your environment for z/OS Spark, Installing and configuring WML for z/OS base on z/OS, and Configuring client authentication for z/OS Spark for details.
    3. Enter the IP address for your Spark master.
    4. Enter the port number for your Spark master (or enter Y to accept the default port 7077).
    5. Enter the port number for your Spark master REST API (or enter Y to accept the default port 6066).
    6. Enter the port number for your Spark web UI (or enter Y to accept the default port 8080).

    The IP address and port numbers that you entered or accepted are saved in the $IML_HOME/spark/conf/spark-defaults.conf and $IML_HOME/spark/conf/spark-env.sh files.
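
    The exact property and variable names are determined by the script; assuming it uses the standard Apache Spark names, the saved entries might look like the following hypothetical excerpts:

      # Hypothetical excerpt from $IML_HOME/spark/conf/spark-env.sh
      SPARK_MASTER_HOST=10.1.1.1        # Spark master IP address you entered (example value)
      SPARK_MASTER_PORT=7077            # Spark master port (default shown)
      SPARK_MASTER_WEBUI_PORT=8080      # Spark web UI port (default shown)

      # Hypothetical excerpt from $IML_HOME/spark/conf/spark-defaults.conf
      spark.master.rest.port    6066    # Spark master REST API port (default shown)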

    After gathering all required information, the script starts the Spark master and worker processes. If the processes are successfully started, you will see a message similar to the following example:

    Starting Spark master...
    Spark master started successfully
    Starting Spark worker...
    Spark worker started successfully
    Congratulations! You have successfully configured and started Spark.
    Check the parameters used for Spark under $IML_HOME/spark/conf

    If any error occurs, you will be directed to the Spark master and worker configuration logs. Review the logs, fix the error, and start the processes manually.

    1. Issue the following command to start the Spark master process:
      start_master
    2. Issue the following command to start the Spark worker process:
      start_slave

    If needed, add the start_master and start_slave commands to the $IML_HOME/.profile file for the <mlz_setup_userid>.

    By default, the Spark master and worker processes use SPARKM1A and SPARKW1A as their respective job names. You can change these job names by editing the $IML_HOME/.profile file.
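
    A minimal sketch of such a .profile fragment follows. It assumes that the job names are assigned through the standard z/OS UNIX _BPX_JOBNAME environment variable, which sets the job name of processes spawned from the shell:

      # Hypothetical fragment for $IML_HOME/.profile
      export _BPX_JOBNAME='SPARKM1A'    # job name for the Spark master (assumption)
      start_master
      export _BPX_JOBNAME='SPARKW1A'    # job name for the Spark worker (assumption)
      start_slave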

    If you've enabled Spark client authentication, the script will not start the Spark master and worker processes for you. You will see a message similar to the following example:

    Congratulations! You have successfully configured Spark. 
    Customize your system to use AT-TLS for client authentication. 
    Follow instructions at https://www.ibm.com/support/knowledgecenter/
    SS3H8V_1.1.0/com.ibm.izoda.v1r1.azka100/topics/azkic_c_configclientauth.htm. 
    Run start-master.sh to start Spark master and spark-slave.sh to start Spark 
    worker: spark-slave.sh spark://$MLZ_SPARK_HOST:$MLZ_SPARK_PORT.

    You must customize your system to use AT-TLS for client authentication by following the instructions in both Customizing your environment for z/OS Spark and Configuring client authentication for z/OS Spark. After you complete your system setup, start the Spark master and worker manually, as shown in the sketch below.
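
    A minimal sketch of the manual start, using the commands named in the script output, where $MLZ_SPARK_HOST and $MLZ_SPARK_PORT are the Spark master IP address and port you entered in step 3:

      start-master.sh                                           # start the Spark master
      spark-slave.sh spark://$MLZ_SPARK_HOST:$MLZ_SPARK_PORT    # start the Spark worker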

  4. Verify that z/OS Spark is successfully configured and started on your system.

    Issue the following command to retrieve the name of the Spark example jar file:

    ls -ls $SPARK_HOME/examples/jars | grep examples

    Issue the following command to verify the availability of Spark master and Spark worker:

    $SPARK_HOME/bin/spark-submit --class org.apache.spark.examples.SparkPi \
      --master spark://<host_IP_address>:<sparkMaster-port> \
      $SPARK_HOME/examples/jars/<spark-examples_xxx.jar>

    where xxx is the version of the Spark examples jar file. Spark is properly configured and functioning normally if you see a response similar to the following example:

    Pi is roughly 3.13742.
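
    For example, with hypothetical values of a master at 10.1.1.1:7077 and the Spark 2.2.0 examples jar named spark-examples_2.11-2.2.0.jar, the command would be:

      # Hypothetical invocation; substitute the jar name returned by the ls command above
      $SPARK_HOME/bin/spark-submit --class org.apache.spark.examples.SparkPi \
        --master spark://10.1.1.1:7077 \
        $SPARK_HOME/examples/jars/spark-examples_2.11-2.2.0.jar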

What to do next

The create-spark-runtime.sh script configures your z/OS Spark with preset (default) attributes as a starting point. You can review these attributes and settings, along with the IP address and port numbers you entered in Step 3, in the $IML_HOME/spark/conf/spark-defaults.conf and $IML_HOME/spark/conf/spark-env.sh files.

Adjust your Spark configuration and optimize its performance based on your actual workload over time. See Configuring Workload Manager for z/OS Spark and Memory and CPU configuration options for instructions.
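
For example, you might raise the driver and executor resource settings in $IML_HOME/spark/conf/spark-defaults.conf as your workload grows. The entries below use standard Apache Spark property names; the values are illustrative, not recommendations:

    # Hypothetical tuning entries in $IML_HOME/spark/conf/spark-defaults.conf
    spark.driver.memory     2g    # memory for the driver process
    spark.executor.memory   4g    # memory per executor
    spark.executor.cores    2     # cores per executor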