Updating the Apache Spark configuration files

Complete this task to copy the Apache Spark configuration files to your new configuration directory and update them.

About this task

There are three main Apache Spark configuration files:

spark-env.sh
    A shell script that is sourced by most of the other scripts in the Apache Spark installation. You can use it to configure environment variables that set or alter the default values for various Apache Spark configuration settings. For sample contents of this file, see Sample configuration and AT-TLS policy rules for z/OS Spark client authentication.
spark-defaults.conf
    A configuration file that sets default values for the Apache Spark runtime components. You can override these default values on the command line when you interact with Spark using shell scripts. For sample contents of this file, see Sample configuration and AT-TLS policy rules for z/OS Spark client authentication.
log4j.properties
    Contains the default configuration for log4j, the logging package that Apache Spark uses.
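
For example, spark-env.sh entries are ordinary shell export statements, while spark-defaults.conf entries are property-value pairs that you can override with the --conf command-line option. The following lines are a minimal sketch; the paths and values shown are illustrative assumptions, not required settings:

# In spark-env.sh (hypothetical values):
export JAVA_HOME=/usr/lpp/java/J8.0_64
export SPARK_LOCAL_IP=127.0.0.1

# In spark-defaults.conf (hypothetical value):
spark.executor.memory 2g

# Overriding the default on the command line:
$SPARK_HOME/bin/spark-shell --conf spark.executor.memory=4g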

You can find templates of these configuration files, as well as the default spark-defaults.conf and spark-env.sh files, in the $SPARK_HOME/conf directory. Note that the spark-defaults.conf and log4j.properties files are ASCII files. If you have set _BPXK_AUTOCVT=ON, as specified in Setting up a user ID for use with z/OS Spark, you can edit them without any explicit conversion.
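
For example, to enable automatic conversion in your current shell session before you edit these files:

export _BPXK_AUTOCVT=ON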

The spark-shell and spark-sql interactive command line interfaces ($SPARK_HOME/bin/spark-shell and $SPARK_HOME/bin/spark-sql) have built-in support for the Apache Hive metastore service, which contains an embedded instance of the Apache Derby database. By default, these interfaces automatically create metastore_db and spark-warehouse directories and the derby.log file in the directory from which they are invoked. Therefore, you must either invoke spark-shell or spark-sql from a writable directory or set up your configuration files to point to writable directories. If you have multiple users running these interfaces, ensure that they use different writable directories so that one user does not attempt to use another user's database.
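
For example, each user can start the interface from a personal writable directory (the path shown is hypothetical):

cd /u/user1/sparkwork
$SPARK_HOME/bin/spark-shell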

To set the location of the metastore_db directory, configure the javax.jdo.option.ConnectionURL property in the hive-site.xml file. You can find a sample hive-site.xml file in $SPARK_HOME/conf. For more information about Hive metastore configuration, see Hive Metastore Administration.
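
For example, the following hive-site.xml entry places the metastore database under a per-user directory; the databaseName path is an assumption for illustration:

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/u/user1/metastore_db;create=true</value>
  </property>
</configuration>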

To set the location of the spark-warehouse directory, which is the default location of the databases in the warehouse, configure the spark.sql.warehouse.dir property in the spark-defaults.conf file or use the --conf spark.sql.warehouse.dir command-line option.
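
For example, in the spark-defaults.conf file (the path shown is hypothetical):

spark.sql.warehouse.dir /u/user1/spark-warehouse

Or, equivalently, on the command line:

$SPARK_HOME/bin/spark-sql --conf spark.sql.warehouse.dir=/u/user1/spark-warehouse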

To set the location of the derby.log file, configure the following property in the spark-defaults.conf file or as a command-line option to point to the desired Derby log file location:
spark.driver.extraJavaOptions -Dderby.stream.error.file=derby_log_file_location
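For example, to write the Derby log to a hypothetical per-user location:
spark.driver.extraJavaOptions -Dderby.stream.error.file=/u/user1/logs/derby.log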
If you do not need separate directories for the metastore_db directory and the derby.log file, you can configure the Derby system directory by specifying the following property in the spark-defaults.conf file:
spark.driver.extraJavaOptions -Dderby.system.home=derby_sys_dir
By default, both the metastore_db directory and the derby.log file will be created in this Derby system directory.
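
For example (the directory shown is hypothetical):
spark.driver.extraJavaOptions -Dderby.system.home=/u/user1/derby
Because spark.driver.extraJavaOptions is a single property that holds one string of JVM options, specify all of the -D options that you need on one line, separated by spaces.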

For more information about Apache Derby configuration, see Tuning Derby.

For more information about Apache Spark support for Apache Hive, see http://spark.apache.org/docs/2.4.8/sql-programming-guide.html.

Procedure

  1. Copy the template or default configuration files into your new configuration directory.
    For example:
    cp $SPARK_HOME/conf/spark-env.sh.template $SPARK_CONF_DIR/spark-env.sh
    cp $SPARK_HOME/conf/spark-defaults.conf.template $SPARK_CONF_DIR/spark-defaults.conf
    cp $SPARK_HOME/conf/log4j.properties.template $SPARK_CONF_DIR/log4j.properties
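    On z/OS, you can verify that the files were copied, and check their file tags, by using the ls -T command. For example:
    ls -T $SPARK_CONF_DIR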
    
  2. Update the configuration files as necessary while you complete the rest of the customization procedures for Open Data Analytics for z/OS.

    If necessary, remember to complete step 2.b now. Also, set and export the SPARK_CONF_DIR environment variable as described in step 3 of Creating the Apache Spark configuration directory.
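
    For example, in your shell profile (the path shown is hypothetical):
    export SPARK_CONF_DIR=/u/user1/conf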

What to do next

Continue with Creating the Apache Spark working directories.