Updating the Apache Spark configuration files
Complete this task to copy the Apache Spark configuration files to your new configuration directory and update them.
About this task
Apache Spark uses the following configuration files:
- spark-env.sh
- A shell script that is sourced by most of the other scripts in the Apache Spark installation. You can use it to configure environment variables that set or alter the default values for various Apache Spark configuration settings. For sample contents of this file, see Sample configuration and AT-TLS policy rules for z/OS Spark client authentication.
- spark-defaults.conf
- A configuration file that sets default values for the Apache Spark runtime components. You can override these default values on the command line when you interact with Spark using shell scripts. For sample contents of this file, see Sample configuration and AT-TLS policy rules for z/OS Spark client authentication.
- log4j.properties
- Contains the default configuration for log4j, the logging package that Apache Spark uses.
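As a hypothetical illustration (the property names are standard log4j 1.x syntax, which Apache Spark 2.x uses, but the exact values are examples only), a minimal log4j.properties that limits console output to warnings might look like this:

```
# Hypothetical log4j.properties fragment; adjust levels for your site
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```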
You can find templates of these configuration files and the default spark-defaults.conf and spark-env.sh files in the $SPARK_HOME/conf directory. Note that the spark-defaults.conf and log4j.properties files are ASCII files. If you have set _BPXK_AUTOCVT=ON as specified in Setting up a user ID for use with z/OS Spark, you can edit them without any explicit conversion.
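The copy step can be sketched as follows. This is a minimal sketch, not the definitive procedure: the SPARK_HOME default and the SPARK_CONF_DIR location are assumptions, so substitute the paths for your installation. The loop prefers an existing configuration file and falls back to the shipped .template copy.

```shell
# Assumed paths; adjust SPARK_HOME and SPARK_CONF_DIR for your site
SPARK_HOME="${SPARK_HOME:-/usr/lpp/IBM/izoda/spark/spark24x}"
SPARK_CONF_DIR="${SPARK_CONF_DIR:-${HOME:-/tmp}/spark/conf}"
mkdir -p "$SPARK_CONF_DIR"

for f in spark-env.sh spark-defaults.conf log4j.properties; do
  # Prefer a shipped default file; otherwise copy the .template under the real name
  if [ -f "$SPARK_HOME/conf/$f" ]; then
    cp "$SPARK_HOME/conf/$f" "$SPARK_CONF_DIR/"
  elif [ -f "$SPARK_HOME/conf/$f.template" ]; then
    cp "$SPARK_HOME/conf/$f.template" "$SPARK_CONF_DIR/$f"
  fi
done
```

After copying, edit the copies in $SPARK_CONF_DIR rather than the originals, and export SPARK_CONF_DIR so that Spark picks up the new directory.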
The spark-shell and spark-sql interactive command-line interfaces ($SPARK_HOME/bin/spark-shell and $SPARK_HOME/bin/spark-sql) have built-in support for the Apache Hive metastore service, which contains an embedded instance of the Apache Derby database. By default, these interfaces automatically create the metastore_db and spark-warehouse directories and the derby.log file in the directory from which they are invoked. Therefore, you must either invoke spark-shell or spark-sql from a writable directory or set up your configuration files to point to writable directories. If multiple users run these interfaces, ensure that they use different writable directories so that one user does not attempt to use another user's database.
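One way to satisfy the writable-directory requirement is to give each user a private working directory and start the interface from there. The directory name below is a hypothetical convention, not part of the product:

```shell
# Sketch: a per-user working directory for spark-shell or spark-sql
# (the spark-work-<user> naming is an assumption, not a Spark convention)
SPARK_WORKDIR="${TMPDIR:-/tmp}/spark-work-$(id -un)"
mkdir -p "$SPARK_WORKDIR"
cd "$SPARK_WORKDIR"
# spark-shell or spark-sql started from here creates metastore_db,
# spark-warehouse, and derby.log in this per-user directory
pwd
```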
To set the location of the metastore_db directory, configure the javax.jdo.option.ConnectionURL property in the hive-site.xml file. You can find a sample hive-site.xml file in $SPARK_HOME/conf. For more information about Hive metastore configuration, see Hive Metastore Administration.
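For example, a hive-site.xml fragment that places metastore_db under a user's home directory might look like the following sketch; the path is illustrative, and the jdbc:derby URL form is the standard embedded-Derby connection URL:

```xml
<!-- Hypothetical hive-site.xml fragment; adjust the path for your system -->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/u/sparkusr/metastore_db;create=true</value>
  </property>
</configuration>
```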
To set the location of the spark-warehouse directory, configure the spark.sql.warehouse.dir property in the spark-defaults.conf file, or use the --conf spark.sql.warehouse.dir command-line option to specify the default warehouse location when you start spark-shell or spark-sql.
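For example, a spark-defaults.conf entry of this form sets the warehouse location (the path shown is illustrative only):

```
# Hypothetical spark-defaults.conf entry; choose a writable path
spark.sql.warehouse.dir /u/sparkusr/spark-warehouse
```

The equivalent command-line form is --conf spark.sql.warehouse.dir=/u/sparkusr/spark-warehouse on the spark-shell or spark-sql invocation.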
To set the location of the derby.log file or the Derby system directory, pass the corresponding Derby property through the spark.driver.extraJavaOptions property in the spark-defaults.conf file:
spark.driver.extraJavaOptions -Dderby.stream.error.file=derby_log_file_location
spark.driver.extraJavaOptions -Dderby.system.home=derby_sys_dir
By default, both the metastore_db directory and the derby.log file are created in this Derby system directory. For more information about Apache Derby configuration, see Tuning Derby.
For more information about Apache Spark support for Apache Hive, see http://spark.apache.org/docs/2.4.8/sql-programming-guide.html.