Set up an external client for an instance group to submit Spark workload to
the instance group from outside of the cluster.
About this task
You can set up an external client for an instance group from the cluster management console.
When SSL is enabled for Spark UIs, you must complete one of the following configurations for your
local-mode applications:
- Create a tier 3 keystore on the external client.
- Disable SSL for the Spark driver UI by setting spark.ssl.ui.enabled=false.
- Disable the Spark driver UI by setting spark.ui.enabled=false.
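For example, a minimal sketch of the second and third options, assuming you set the properties at submission time (the same properties can equally go in the client's spark-defaults.conf file):

   # Disable SSL for the Spark driver UI only
   spark-submit --conf spark.ssl.ui.enabled=false ...

   # Or disable the Spark driver UI entirely
   spark-submit --conf spark.ui.enabled=false ...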
Procedure
1. From the cluster management console, click .
2. Click the instance group for which you want to set up an external client.
3. Click .
4. Optional: Click Download Spark Package to download and extract the Spark version package on the client. You can skip this step if the client already has the package (for example, if you previously set up the external client for this instance group and are updating the configuration).
5. Create and download the configuration for the instance group:
   a. Select your operating system from the drop-down menu.
   b. Optional: Update the non-default Spark configuration paths on the server to a local path. If you are using default configuration settings, you can update the configuration paths in the spark-env.sh, spark-defaults.conf, and spark-ego-docker.conf files, and in the data connector directory if data connectors are configured for the instance group.
      Note: You require a CA keystore file for instance groups and/or ascd with SSL enabled, and a tier 3 keystore file for your client and local-mode applications on an external client. You can obtain keystore files from your cluster administrator. Store the CA keystore file in the location that is specified in the spark.ego.ssl.rpc.client.keyStore parameter, and the tier 3 keystore file in the location that is specified in the tier3_keystore parameter.
   c. Specify the spark.local.dir directory to use for scratch space in Spark, including map output files and RDDs that are stored on disk. (Placeholder examples for this and the following settings appear in the sketch after these substeps.)
   d. Specify the HADOOP_CONF_DIR directory that stores the Hadoop configuration files.
   e. Specify the JAVA_HOME directory path to your Java installation.
   f. Specify the client_home directory path on the external client where the Spark client files are extracted.
      Note: If you do not specify client_home, the deploy_home directory is used by default. If deploy_home is used, you must either extract the downloaded packages to the same local path that is used for deploy_home, or open the downloaded spark-env.sh and spark-defaults.conf files and manually change the paths to match the client location.
   g. Specify the spark.ego.ssl.rpc.client.keyStore property. To submit an application to an SSL-enabled instance group, you need the CA certificate keystore file that is referenced in the spark-defaults.conf file, which is located under the instance group that you want to submit to.
      1) Copy the CA certificate keystore file from the path that is set in the spark.ego.ssl.rpc.client.keyStore property in spark-defaults.conf.
      2) Add the new path into the spark.ego.ssl.rpc.client.keyStore property field. The store password is automatically populated.
   h. Specify, in the tier3_keystore parameter, the fully qualified path on the external client to the tier 3 keystore file.
   i. Click Create Configuration With Updated Paths.
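The following sketch shows what the updated client-side files from substeps c through g might contain; every path and value here is a placeholder, not a value from your cluster:

   # spark-env.sh (client copy): placeholder paths
   export JAVA_HOME=/usr/lib/jvm/java
   export HADOOP_CONF_DIR=/opt/sparkclient/hadoop/conf

   # spark-defaults.conf (client copy): placeholder paths
   spark.local.dir                     /tmp/spark-scratch
   spark.ego.ssl.rpc.client.keyStore   /opt/sparkclient/security/cacert.keystore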
6. Extract the configuration to the same location as the extracted Spark version package.
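For example, assuming that both downloads are gzipped tar files and that client_home is /opt/sparkclient (placeholder values):

   # Extract the Spark version package and the generated configuration to the same location
   tar -xzf <spark-version-package>.tgz -C /opt/sparkclient
   tar -xzf <instance-group-configuration>.tgz -C /opt/sparkclient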
7. Install any additional packages that are used by your applications. You can skip this step if the client already has the necessary packages, or if your applications do not require any additional packages.
8. Configure driver-side logging for client-mode applications:
   a. Copy the log4j.properties.template file from the <spark-home>/conf directory in your client to the SPARK_EGO_JARS location in the client:
      - For Spark 1.x, the directory is <spark-home>/lib/ego, where <spark-home> refers to the Spark directory within the client_home directory that you specified earlier.
      - For Spark 2.x and Spark 3.x, the directory is <spark-home>/jars/ego, where <spark-home> refers to the Spark directory within the client_home directory that you specified earlier.
   b. Rename the log4j.properties.template file to log4j.properties.
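For a Spark 2.x or 3.x client, for example, substeps a and b amount to a single copy under a new name, where <spark-home> stands in for your actual Spark directory:

   # Copy the logging template into the SPARK_EGO_JARS location and rename it
   cp <spark-home>/conf/log4j.properties.template <spark-home>/jars/ego/log4j.properties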
9. If Kerberos is installed on the instance group hosts, complete the following steps:
   a. Install Kerberos on the external client host.
   b. Configure the krb5.conf file so that it matches the krb5.conf file on the primary host.
   c. Ensure that the kinit command is located in the same directory that is specified by KINITDIR in $EGO_TOP/kernel/conf/sec_ego_gsskrb.conf on the primary host.
   d. Copy the keytab that is used for the HDFS cluster access principal into the same directory that is specified for the primary host.
   e. Copy the keytab that is used for cluster authentication from the primary host so that the submission user has access to it. Specify the path to the keytab in the spark-submit command by using --conf spark.ego.keytab.
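As a rough sketch of the client-side checks and the final submission for this step (all paths here are placeholders):

   # krb5.conf on the client must match the primary host's copy
   diff /etc/krb5.conf /tmp/krb5.conf.from-primary-host

   # kinit must resolve to the directory named by KINITDIR on the primary host
   which kinit

   # Point spark-submit at the cluster authentication keytab copied from the primary host
   spark-submit --conf spark.ego.keytab=/path/to/cluster-auth.keytab ...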
Results
The client setup is complete, and the client can now submit workload to the instance group.
Note: If the instance group is reconfigured or updated, you must repeat steps 5 and 6 to pick up the updated configuration.
What to do next
Submit workload to the instance group. To submit a sample application, run the following command from the client's command line:
spark-submit --master <hostname:port> --class org.apache.spark.examples.SparkPi --conf spark.ego.uname=<user name> --conf spark.ego.passwd=<password> <spark-examples jar file> 1000
Note: If your instance group uses Dockerized executors, you must use cluster mode when you submit applications from an external client.
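For example, to run the sample SparkPi application above in cluster mode, you would add the standard --deploy-mode flag to the same command:

   spark-submit --master <hostname:port> --deploy-mode cluster \
     --class org.apache.spark.examples.SparkPi \
     --conf spark.ego.uname=<user name> --conf spark.ego.passwd=<password> \
     <spark-examples jar file> 1000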