Setting up an external client

Set up an external client so that you can submit Spark workload to an instance group from outside of the cluster.

About this task

You can set up an external client for an instance group from the cluster management console.

When SSL is enabled for Spark UIs, you must complete one of the following configurations for your local-mode applications (an example follows the list):
  • Create a tier 3 keystore on an external client
  • Disable SSL for the Spark driver UI by setting spark.ssl.ui.enabled=false
  • Disable the Spark driver UI by setting spark.ui.enabled=false
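For example, to disable SSL for the Spark driver UI, you might add the following line to the spark-defaults.conf file on the client, or pass the property with --conf on the spark-submit command line. This is a minimal sketch; only the property names come from the list above:
  # In spark-defaults.conf on the external client:
  spark.ssl.ui.enabled    false
  # Or, to disable the driver UI entirely, pass the property at submission time:
  spark-submit --conf spark.ui.enabled=false ...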

Procedure

  1. From the cluster management console, click Workload > Instance Groups.
  2. Click the instance group for which you want to set up an external client.
  3. Click Manage > Set up an external client.
  4. Optional: Click Download Spark Package to download and extract the Spark version package on the client. You can skip this step if the client already has the package (for example, if you have previously set up the external client for this instance group and are updating the configuration).
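    For example, assuming that the downloaded package is named spark-<version>.tgz and that /opt/spark-client is the directory where you keep the client files (both names are hypothetical), you might extract the package as follows:
      mkdir -p /opt/spark-client
      tar -xzf spark-<version>.tgz -C /opt/spark-client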
  5. Create and download the configuration for the instance group:
    1. Select your operating system from the drop-down menu.
    2. Optional: Update the non-default Spark configuration paths on the server to local paths. If you are not using the default configuration settings, update the configuration paths in the spark-env.sh, spark-defaults.conf, and spark-ego-docker.conf files, and in the data connector directory if data connectors are configured for the instance group.
      Note: A CA keystore file is required for instance groups or ascd with SSL enabled, and a tier 3 keystore file is required for your client and local-mode applications on an external client. You can obtain both keystore files from your cluster administrator. Store the CA keystore file in the location that is specified in the spark.ego.ssl.rpc.client.keyStore parameter, and the tier 3 keystore file in the location that is specified in the tier3_keystore parameter.
      • Specify the spark.local.dir directory to use for scratch space in Spark, including map output files and RDDs that get stored on your disk.
      • Specify the HADOOP_CONF_DIR directory that stores the Hadoop configuration files.
      • Specify the JAVA_HOME directory path to the Java executable.
      • Specify the client_home directory path on the external client where the Spark client files are extracted.
        Note: If you do not specify client_home, the deploy_home directory is used by default. In that case, you must either extract the downloaded packages to the same local path that is used for deploy_home, or open the downloaded spark-env.sh and spark-defaults.conf files and manually change the paths to match the client location.
      • Specify the spark.ego.ssl.rpc.client.keyStore property. To submit an application to an SSL-enabled instance group, you need the CA certificate keystore file; its location is specified in the spark-defaults.conf file of the instance group that you want to submit to.
        1. Copy the CA certificate keystore file from the path that is specified in the spark.ego.ssl.rpc.client.keyStore property in spark-defaults.conf to a location on the client.
        2. Enter the new path in the spark.ego.ssl.rpc.client.keyStore property field. The store password is automatically populated.
      • Specify, in the tier3_keystore property, the fully qualified path on the external client to the tier 3 keystore file (see the sketch after this step).
    3. Click Create Configuration With Updated Paths.
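    For an SSL-enabled instance group, the keystore-related values that you enter in this step might look like the following sketch. The paths are hypothetical placeholders; obtain the actual keystore files and locations from your cluster administrator:
      # Entry in the downloaded spark-defaults.conf on the external client:
      spark.ego.ssl.rpc.client.keyStore    /opt/spark-client/security/cacert.keystore
      # Fully qualified tier 3 keystore path entered in the tier3_keystore field:
      # /opt/spark-client/security/tier3.keystore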
  6. Extract the configuration to the same location as the extracted Spark version package.
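    Continuing the hypothetical layout from step 4, and assuming that the configuration downloads as a .tgz archive, you would extract it into the same directory:
      tar -xzf <configuration package>.tgz -C /opt/spark-client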
  7. Install any additional packages that are used by your applications. You can skip this step if the client already has the necessary packages, or if your applications do not require any additional packages.
  8. Configure driver-side logging for client-mode applications:
    1. Copy the log4j.properties.template file from the <spark-home>/conf directory on your client to the SPARK_EGO_JARS location on the client:
      • For Spark 1.x, the directory is <spark-home>/lib/ego.
      • For Spark 2.x and Spark 3.x, the directory is <spark-home>/jars/ego.
      In both cases, <spark-home> refers to the Spark directory within the client_home directory that you specified earlier.
    2. Rename the log4j.properties.template file to log4j.properties.
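    For example, for a Spark 3.x client where <spark-home> is /opt/spark-client/spark (a hypothetical path continuing the earlier sketch), the copy and rename steps amount to:
      cp /opt/spark-client/spark/conf/log4j.properties.template /opt/spark-client/spark/jars/ego/
      mv /opt/spark-client/spark/jars/ego/log4j.properties.template /opt/spark-client/spark/jars/ego/log4j.properties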
  9. If Kerberos is installed on the instance group hosts, you must complete the following steps:
    1. Install Kerberos on the external client host.
    2. Configure the krb5.conf file so that it matches the krb5.conf that is on the primary host.
    3. Ensure that the kinit command is located in the same directory that is specified by KINITDIR in $EGO_TOP/kernel/conf/sec_ego_gsskrb.conf on the primary host.
    4. Copy the keytab that is used for the HDFS cluster access principal into the same directory that is used on the primary host.
    5. Copy the keytab that is used for the cluster authentication from the primary host so that the submission user has access to it. The path to the keytab is specified in the spark-submit command using --conf spark.ego.keytab.
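    For example, a submission against a Kerberized instance group might pass the keytab as follows (a sketch; the keytab path and master URL are hypothetical placeholders):
      spark-submit --master spark://<hostname>:<port> --conf spark.ego.keytab=/home/submitter/cluster.keytab --class org.apache.spark.examples.SparkPi <spark-examples jar file> 1000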

Results

The client setup is complete, and the client can now submit workload to the instance group.
Note: If the instance group is either reconfigured or updated, you must repeat steps 5 and 6 to pick up the updated configuration.

What to do next

Submit workload to the instance group. To submit a sample application, you can run the following command from the client's command line:
spark-submit --master spark://<hostname>:<port> --class org.apache.spark.examples.SparkPi --conf spark.ego.uname=<user name> --conf spark.ego.passwd=<password> <spark-examples jar file> 1000
Note: If your instance group uses Dockerized executors, you must use cluster mode when you submit applications from an external client.
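For example, such a cluster-mode submission is the same sample command with the deploy mode set explicitly:
spark-submit --master spark://<hostname>:<port> --deploy-mode cluster --class org.apache.spark.examples.SparkPi --conf spark.ego.uname=<user name> --conf spark.ego.passwd=<password> <spark-examples jar file> 1000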