IBM Support

Setting up an external Spark client

Technical Blog Post



Author: Jenna Lau Caruso

In IBM Spectrum Conductor with Spark 2.2.1, users can download the files that are required to configure an external Spark client, so that they can submit Spark applications from a host outside their IBM Spectrum Conductor with Spark cluster. A Spark client lets users submit Spark applications to a Spark instance group from the command line by using the spark-submit command. Within the cluster, the deploy directory of a Spark instance group acts as a client on each host where Spark is deployed. An external client is useful, for example, to let developers test applications directly from their development environment. The external Spark client is supported on Linux 64-bit, Linux on POWER 64-bit LE, and Windows.

A Spark client has two major components:

  1. The Spark binary package that contains the Spark binaries for the Spark version that is used in your Spark instance group.
  2. The Spark instance group configuration package that contains all the configuration files that are specific to your instance group’s configuration.
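
As a rough illustration, once both packages are extracted on the client host, a script might point the standard Spark environment variables at them: `SPARK_HOME` at the binary package and `SPARK_CONF_DIR` at the configuration files. This is a minimal sketch that assumes the configuration package extracts to a directory of Spark configuration files such as `spark-defaults.conf`; the function names and directory paths are hypothetical, so substitute wherever you actually extracted each package.

```python
import os
from pathlib import Path


def spark_client_env(binary_dir, conf_dir, base_env=None):
    """Return environment variables for running spark-submit from an external client.

    Assumptions: binary_dir is where the Spark binary package was extracted, and
    conf_dir holds the files from the Spark instance group configuration package.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["SPARK_HOME"] = str(Path(binary_dir))    # Spark binaries matching the instance group's Spark version
    env["SPARK_CONF_DIR"] = str(Path(conf_dir))  # configuration specific to the instance group
    return env


def missing_client_files(binary_dir, conf_dir):
    """Return expected files that are absent; a basic sanity check, not an exhaustive one."""
    launcher = "spark-submit.cmd" if os.name == "nt" else "spark-submit"
    expected = [
        Path(binary_dir) / "bin" / launcher,
        Path(conf_dir) / "spark-defaults.conf",  # assumed to be part of the configuration package
    ]
    return [str(p) for p in expected if not p.exists()]
```

Checking for missing files before the first submission gives a clearer error than a failed spark-submit call.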

Depending on your Spark instance group’s configuration, you might also require some additional components:

  • If your Spark instance group is SSL enabled, you require the CA keystore file to connect to your cluster securely.
  • If there are additional libraries that are required by the driver for your Spark application, you need to install these on the external client host.
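
For these additional components, a client-side script might confirm that the CA keystore is present before submitting, and pass locally installed driver libraries with the standard `--driver-class-path` option of spark-submit. This matters mainly for client mode, where the driver runs on the external host. The function name and inputs are hypothetical, and the sketch deliberately does not guess which configuration property references the keystore, because that depends on how SSL is set up for your Spark instance group.

```python
import os
from pathlib import Path


def driver_extra_args(driver_libs, ssl_enabled, ca_keystore=None):
    """Validate optional client-side components and build extra spark-submit arguments.

    driver_libs: paths to libraries already installed on the external client host.
    ssl_enabled / ca_keystore: hypothetical inputs; the keystore is only checked here,
    because where it is referenced depends on the instance group's configuration.
    """
    if ssl_enabled and (ca_keystore is None or not Path(ca_keystore).is_file()):
        raise FileNotFoundError("SSL is enabled but the CA keystore file was not found")
    missing = [lib for lib in driver_libs if not Path(lib).exists()]
    if missing:
        raise FileNotFoundError(f"driver libraries not installed on this host: {missing}")
    if not driver_libs:
        return []
    # --driver-class-path is a standard spark-submit option; entries use the OS path separator.
    return ["--driver-class-path", os.pathsep.join(str(lib) for lib in driver_libs)]
```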

You might occasionally need to download new versions of your external client files. If your Spark instance group is updated to a new package version, refresh both the Spark binary package and the Spark instance group configuration package. If the configuration of your Spark instance group is modified, refresh the Spark instance group configuration package.
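
The refresh rule can be summarized in a few lines. In this sketch the labels `spark-binary-package` and `sig-config-package` are illustrative names only, not identifiers used by the product.

```python
def packages_to_refresh(package_version_changed, config_changed):
    """Return which external client downloads to refresh after the instance group changes.

    The labels "spark-binary-package" and "sig-config-package" are illustrative only.
    """
    refresh = set()
    if package_version_changed:
        # A new package version needs new Spark binaries and a matching configuration package.
        refresh.update({"spark-binary-package", "sig-config-package"})
    if config_changed:
        # A configuration-only change affects just the configuration package.
        refresh.add("sig-config-package")
    return refresh
```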

To set up an external Spark client in the cluster management console, select Manage > Set up an external client from the Spark instance group overview page, and then follow the on-screen instructions.

Alternatively, you can use RESTful APIs to download and configure the Spark client, making it possible to script the download and configuration process. For more information on the Spark client RESTful APIs, access the RESTful API documentation in your cluster:

http://<hostname>:8280/cloud/apis/explorer/#/SparkClient

https://<hostname>:8643/cloud/apis/explorer/#/SparkClient
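
When scripting the download, a small helper can build the explorer URL shown above (the HTTPS form presumably applies when the cluster's REST services use SSL) and stream a file to disk. The SparkClient endpoint paths, authentication headers, and response formats are intentionally not assumed here; look them up in the API explorer and pass the full endpoint URL and any headers to `download`. Both function names are hypothetical.

```python
import shutil
import urllib.request


def api_explorer_url(hostname, use_ssl):
    """Return the SparkClient section of the RESTful API explorer on a cluster host."""
    if use_ssl:
        return f"https://{hostname}:8643/cloud/apis/explorer/#/SparkClient"
    return f"http://{hostname}:8280/cloud/apis/explorer/#/SparkClient"


def download(url, dest, headers=None):
    """Stream the resource at url into the file dest and return dest.

    The endpoint URL and any authentication headers must come from the API
    documentation in your cluster; none are assumed by this helper.
    """
    request = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(request) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```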

Once you have configured your external Spark client, you are ready to run Spark applications. Applications can be run in either client or cluster mode. For applications submitted in cluster mode, the Spark driver runs on a host inside the cluster; for applications submitted in client mode, the Spark driver runs on the external client host.
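
A sketch of building the submission command for either mode follows. `--deploy-mode` is a standard spark-submit option. The assumption that the master URL comes from the downloaded configuration rather than an explicit `--master` flag is ours, so add `--master` if your setup requires it. The function names are hypothetical.

```python
import os
import subprocess


def spark_submit_cmd(spark_home, app, deploy_mode="cluster", app_args=()):
    """Build a spark-submit command line for the external client.

    deploy_mode="cluster": the driver runs on a host inside the cluster.
    deploy_mode="client":  the driver runs on this external client host.
    Assumption: the master URL is supplied by the downloaded configuration.
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unsupported deploy mode: {deploy_mode}")
    launcher = "spark-submit.cmd" if os.name == "nt" else "spark-submit"
    return [os.path.join(spark_home, "bin", launcher), "--deploy-mode", deploy_mode, app, *app_args]


def submit(cmd, env=None):
    """Run spark-submit and return its exit code (env=None inherits the current environment)."""
    return subprocess.run(cmd, env=env).returncode
```

Client mode keeps the driver, and its console output, on the development host, which is convenient for testing; cluster mode moves the driver onto a cluster host.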

For more information see:

UID

ibm16163437