How To
Summary
Conductor can run Spark applications from the Conductor GUI in two major use cases: batch applications and interactive applications. However, there are also use cases that require submitting application code from the command line. This tech note describes how to submit a self-contained Java application from the command line and run it in a Conductor Spark cluster.
A self-contained application is application code that does not require the "spark-submit" command to launch. Instead, it is launched with the "java" command, passing the Spark parameters on the command line. A self-contained Java code example can be found at https://spark.apache.org/docs/latest/quick-start.html.
Here are the steps to submit such an application to a Conductor Spark cluster:
1 Export and install the Conductor external client from the Spark instance group (SIG) that provides the Spark cluster where you want your application to run.
2 Set the environment variables JAVA_HOME and SPARK_HOME.
For example:
[root@host]# env | grep JAVA_HOME
JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.272.b10-1.el7_9.x86_64
[root@host]# env | grep SPARK_HOME
SPARK_HOME=/opt/cws-external-client/spark-2.4.3-hadoop-2.7
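If the variables are not already set, they can be exported first. The following sketch uses the same paths as the example above; adjust them to match your JDK and external client installation.

```shell
# Example values only -- adjust the paths to match your own
# JDK and Conductor external client installation.
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.272.b10-1.el7_9.x86_64
export SPARK_HOME=/opt/cws-external-client/spark-2.4.3-hadoop-2.7

# Verify that both variables are set.
env | grep -E 'JAVA_HOME|SPARK_HOME'
```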
3 Source the Spark environment file. The "set -o allexport" command ensures that all variables defined in spark-env.sh are exported to the environment.
For example,
[root@host]# set -o allexport
[root@host]# source <Conductor external client path>/conf/spark-env.sh
4 Construct the java application command line as follows:
[root@host]# $JAVA_HOME/bin/java \
-cp /opt/cws-external-client/spark-2.4.3-hadoop-2.7/jars/ego/*:/opt/cws-external-client/spark-2.4.3-hadoop-2.7/jars/*:/opt/cws-external-client/spark-2.4.3-hadoop-2.7/examples/jars/* \
-Dspark.master="spark://<spark master host in the SIG>:7078" \
-Dspark.executor.memory=2g \
...<put all the settings from spark-defaults.conf as parameters in the CLI> \
org.apache.spark.examples.SparkPi 100
The application should run in the Conductor Spark cluster and return results.
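Step 4 requires copying every setting from spark-defaults.conf onto the command line as a -D option, which can be scripted. The sketch below is an illustration only: it uses a small sample configuration file with placeholder values (not from this tech note) in place of a real installation's spark-defaults.conf, and prints the assembled command rather than executing it.

```shell
# Sketch: generate the -D flags for step 4 from spark-defaults.conf.
# SPARK_HOME and the sample conf contents below are placeholders.
SPARK_HOME=${SPARK_HOME:-/opt/cws-external-client/spark-2.4.3-hadoop-2.7}

# In a real installation this would be "$SPARK_HOME/conf/spark-defaults.conf";
# a small sample file is used here for illustration.
CONF_FILE=$(mktemp)
cat > "$CONF_FILE" <<'EOF'
# Comments and blank lines are ignored.
spark.master            spark://master-host:7078
spark.executor.memory   2g
EOF

# Convert each "key value" line into a -Dkey=value java option.
D_FLAGS=$(awk '!/^[[:space:]]*(#|$)/ { printf "-D%s=%s ", $1, $2 }' "$CONF_FILE")

CLASSPATH="$SPARK_HOME/jars/ego/*:$SPARK_HOME/jars/*:$SPARK_HOME/examples/jars/*"

# Print the assembled command line (dry run) rather than executing it.
echo "$JAVA_HOME/bin/java -cp $CLASSPATH $D_FLAGS org.apache.spark.examples.SparkPi 100"
rm -f "$CONF_FILE"
```

The same pattern extends to any number of settings in the configuration file, since SparkConf automatically picks up JVM system properties that start with "spark.".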
Document Location
Worldwide
Document Information
Modified date:
07 December 2020
UID
ibm16379206