IBM Support

Spark submit fails with "class not found" when deploying in cluster mode

Troubleshooting


Problem

Summary

Spark jobs can be submitted in "cluster" mode or "client" mode. The former launches the driver on one of the cluster nodes, while the latter launches the driver on the local machine from which the job was submitted.
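The distinction can be seen in the --deploy-mode flag passed to spark-submit. The master URLs, application name, and jar below are illustrative placeholders, not values from this note:

```shell
# Client mode: the driver JVM runs on the machine invoking spark-submit,
# so jars on the local filesystem are directly visible to the driver.
spark-submit --deploy-mode client \
  --master spark://master-host:7077 \
  --class com.example.App app.jar

# Cluster mode: the driver is launched on one of the cluster's worker nodes,
# so every jar the driver needs must be reachable from that node.
spark-submit --deploy-mode cluster \
  --master spark://master-host:6066 \
  --class com.example.App app.jar
```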

When submitting in cluster mode, a class-not-found error can occur if the relevant jar files are not accessible to the node on which the driver is launched. This note works through an example showing how to make them available.

Symptoms

The following is typical of the type of error that might be seen:

Exception in thread "main" java.lang.reflect.InvocationTargetException 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/streaming/kafka/KafkaUtils$

Cause

Although the jar files were made available to all nodes in the cluster (e.g. via an NFS share), the --driver-class-path option was not included, so the driver's own classpath did not contain them.
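One way to confirm which jar should supply the missing class is to search the jar files directly. Jars are zip archives, and zip archives store entry names verbatim, so a plain grep can locate the class file. The LIB_DIR path below is an assumption taken from the example in the Solution section; adjust it to your layout:

```shell
# Sketch: locate the jar that contains the class named in the
# NoClassDefFoundError. LIB_DIR is an assumption; adjust as needed.
LIB_DIR="${LIB_DIR:-/home/bob/spark_job/lib}"
MISSING="org/apache/spark/streaming/kafka/KafkaUtils"
for jar in "$LIB_DIR"/*.jar; do
  [ -f "$jar" ] || continue
  # Jar entry names are stored uncompressed, so grep can find them.
  if grep -q "$MISSING" "$jar" 2>/dev/null; then
    echo "$MISSING found in: $jar"
  fi
done
```

If no jar reports a match, the class is genuinely missing from the set of jars being distributed, rather than present but absent from the driver classpath.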

Solution

The following is an example of what was used to resolve the issue:

$ sudo -u cassandra dse spark-submit -v \
--conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
--jars $JARS \
--executor-memory 512M \
--total-executor-cores 2 \
--deploy-mode "cluster" \
--master spark://10.1.2.3:6066 \
--supervise \
--driver-class-path $JARS_COLON_SEP \
--class "com.test.example" $APP_JAR "$INPUT_PATH" --files $INPUT_PATH

The environment variables referenced above were set as follows (note that shell variable assignments must not have spaces around the = sign):

JARS=/home/bob/spark_job/lib/nscala-time_2.10-2.0.0.jar,/home/bob/spark_job/lib/kafka_2.10-0.8.2.1.jar,/home/bob/spark_job/lib/kafka-clients-0.8.2.1.jar,/home/bob/spark_job/lib/spark-streaming-kafka_2.10-1.4.1.jar,/home/bob/spark_job/lib/zkclient-0.3.jar,/home/bob/spark_job/lib/protobuf-java-2.4.0a.jar
JARS_COLON_SEP=/home/bob/spark_job/lib/nscala-time_2.10-2.0.0.jar:/home/bob/spark_job/lib/kafka_2.10-0.8.2.1.jar:/home/bob/spark_job/lib/kafka-clients-0.8.2.1.jar:/home/bob/spark_job/lib/spark-streaming-kafka_2.10-1.4.1.jar:/home/bob/spark_job/lib/zkclient-0.3.jar:/home/bob/spark_job/lib/protobuf-java-2.4.0a.jar
APP_JAR=spark-job-1.0.jar
INPUT_PATH=test.json
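Maintaining the comma-separated list for --jars and the colon-separated list for --driver-class-path by hand is error-prone. A minimal sketch that derives both from the same directory, so the two lists cannot drift apart (the LIB_DIR default is an assumption matching the paths above):

```shell
# Build both jar lists from one directory.
# LIB_DIR is an assumption based on the example paths above.
LIB_DIR="${LIB_DIR:-/home/bob/spark_job/lib}"
# Comma-separated list for --jars:
JARS=$(printf '%s\n' "$LIB_DIR"/*.jar | paste -sd, -)
# Colon-separated list for --driver-class-path:
JARS_COLON_SEP=$(printf '%s\n' "$LIB_DIR"/*.jar | paste -sd: -)
echo "--jars $JARS"
echo "--driver-class-path $JARS_COLON_SEP"
```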

Further info

The Spark configuration documentation outlines the available options:

https://spark.apache.org/docs/1.4.1/configuration.html

The following links discuss further examples. The main distinction is that if the jar files need to be on the driver's system classpath, the --driver-class-path option is required:

https://issues.apache.org/jira/browse/SPARK-9384

https://forums.databricks.com/questions/706/how-can-i-attach-a-jar-library-to-the-cluster-that.html

Document Location

Worldwide


Historical Number

ka06R0000006C3mQAE

Document Information

Modified date:
30 January 2026

UID

ibm17259208