IBM Support

Watson Studio Local - Scala kernel doesn't start in Jupyter Notebook

Troubleshooting


Problem

When you create a Jupyter notebook in IBM Watson Studio Local and change the kernel to Scala, the kernel never finishes loading and the notebook keeps showing a message that the kernel is busy.

The SparkContext fails to start with the error: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.security.UserGroupInformation

This happens with both the Jupyter Py27 and Py35 environments, which suggests that Spark has trouble loading the class from the hadoop-common jar that provides it.

If you check /usr/local/spark/jars, you can see that the jar is present and that the path is included in the CLASSPATH, so a missing jar is not the cause.
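The check described above can be sketched in Python. This is a minimal illustration, not an IBM-provided tool: the function name jars_providing is hypothetical, and it only confirms which jars in a directory contain the class file named in the error. If the class turns up in more than one jar, that is worth noting as a possible source of conflict.

```python
import zipfile
from pathlib import Path

# Class named in the error message, written as an entry path inside a jar.
CLASS_ENTRY = "org/apache/hadoop/security/UserGroupInformation.class"


def jars_providing(class_entry, jar_dir):
    """Return the sorted names of jars in jar_dir that contain class_entry."""
    root = Path(jar_dir)
    if not root.is_dir():
        return []
    providers = []
    for jar in sorted(root.glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if class_entry in zf.namelist():
                    providers.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that are not valid archives.
            continue
    return providers


if __name__ == "__main__":
    # Directory taken from the article; adjust it if your image differs.
    for name in jars_providing(CLASS_ENTRY, "/usr/local/spark/jars"):
        print(name)
```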

Cause

The JDBC driver jars that were added when the Impala and Hive JDBC drivers were installed are causing conflicts.
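One way to look for this kind of conflict is to list class files that appear in more than one jar. The sketch below is an assumption-laden illustration: the function name duplicate_classes is hypothetical, the article does not say which classes actually clashed, and a shared class name is only a hint that two jars may ship different versions of the same library.

```python
import zipfile
from collections import defaultdict
from pathlib import Path


def duplicate_classes(jar_paths):
    """Map each .class entry found in more than one jar to the jars holding it."""
    owners = defaultdict(list)
    for jar in jar_paths:
        try:
            with zipfile.ZipFile(jar) as zf:
                # set() guards against an archive listing the same entry twice.
                for entry in set(zf.namelist()):
                    if entry.endswith(".class"):
                        owners[entry].append(Path(jar).name)
        except zipfile.BadZipFile:
            # Skip files that are not valid archives.
            continue
    return {cls: sorted(jars) for cls, jars in owners.items() if len(jars) > 1}
```

To use it, pass the paths of the Spark jars together with the added JDBC driver jars. The location of the added jars is not given in this article, so you need to supply that path yourself.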

Resolving The Problem

To make the Scala kernel and the Spark context start in Jupyter, remove most of the jars listed in the Admin console UI (Scripts > JDBC driver jars in Watson Studio (moveJarClasspath)), with the possible exception of the following:
commons-codec-1.3.jar
commons-logging-1.1.1.jar
httpclient-4.1.3.jar
httpcore-4.1.3.jar
denodo-vdp-jdbcdriver.jar

Apart from denodo-vdp-jdbcdriver.jar, which appears to be a legitimate JDBC driver, the purpose of the other four jars is not clear, so if you have them, keep them as well.

The other jars that you might find in the list are either JDBC drivers that are already built into the image or dependencies of the Impala/Hive JDBC drivers:
ojdbc7.jar
nzjdbc3.jar
mssql-jdbc-6.4.0.jre8.jar
ifxtools.jar
ifxlsupp.jar
ifxlang.jar
ifxjdbc.jar
db2jcc_license_cisuz.jar
db2jcc4.jar
HiveJDBC4.jar
ImpalaJDBC4.jar
TCLIServiceClient.jar
hive_metastore.jar
hive_service.jar
ql.jar
libfb303-0.9.0.jar
libthrift-0.9.0.jar
log4j-1.2.14.jar
slf4j-api-1.5.11.jar
slf4j-log4j12-1.5.11.jar
zookeeper-3.4.6.jar

All of the jars above can be removed. They were causing the conflicts that prevented the Spark context from starting in Jupyter, which in turn prevented the Scala kernel from starting.
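The keep and remove lists above can be turned into a small planning helper. This is only a sketch: the function name plan_cleanup and the sample input are hypothetical, and the script does not talk to the Admin console. It only sorts jar names that you copy from the console into three groups, and anything the article does not mention is flagged for manual review rather than removal.

```python
# Jars the article suggests keeping.
KEEP = {
    "commons-codec-1.3.jar",
    "commons-logging-1.1.1.jar",
    "httpclient-4.1.3.jar",
    "httpcore-4.1.3.jar",
    "denodo-vdp-jdbcdriver.jar",
}

# Jars the article says can be removed.
REMOVE = {
    "ojdbc7.jar", "nzjdbc3.jar", "mssql-jdbc-6.4.0.jre8.jar",
    "ifxtools.jar", "ifxlsupp.jar", "ifxlang.jar", "ifxjdbc.jar",
    "db2jcc_license_cisuz.jar", "db2jcc4.jar",
    "HiveJDBC4.jar", "ImpalaJDBC4.jar", "TCLIServiceClient.jar",
    "hive_metastore.jar", "hive_service.jar", "ql.jar",
    "libfb303-0.9.0.jar", "libthrift-0.9.0.jar",
    "log4j-1.2.14.jar", "slf4j-api-1.5.11.jar",
    "slf4j-log4j12-1.5.11.jar", "zookeeper-3.4.6.jar",
}


def plan_cleanup(installed):
    """Group jar names into keep, remove, and review, preserving input order."""
    plan = {"keep": [], "remove": [], "review": []}
    for name in installed:
        if name in KEEP:
            plan["keep"].append(name)
        elif name in REMOVE:
            plan["remove"].append(name)
        else:
            # Not mentioned in the article: decide manually.
            plan["review"].append(name)
    return plan


if __name__ == "__main__":
    # Hypothetical input: names copied by hand from the Admin console list.
    sample = ["ojdbc7.jar", "httpcore-4.1.3.jar", "HiveJDBC4.jar", "my-custom-driver.jar"]
    for group, names in plan_cleanup(sample).items():
        print(group, names)
```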

You then need to replace the Impala and Hive JDBC drivers with newer versions (moving from 2.5.x to 2.6.x), available from https://www.cloudera.com/downloads/connectors/impala/jdbc/2-6-15.html and https://www.cloudera.com/downloads/connectors/hive/jdbc/2-6-9.html, respectively.

These newer drivers bundle their dependencies in a single jar, which avoids the conflicts with other Spark jars.

Document Location

Worldwide



Document Information

More support for:
IBM Watson Studio Local

Component:
Modeling->Notebook - Jupyter, Modeling->Notebook - Scala

Software version:
All Versions

Operating system(s):
Linux

Document number:
6151515

Modified date:
02 April 2020

UID

ibm16151515