Supported Spark, Anaconda, and notebook versions

IBM® Spectrum Conductor bundles Spark version packages that are prepackaged to include Apache Spark binaries and other files that are required for IBM Spectrum Conductor capabilities. It also bundles Anaconda for Anaconda management, and Jupyter notebooks to provide an interactive environment for data manipulation and visualization. These bundled versions are referred to as built-in versions of the software. In addition to built-in versions, you can use newer or updated versions with IBM Spectrum Conductor.

Built-in Spark, Anaconda, and notebook versions

Table 1. Built-in Spark, Anaconda, and notebook versions
Software | Version | Notes
Spark | 2.4.3, 2.3.3, 2.1.1, and 1.6.1 |
Anaconda | 2019.03 Python 3 Linux, and 2019.03 Python 3 Linux on POWER |
Jupyter notebook | 6.0.x and 5.7.8 | Supports built-in Spark versions that are included in IBM Spectrum Conductor version 2.4.1 except Spark 1.6.1.

Non built-in, but supported Spark, Anaconda, and notebook versions

In addition to built-in versions, you can use newer or updated versions of Spark, Anaconda, and notebooks (see Using updated or upgraded Spark versions or notebook packages for details).

For Spark, download the version from IBM Fix Central and import it to IBM Spectrum Conductor. Only the Spark versions on IBM Fix Central are compatible with IBM Spectrum Conductor. Refer to the readme files or documentation on IBM Fix Central to determine which Spark versions are compatible with each IBM Spectrum Conductor version.

For Anaconda, download the distribution from Anaconda.

For notebooks, obtain the binaries from your notebook vendor and provide scripts (see Creating notebook packages). Additionally, when using non built-in notebook versions, note this information:
Jupyter notebooks
When you use Jupyter 4.1.0 or Jupyter 5.0.0 notebooks with Spark version 2.1.0 or higher, only one Jupyter notebook kernel can successfully start SparkContext. Subsequent kernels cannot start SparkContext (sc). If you try to issue Spark commands on any subsequent kernel without stopping the running kernel, you encounter the following error: NameError: name 'sc' is not defined. This issue occurs because the metastore_db file cannot be duplicated under a single directory.

To run more than one Jupyter notebook kernel, you can create sub-directories (folders) within the notebook for each notebook kernel so that a metastore_db file can be created under each directory.
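The directory layout described above could also be prepared with a short script. The following is only a minimal sketch: the helper name make_kernel_dirs and the notebook names are hypothetical, and it assumes that each kernel creates its metastore_db in the directory that the notebook is opened from.

```python
import os
import tempfile


def make_kernel_dirs(base_dir, notebook_names):
    """Create one sub-directory per notebook under base_dir.

    Returns a dict that maps each notebook name to its directory path.
    Hypothetical helper: assumes each kernel writes metastore_db into
    the directory that its notebook lives in.
    """
    paths = {}
    for name in notebook_names:
        path = os.path.join(base_dir, name)
        # exist_ok=True makes the call safe to repeat.
        os.makedirs(path, exist_ok=True)
        paths[name] = path
    return paths


if __name__ == "__main__":
    # A temporary directory stands in for the notebook base directory.
    base = tempfile.mkdtemp(prefix="notebooks_")
    for name, path in make_kernel_dirs(base, ["analysis_a", "analysis_b"]).items():
        print(name, "->", path)
```

Each notebook is then created or moved into its own directory, so that no two kernels share the same location for metastore_db.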

Zeppelin notebooks
Although Zeppelin notebooks are not bundled with IBM Spectrum Conductor, you can use these versions:
  • 0.7.3 (supports all Spark versions that are supported in IBM Spectrum Conductor 2.4.1, except Spark versions 1.5.2, 2.0.1, and 2.1.0 when SSL is enabled; disable SSL to use Zeppelin 0.7.3 with those Spark versions)
  • 0.7.2 (does not support Spark versions 2.2.0 and higher)
  • 0.7.0 (supports all Spark versions that are supported in IBM Spectrum Conductor 2.4.1)
  • 0.5.6 (supports all Spark versions that are supported in IBM Spectrum Conductor 2.4.1)
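
The compatibility rules in this list can be encoded as a small lookup, which might be useful in a pre-deployment check. This is an illustrative sketch only: the function zeppelin_supports and its parameters are hypothetical and are not part of IBM Spectrum Conductor, and the function assumes that the Spark version passed in is already one that IBM Spectrum Conductor 2.4.1 supports.

```python
# Spark versions that Zeppelin 0.7.3 cannot use while SSL is enabled.
SSL_BLOCKED_FOR_0_7_3 = {"1.5.2", "2.0.1", "2.1.0"}


def parse_version(text):
    """Turn a dotted version string such as '2.4.3' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def zeppelin_supports(zeppelin, spark, ssl_enabled):
    """Return True if the listed rules allow this Zeppelin/Spark pairing.

    Hypothetical helper: it does not verify that the Spark version is
    itself supported in IBM Spectrum Conductor 2.4.1.
    """
    if zeppelin == "0.7.3":
        return not (ssl_enabled and spark in SSL_BLOCKED_FOR_0_7_3)
    if zeppelin == "0.7.2":
        # 0.7.2 does not support Spark 2.2.0 and higher.
        return parse_version(spark) < (2, 2, 0)
    if zeppelin in ("0.7.0", "0.5.6"):
        return True
    raise ValueError("Zeppelin version not in the list: " + zeppelin)


if __name__ == "__main__":
    print(zeppelin_supports("0.7.3", "2.1.0", ssl_enabled=True))
    print(zeppelin_supports("0.7.2", "2.4.3", ssl_enabled=False))
```

Raising an error for an unlisted Zeppelin version, rather than returning False, keeps an unknown version distinct from a known incompatibility.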