Supported Spark, Anaconda, and notebook versions
IBM® Spectrum Conductor bundles Spark version packages that are prepackaged to include Apache Spark binaries and other files that are required for IBM Spectrum Conductor capabilities. It also bundles Anaconda for Anaconda management, and Jupyter notebooks to provide an interactive environment for data manipulation and visualization. These bundled versions are referred to as built-in versions of the software. In addition to built-in versions, you can use newer or updated versions with IBM Spectrum Conductor.
Built-in Spark, Anaconda, and notebook versions
| Software | Version | Notes |
|---|---|---|
| Spark | 2.4.3, 2.3.3, 2.1.1, and 1.6.1 | |
| Anaconda | 2019.03 Python 3 Linux, and 2019.03 Python 3 Linux on POWER | |
| Jupyter notebook | 6.0.x and 5.7.8 | Supports built-in Spark versions that are included in IBM Spectrum Conductor version 2.4.1 except Spark 1.6.1. |
Non built-in, but supported Spark, Anaconda, and notebook versions
In addition to built-in versions, you can use newer or updated versions of Spark, Anaconda, and notebooks (see Using updated or upgraded Spark versions or notebook packages for details).
For Spark, download the Spark version package from IBM Fix Central and import it to IBM Spectrum Conductor. Only the Spark versions on IBM Fix Central are compatible with IBM Spectrum Conductor. Refer to the readme files or documentation from IBM Fix Central to determine which Spark versions are compatible with each IBM Spectrum Conductor version.
For Anaconda, download the distribution from Anaconda.
- Jupyter notebooks
- When you use Jupyter 4.1.0 or Jupyter 5.0.0 notebooks with Spark version 2.1.0 or higher, only one Jupyter notebook kernel can successfully start SparkContext (sc). No subsequent kernel can start SparkContext. If you try to issue Spark commands on any subsequent kernel without stopping the running kernel, you encounter the following error: NameError: name 'sc' is not defined. This issue is caused by the metastore_db file, which cannot be duplicated under a single directory.
To run more than one Jupyter notebook kernel, you can create sub-directories (folders) within the notebook for each notebook kernel so that a metastore_db file can be created under each directory.
- Zeppelin notebooks
- Although not bundled with IBM Spectrum Conductor, you can use these Zeppelin notebooks:
- 0.7.3 (supports all Spark versions that are supported in IBM Spectrum Conductor 2.4.1, except Spark versions 1.5.2, 2.0.1, and 2.1.0 when SSL is enabled; disable SSL to use Zeppelin 0.7.3 with those Spark versions)
- 0.7.2 (does not support Spark versions 2.2.0 and higher)
- 0.7.0 (supports all Spark versions that are supported in IBM Spectrum Conductor 2.4.1)
- 0.5.6 (supports all Spark versions that are supported in IBM Spectrum Conductor 2.4.1)
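The Jupyter workaround described earlier in this section (a separate subdirectory for each notebook kernel, so that each one gets its own metastore_db) can be sketched in Python as follows. This is a minimal illustration, not an IBM Spectrum Conductor API: the helper name `make_kernel_workdir`, the kernel names, and the temporary root directory are hypothetical, and the sketch assumes that each kernel runs with its notebook's subdirectory as its working directory.

```python
import tempfile
from pathlib import Path


def make_kernel_workdir(notebook_root, kernel_name):
    """Create (if needed) and return a dedicated subdirectory for one kernel.

    Hypothetical helper: the intent is that each notebook is saved and run
    from its own subdirectory, so metastore_db is not shared between kernels.
    """
    workdir = Path(notebook_root) / kernel_name
    workdir.mkdir(parents=True, exist_ok=True)
    return workdir


# Demonstration against a throwaway directory; in practice notebook_root
# would be the directory that holds your notebooks (an assumption here).
demo_root = Path(tempfile.mkdtemp(prefix="notebooks_"))
workdirs = [make_kernel_workdir(demo_root, name) for name in ("kernel_a", "kernel_b")]

for workdir in workdirs:
    print(workdir)
```

After the subdirectories exist, you would create or move one notebook into each of them before starting its kernel, so that no two running kernels share a directory.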