IBM Support

How to submit PySpark applications in the Jupyter terminal

How To


Summary

In a Spark instance group (SIG) where Jupyter Notebook is available, users may try to submit PySpark applications from the command line in the Jupyter terminal. By default, however, the terminal environment is not set up for this, so users may see an error such as "ModuleNotFoundError: No module named 'pyspark'".
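To confirm that the terminal environment is the cause, a small check along the following lines can be run in the Jupyter terminal. This is only a sketch: the function name pyspark_status is illustrative and not part of any IBM or Spark tooling. It reports whether Python can locate the pyspark module and what SPARK_HOME and PYTHONPATH are currently set to.

```python
import importlib.util
import os


def pyspark_status():
    """Report whether pyspark is importable and show the related environment variables.

    The function name is hypothetical; it is not part of Spark or IBM Spectrum Conductor.
    """
    return {
        # find_spec returns None when the module cannot be located on sys.path
        "importable": importlib.util.find_spec("pyspark") is not None,
        "SPARK_HOME": os.environ.get("SPARK_HOME"),
        "PYTHONPATH": os.environ.get("PYTHONPATH"),
    }


if __name__ == "__main__":
    for key, value in pyspark_status().items():
        print(f"{key}: {value}")
```

If "importable" is False, running "python main.py" in the same session can be expected to fail with the ModuleNotFoundError described above.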

The error occurs because PYTHONPATH is not set correctly in the terminal. To set it, run ". $SPARK_HOME/sbin/spark-config.sh" in the terminal session (the leading "." sources the script, so the variables it exports take effect in the current shell). PySpark applications can then be submitted from the Jupyter terminal with "python main.py". Note that the application code must still call addPyFile() to add the paths to its dependencies.
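As an illustration, a main.py that registers its dependencies with addPyFile() might look like the sketch below. Several things here are assumptions rather than part of this article: the "deps" directory, the helper names collect_dependencies and run, the application name, and the expectation that the Spark master is supplied by the Spark configuration of the SIG. SparkContext.addPyFile() itself is the standard PySpark call for shipping .py or .zip dependencies.

```python
import os


def collect_dependencies(directory):
    """Return sorted paths of the .py and .zip files found in `directory`.

    Hypothetical helper: these are the dependency types passed to addPyFile() below.
    """
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith((".py", ".zip"))
    )


def run(dep_dir="deps"):
    """Start a SparkContext, register dependencies, and run a trivial job."""
    # Imported here so that a missing PYTHONPATH fails when run() is called,
    # with the ModuleNotFoundError described in the summary.
    from pyspark import SparkConf, SparkContext

    # Assumption: the master URL comes from the Spark configuration of the SIG.
    conf = SparkConf().setAppName("jupyter-terminal-example")
    sc = SparkContext(conf=conf)
    try:
        for path in collect_dependencies(dep_dir):
            sc.addPyFile(path)  # make the dependency available to the job
        # Trivial job to confirm that the context works.
        print(sc.parallelize(range(10)).sum())
    finally:
        sc.stop()
```

Calling run() at the end of main.py, after spark-config.sh has been sourced in the same terminal session, would then start the job when "python main.py" is executed.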

Document Location

Worldwide

Product: IBM Spectrum Conductor
Platform: Linux
Version: All Versions
Business Unit: IBM Software
Line of Business: Automation Platform

Document Information

Modified date:
21 March 2019

UID

ibm10870958