Customizing Spark applications and notebooks using the user-home volume

You can persist custom Python packages in the user-home volume so that they are available to Spark applications and Python notebooks across Spark instances, projects, and deployment spaces.

Using custom Python packages from the user-home/_global_/python-3 directory

  1. Connect to the OpenShift cluster:
    oc login OpenShift_URL:port
  2. Set the context to the project where Cloud Pak for Data is deployed:
    oc project PROJECT-NAME
  3. Start a debug pod for the ibm-nginx deployment. You must use the user ID shown in the following command:
    oc debug deploy/ibm-nginx --as-user=1000330999
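  4. Copy the Python package archive to the python-3 directory in the debug pod. Run this command from a second terminal, because the debug session from the previous step stays open in the first: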
    oc cp <python-package.tar.gz> ibm-nginx-debug:/user-home/_global_/python-3
  5. Extract the package in the ibm-nginx-debug pod:
    cd /user-home/_global_/python-3
    tar xvzf <python-package.tar.gz>
  6. Import the package at the top of your PySpark application. The python-3 directory is already set in PYTHONPATH, so no path changes are needed:
    import <package_name>
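
After completing the steps above, it can be useful to confirm from a notebook or Spark application that the package really is picked up from the shared directory. The following is a minimal sketch, not part of the product: the helper names dir_on_sys_path and locate_package are hypothetical, and it assumes the volume is mounted at /user-home/_global_/python-3 inside the runtime, as the steps imply. It uses only the Python standard library.

```python
import importlib.util
import sys

# Directory named in the steps above; assumed (not verified here) to be
# mounted at this path inside the Spark or notebook runtime.
GLOBAL_PKG_DIR = "/user-home/_global_/python-3"


def dir_on_sys_path(directory, path=None):
    """Return True if `directory` appears in the module search path."""
    entries = sys.path if path is None else path
    target = directory.rstrip("/")
    return any(entry.rstrip("/") == target for entry in entries)


def locate_package(name):
    """Return where a top-level package or module resolves from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # Packages expose their directory via submodule_search_locations;
    # plain modules expose a single file via origin.
    if spec.submodule_search_locations:
        return list(spec.submodule_search_locations)[0]
    return spec.origin


if __name__ == "__main__":
    print("python-3 directory on sys.path:", dir_on_sys_path(GLOBAL_PKG_DIR))
    # "json" is only a stand-in; replace it with your own package name.
    print("json resolves to:", locate_package("json"))
```

If locate_package returns a location under /user-home/_global_/python-3 for your package, the import in step 6 is using the persisted copy. A return value of None means the package was not found on the search path, which usually points back to the copy or extract steps.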