Jupyter Kernel Gateway on z/OS Tips and Tricks
Antoine Saliba
IBM Open Data Analytics for z/OS (IzODA) is an IBM product geared towards providing a data analytics ecosystem that was previously unavailable to z/OS. This enables analytics applications to be developed within a popular interface while also being directly connected to z/OS and its enterprise data. The Jupyter notebook environment is one such example. This piece of software provides a web interface that data scientists can use to gain insight on the many facets of their businesses through analytical queries against any data source from any system.
With IzODA, z/OS sysprogs are now able to install and configure a Jupyter environment using Anaconda’s package management capabilities, and then integrate that environment with either the z/OS Spark’s Scala JVM-based framework or Anaconda’s Python-based analytics stack to retrieve any of the z/OS data sources that are supported by the Optimized Data Layer. We previously blogged our own introduction to IBM Open Data Analytics for z/OS so if you’d like a product overview or more information, refer to IzODA Overview.
We have implemented our own IzODA-based Jupyter Notebook environment and hope to share some hints and tips learned from our experiences to help you get the most out of it. The IzODA Jupyter ecosystem provides two components for working with IzODA on z/OS. JupyterHub runs on a Linux server and hosts a multi-user Jupyter Notebook web interface for users to interact with. The Jupyter Kernel Gateway (JKG) runs on z/OS and allows JupyterHub to communicate with z/OS: it transfers the executable code that data scientists write in Jupyter Notebook over to z/OS, where it is executed, and the output is then sent back to JupyterHub and displayed to the user. If you'd like more information on the entire IzODA infrastructure and where JupyterHub and Jupyter Kernel Gateway fit in, please see the IzODA Overview blog referenced above.
One of the first things you'll find is that Jupyter Kernel Gateway installs into a user filesystem directory. To control its operation, such as starting or stopping it, you may have to manually run USS shell scripts, and these bash scripts may also require some environment configuration. This combination of shell-based, user-centric tasks can be awkward in a shop that uses centralized configuration and started tasks to control and automate the startup and shutdown of system processes. A simple approach would be to automate the JKG scripts with shell-based utilities such as cron or init.d running under trusted user IDs. In our case, we wanted to integrate JKG's startup and shutdown with started tasks controlled by our System Automation policy, so we created a BPXBATCH-based started task that, in turn, calls our own JKG-controlling bash script. Here is a sample of our JCL started task, which very simply calls our bash script:
//JKG PROC ACTION=START
// SET SPATH='/scripts'
// SET SCRIPT='jkgw.sh'
//* Program Usage:
//*   S JKG             - Start Jupyter Kernel Gateway
//*   S JKG,ACTION=STOP - Stop Jupyter Kernel Gateway
//JKG EXEC PGM=BPXBATCH,
// PARM='SH nohup &SPATH/&SCRIPT &ACTION'
//STDOUT DD SYSOUT=*
//STDERR DD SYSOUT=*
Our bash script, named jkgw.sh, sets some required environment variables, such as _BPX_JOBNAME to make our JKG task easily identifiable, and then starts Jupyter Kernel Gateway in the background using the command:
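As a minimal sketch of what such a start sequence might look like, assuming the standard jupyter kernelgateway launcher from the IzODA Anaconda channel and a $LOGNAME value that names a log file (the exact options and log path will vary by installation):

```shell
export _BPX_JOBNAME='JKG'          # shows up as the job name in SDSF / ps output
nohup jupyter kernelgateway >"$LOGNAME" 2>&1 &
```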
This sends the JKG daemon's output to the file named by the $LOGNAME environment variable. To shut down Jupyter Kernel Gateway, we execute the following bash command to retrieve the PID of the running JKG instance:
PID=$(COLUMNS=500 ps -o jobname,pid -e | grep JKG | awk '{print $2}')
Once we have the PID, we can simply kill the process. More than one process may match, so we iterate over the output and kill all of them. This shuts down the Jupyter notebooks and their corresponding Apache Spark applications.
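The parsing and kill loop can be sketched as follows; the jobnames and the sample ps output below are illustrative stand-ins for what COLUMNS=500 ps -o jobname,pid -e would return on z/OS:

```shell
# Simulated `ps -o jobname,pid` output so the parsing step can be shown standalone;
# on z/OS this would come from: COLUMNS=500 ps -o jobname,pid -e
ps_output='JKG      16842
JKG5     16901
SSHD     12003'

# Collect every PID whose jobname starts with our JKG prefix
pids=$(printf '%s\n' "$ps_output" | awk '$1 ~ /^JKG/ {print $2}')

for p in $pids; do
  echo "would kill $p"    # in production: kill "$p"
done
```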
Speaking of Apache Spark, by default Jupyter notebooks will run Spark in local mode. This means that each notebook will create its own master and worker tasks and essentially bring up its own Spark environment. In our environment, we prefer to run Spark in standalone cluster mode, so that whenever a request for a new Spark application comes in, it runs on a master and workers that are already initialized. We prefer this because it reduces overhead, lets us track our Spark applications better, and lets us customize our settings rather than use defaults. To enable this, we had to add two statements, SPARK_CONF_DIR and SPARK_OPTS, to each kernel definition. When Jupyter Kernel Gateway is installed, a directory containing a kernel.json definition is created for each kernel type; as an example, we currently have kernels configured for Apache Toree (Scala) and Python.
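As a sketch, the relevant part of a kernel.json might look like the following; only the fields being discussed are shown (argv and the other entries are unchanged), and the conf path and master URL are illustrative values, not defaults:

```json
{
  "display_name": "Apache Toree - Scala",
  "language": "scala",
  "env": {
    "SPARK_CONF_DIR": "/u/spark/conf",
    "SPARK_OPTS": "--master spark://zoshost:7077"
  }
}
```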
Code within Jupyter notebooks is often used to establish connections with data sources. These connections usually require passwords, and keeping them as plain text in the notebooks is never recommended, for numerous reasons: data scientists may want to share notebooks or present live demos from them, and displaying passwords poses a security risk. To prevent this, we use environment variables that are set directly in the same kernel.json files in which we put SPARK_CONF_DIR and SPARK_OPTS. We simply add our password variables and values within the "env" section, as such:
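A sketch of what that "env" section might then contain; MDSS_PASSWORD is a hypothetical variable name and all values here are illustrative:

```json
"env": {
  "SPARK_CONF_DIR": "/u/spark/conf",
  "SPARK_OPTS": "--master spark://zoshost:7077",
  "MDSS_PASSWORD": "secret-value-here"
}
```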
By doing this, users creating Jupyter notebooks no longer have to type the password right in their notebooks. In fact, they don't even need to know the passwords, and when a password changes, it only needs to be changed once, in the kernel.json files. How the environment variables are used within the notebooks varies by kernel. For example, each Python Jupyter notebook has to have "import os" at the top, and then the environment variables can be accessed using os.environ.
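In a Python notebook this is a one-liner; the variable name below is a hypothetical one set in the kernel.json "env" section, and the snippet sets it itself only so it runs standalone:

```python
import os

# Jupyter Kernel Gateway exports everything in the kernel.json "env" section
# into the kernel's environment; simulate that here so the snippet stands alone.
os.environ["MDSS_PASSWORD"] = "demo-only-value"  # hypothetical name, demo value

# In the notebook, the password is used without ever appearing in a cell:
password = os.environ.get("MDSS_PASSWORD")
print(password is not None)  # → True
```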
You’ve now configured JKG and your Jupyter notebooks are working as planned. Once your data scientists have written their analytical queries right in the notebooks, you can use a tool named nbconvert to automate running the notebooks they created. Nbconvert is available from the IzODA Anaconda channel and will run the notebooks for you through a simple command. Nbconvert runs on z/OS, so if you’ve created notebooks on the web interface of JupyterHub you’ll need to export them as notebooks (.ipynb) to a USS directory on the z/OS system where JKG runs. Once that’s done, you’ll want to tag the .ipynb files as ASCII using the chtag -tc ISO8859-1 FILE.ipynb command and then run the following command to execute a notebook:
jupyter nbconvert --to notebook --execute FILE.ipynb --allow-errors --output OUTPUT.ipynb
This will execute your FILE.ipynb Jupyter notebook and save the input and output in a new notebook called OUTPUT.ipynb.
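To schedule such runs, one option is a cron entry under a trusted user ID; the schedule, paths, and notebook names below are illustrative:

```shell
# Run the notebook nightly at 02:00; sourcing /etc/profile picks up the
# Anaconda environment settings the jupyter command needs (paths illustrative)
0 2 * * * . /etc/profile && cd /u/jupyter/notebooks && jupyter nbconvert --to notebook --execute FILE.ipynb --allow-errors --output OUTPUT.ipynb
```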
Once configured, Jupyter notebooks provide an easy-to-use, robust analytical environment for data scientists to more easily make sense of your enterprise data. We hope the tips and tricks discussed above help you integrate the Jupyter notebook ecosystem within your existing z/OS enterprise environment. This blog post is a work in progress so we’ll update it if we find anything new worth sharing. If you have any questions, comments or suggestions about anything discussed, please feel free to post a comment below.