Manage packages as a Watson Studio Local administrator
A Watson Studio Local administrator can install Python or R packages in global directories. These packages are available to all users on the cluster.
Tasks for installing global libraries and packages:
- Add a global Spark JAR file
- Install a global Python library
- Install a global Python library when the cluster is not connected to the internet
- Load a global R package
- Install a global R library when the cluster is not connected to the internet
To add a global Spark JAR file that can be used by Watson Studio Local cluster Spark
In the Scripts panel of the Admin console, an administrator can select Add Spark jars in Watson Studio (moveJarClasspath.sh) to upload JAR files to the /user-home/_global_/spark/jars/ directory for use with Spark.
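After the script runs, an administrator might want to confirm that the JAR files are in the global directory. The following is a minimal sketch, assuming it runs in a Python notebook where /user-home/_global_/spark/jars/ is visible. The helper name list_global_jars is hypothetical and is not part of Watson Studio Local.

```python
import os

# Directory named in the text above; adjust it if your cluster differs.
GLOBAL_JAR_DIR = "/user-home/_global_/spark/jars/"


def list_global_jars(jar_dir=GLOBAL_JAR_DIR):
    """Return the sorted names of .jar files in jar_dir, or [] if it is missing."""
    if not os.path.isdir(jar_dir):
        return []
    return sorted(
        name for name in os.listdir(jar_dir) if name.lower().endswith(".jar")
    )


# Print whatever JAR files are currently visible in the global directory.
print(list_global_jars())
```

An empty list means that either no JAR files were uploaded or the directory is not mounted in the current runtime.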
To install a global Python package that can be used by Watson Studio Local cluster Spark
- Sign in to Watson Studio Local as the default cluster administrator (user 999) and create a Python notebook.
- Use the Python pip package installer command to install Python libraries in the Python notebook.
  For example, run the following command in a code cell to install the prettyplotlib library in a Python 2.7 environment:
  !pip install --install-option="--install-lib=/user-home/_global_/python-2.7" prettyplotlib
- Restart any runtime pods that need access to the global package, such as notebooks from other projects or users, ml pods, scoring pods, and spark pods.
The target folder for the Python environment is /user-home/_global_/python-v.r, where v.r is the Python version, for example python-2.7.
The installed packages can be used by all notebook users that use the same Python version in the Spark service. Notebook users can now use the Python import command to import the library components. For example, users can run the following command in a code cell:
import prettyplotlib as ppl
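Because the target folder depends on the Python version, it can help to derive it rather than type it. The following is a minimal sketch, not the product's own tooling. The function names are hypothetical, and it assumes a pip version that still accepts --install-option, as the command above does.

```python
import importlib
import sys

GLOBAL_ROOT = "/user-home/_global_"  # global directory root from the text above


def global_python_dir(version_info=None):
    """Build the python-v.r target folder, e.g. /user-home/_global_/python-2.7."""
    major, minor = (version_info or sys.version_info)[:2]
    return "{}/python-{}.{}".format(GLOBAL_ROOT, major, minor)


def pip_install_args(package, version_info=None):
    """Return pip arguments that mirror the global install command shown above."""
    target = global_python_dir(version_info)
    return ["install", "--install-option=--install-lib={}".format(target), package]


def is_importable(module_name):
    """Report whether the current runtime can import module_name."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True
```

After the runtime pods are restarted, a user could call is_importable("prettyplotlib") in a notebook to check that the global package is visible before relying on it.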
To install a global Python library when the cluster is not connected to the internet
- Access the shared volume on the host. As root, do the following actions on the master node:
  - Create a directory on the master node to mount the user-home volume. For example:
    mkdir -p /mnt/shared-user-home
  - Find a storage node:
    kubectl get nodes -l is_storage=true
    Example output:
    NAME                           STATUS    AGE
    dev06-kube-storage-1.ibm.com   Ready     31d
    dev06-kube-storage-2.ibm.com   Ready     31d
    dev06-kube-storage-3.ibm.com   Ready     31d
    Pick one of the nodes in the output. In this example, you might pick dev06-kube-storage-1.ibm.com.
  - Mount the user-home volume:
    mount -t glusterfs <storagehost>:/<namespace>-user-home <mount-point>
    For example:
    mount -t glusterfs dev06-kube-storage-1.ibm.com:/dsx-user-home /mnt/shared-user-home/
- From a computer that has access to the internet and that has pip and Python v2.7 installed, run the following command to download the module and its dependencies:
  pip download -d tmp/piptest/prettyplotlib --no-binary :all: prettyplotlib
- Use tar or zip to create an archive of the downloaded files:
  tar -cf downloadedModule.tar tmp/piptest/prettyplotlib
- Copy the archive to the cluster master node:
  scp downloadedModule.tar root@dev06-kube-master-1:
- On the cluster master node, unpack the archive onto the shared directory from above:
  cd /mnt/shared-user-home/_global_/
  tar -xf ~/downloadedModule.tar
  Note the location of the directory and the module file:
  [root@dbl164-master-1 _global_]# tar -tf ~/downloadedModule.tar
  tmp/piptest/prettyplotlib/
  tmp/piptest/prettyplotlib/brewer2mpl-1.4.1.zip
  tmp/piptest/prettyplotlib/functools32-3.2.3-2.zip
  tmp/piptest/prettyplotlib/pyparsing-2.2.0.tar.gz
  tmp/piptest/prettyplotlib/cycler-0.10.0.tar.gz
  tmp/piptest/prettyplotlib/python-dateutil-2.6.0.tar.gz
  tmp/piptest/prettyplotlib/six-1.10.0.tar.gz
  tmp/piptest/prettyplotlib/pytz-2017.2.zip
  tmp/piptest/prettyplotlib/matplotlib-2.0.2.tar.gz
  tmp/piptest/prettyplotlib/subprocess32-3.2.7.tar.gz
  tmp/piptest/prettyplotlib/numpy-1.12.1.zip
  tmp/piptest/prettyplotlib/prettyplotlib-0.1.7.tar.gz
  Because the archive was unpacked in the _global_ directory, the location of the module file in this example is:
  /mnt/shared-user-home/_global_/tmp/piptest/prettyplotlib/prettyplotlib-0.1.7.tar.gz
  On the pod that is running the notebook server, this location is:
  /user-home/_global_/tmp/piptest/prettyplotlib/prettyplotlib-0.1.7.tar.gz
- From Watson Studio Local, log in as admin, create a new Python notebook, and enter the following command in a cell:
  !pip install --install-option="--install-lib=/user-home/_global_/python-2.7" --no-index --find-links=/user-home/_global_/tmp/piptest/prettyplotlib /user-home/_global_/tmp/piptest/prettyplotlib/prettyplotlib-0.1.7.tar.gz
- Restart any runtime pods that need access to the global package, such as notebooks from other projects or users, ml pods, scoring pods, and spark pods.
Note: The --no-binary :all: option is turned on in the pip download step, so pip downloads source distributions rather than prebuilt binary packages.
The installed packages can be used by all notebook users that use the same Python version in the Spark service. Notebook users can now use the Python import command to import the library components. For example, users can run the following command in a code cell:
import prettyplotlib as ppl
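The offline steps above depend on two details that are easy to get wrong: which file in the archive is the module itself, and how a path on the master node maps to a path inside the notebook pod. The following is a minimal sketch of both. The function names are hypothetical. It assumes the volume is mounted at /mnt/shared-user-home on the host and appears as /user-home in the pod, as in the example.

```python
import posixpath
import tarfile

HOST_MOUNT = "/mnt/shared-user-home"  # mount point on the master node (from the steps above)
POD_MOUNT = "/user-home"              # the same volume as seen inside the notebook pod


def find_module_file(archive_path, module_name):
    """Return the archive member that holds module_name's source package, or None."""
    prefix = module_name + "-"
    with tarfile.open(archive_path) as archive:
        for member in archive.getnames():
            base = posixpath.basename(member)
            if base.startswith(prefix) and base.endswith((".tar.gz", ".zip")):
                return member
    return None


def host_to_pod_path(host_path, host_mount=HOST_MOUNT, pod_mount=POD_MOUNT):
    """Translate a path under the host mount point to the matching pod path."""
    if host_path != host_mount and not host_path.startswith(host_mount + "/"):
        raise ValueError("{} is not under {}".format(host_path, host_mount))
    return pod_mount + host_path[len(host_mount):]
```

For example, if the archive was unpacked in /mnt/shared-user-home/_global_/, joining that directory with the member returned by find_module_file and passing the result to host_to_pod_path gives the file path to hand to pip install in the notebook. The containing directory is the value for --find-links.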
To load a global R package
- Log in to Watson Studio Local as admin and create an R notebook.
- Use the R install.packages() function to install new R packages. For example, run the following command in a code cell to install the ggplot2 package for plotting functions:
  install.packages("ggplot2")
  The installed package can be used by all R notebooks that are running in the Spark service.
Now, users can use the R library() function to load the installed package. For example, a user can run the following command in a code cell:
library("ggplot2")
After a user runs this command, they can call plotting functions from the ggplot2 package in their notebook.
To install a global R library when the cluster is not connected to the internet
- Access the shared volume on the host. As root, do the following actions on the master node:
  - Create a directory on the master node to mount the user-home volume. For example:
    mkdir -p /mnt/shared-user-home
  - Find a storage node:
    kubectl get nodes -l is_storage=true
    Example output:
    NAME                           STATUS    AGE
    dev06-kube-storage-1.ibm.com   Ready     31d
    dev06-kube-storage-2.ibm.com   Ready     31d
    dev06-kube-storage-3.ibm.com   Ready     31d
    Pick one of the nodes in the output. In this example, you might pick dev06-kube-storage-1.ibm.com.
  - Mount the user-home volume:
    mount -t glusterfs <storagehost>:/<namespace>-user-home <mount-point>
    For example:
    mount -t glusterfs dev06-kube-storage-1.ibm.com:/dsx-user-home /mnt/shared-user-home/
- From a computer that has access to the internet, go to the R CRAN page, search for the package, and download the package TAR file directly from the browser, or use the following commands to download it from the command line.
  First, create the destination folder:
  mkdir -p tmp-r
  Then use wget or curl to download the package from the URL found on the CRAN website. wget example:
  wget https://cran.r-project.org/src/contrib/ggplot2_2.2.1.tar.gz --directory-prefix=tmp-r
  If R is installed on this computer, you can instead download the R package in an R session:
  download.packages('ggplot2',destdir='tmp-r')
  A TAR file for that package is downloaded to the tmp-r folder:
  $ ls tmp-r
  ggplot2_2.2.1.tar.gz
- Copy the downloaded folder to the cluster master node:
  scp -r tmp-r root@dev06-kube-master-1:/mnt/shared-user-home/
- On the cluster master node, check the uploaded file or files. In this example, the location of the module file is:
  /mnt/shared-user-home/tmp-r/ggplot2_2.2.1.tar.gz
  On the pod that is running the notebook server, this location is:
  /user-home/tmp-r/ggplot2_2.2.1.tar.gz
- From Watson Studio Local, sign in as admin, create a new R notebook, and enter the following command in a cell:
  install.packages('/user-home/tmp-r/ggplot2_2.2.1.tar.gz', repos=NULL)
The installed packages can be used by all notebook users that use the same R version in the Spark service. Notebook users can now use the R library() command to load the library components. For example, users can run the following command in a code cell:
library(ggplot2)
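The offline R steps above refer to the same package file in three places: the CRAN download URL, the path on the master node, and the path inside the notebook pod. The following is a minimal Python sketch that derives all three from a package name and version, using the name_version.tar.gz naming seen in the example. The function name is hypothetical. The URL pattern is an assumption that holds only while that version is the current CRAN release, because CRAN moves older source packages to an archive area.

```python
CRAN_SRC = "https://cran.r-project.org/src/contrib"  # base URL used in the wget example


def r_package_locations(name, version, folder="tmp-r",
                        host_mount="/mnt/shared-user-home",
                        pod_mount="/user-home"):
    """Return the download URL, host path, and pod path for an R source package."""
    file_name = "{}_{}.tar.gz".format(name, version)
    return {
        "url": "{}/{}".format(CRAN_SRC, file_name),
        "host_path": "{}/{}/{}".format(host_mount, folder, file_name),
        "pod_path": "{}/{}/{}".format(pod_mount, folder, file_name),
    }


# The pod path is the value passed to install.packages() in the R notebook.
print(r_package_locations("ggplot2", "2.2.1")["pod_path"])
```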