Deploying R to IBM Db2 Warehouse
Before you can use R with Db2 Warehouse, you must first deploy R to your Db2 Warehouse system. Use one or both of the following methods for deploying R, depending on which capabilities you require.
Note: If you are on the Integrated Analytics System (IAS) with a Podman container (version 1.0.25.0 or higher), you must run the following command before you can install R or R packages:
export TMPDIR=/scratch/tmp
- Method 1
- Create an R instance in your Db2 Warehouse container or your Integrated Analytics System container. Use this method if you want to use R with Spark.
- Method 2
- Create an RStudio® Docker container and deploy it to your Db2 Warehouse host node. Use this method if you want to set up an integrated RStudio development environment with which to develop R scripts.
Method 1: Creating an R instance in your Db2 Warehouse container
If you want to use SparkR, you must create an R instance in your Db2 Warehouse container.
- If a version of R is already installed, remove this version and all the included libraries.
- Select a CRAN mirror location from the list at https://cran.r-project.org/mirrors.html.
- In the section Source Code for all Platforms, identify the version of R to be deployed. Note: R version 3.6.1 is tested and compiles correctly.
- On the Docker host node, from outside your Db2 Warehouse container, enter the following command:
docker exec -it Db2wh /bin/bash
Alternatively, connect to the Db2 Warehouse node by using the following command:
ssh user@host -p 50022
where user is the name of the user and host is the name of the Docker host. After you enter your password, you are logged in to the corresponding Db2 Warehouse Docker container.
- Download a copy of the R source into the /tmp folder. For example, the following command downloads R version 3.6.1 from the mirror at https://cran.uni-muenster.de/:
wget -P /tmp http://cran.uni-muenster.de/src/base/R-3/R-3.6.1.tar.gz
- Extract the source files.
tar zx -C /tmp -f /tmp/R-3.6.1.tar.gz
- Enter the following commands to create the /mnt/blumeta0/R-Install folder and to compile and install the R environment in it:
mkdir /mnt/blumeta0/R-Install
cd /tmp/R-3.6.1
export LD_LIBRARY_PATH=
./configure --with-x=no --prefix=/mnt/blumeta0/R-Install/ --exec-prefix=/mnt/blumeta0/R-Install/
make prefix=/mnt/blumeta0/R-Install/ exec-prefix=/mnt/blumeta0/R-Install/R/ all install
- Start the interactive R shell by entering the following command:
R --vanilla
- To run an R script by using Apache Spark, Db2 Warehouse requires the RJSONIO package, version 1.3-16 of the RODBC package (which is compatible with R 3.6.1), the ibmdbR package, and the ggplot2 package. The following commands also install version 1.6-8 of the arules package from the CRAN archive. To install these packages, enter the following commands from within the interactive R shell:
install.packages("https://cran.r-project.org/src/contrib/Archive/RODBC/RODBC_1.3-16.tar.gz", repos=NULL, type="source")
install.packages("https://cran.r-project.org/src/contrib/Archive/arules/arules_1.6-8.tar.gz", repos=NULL, type="source")
install.packages(c('RJSONIO', 'ibmdbR', 'ggplot2'))
To install another R package, enter a command of the following form from within the interactive R shell:
install.packages('package name', dependencies=TRUE)
For example, IBM® offers the following R package for use
with Db2 Warehouse:
- ibmdbRXt
- This package contains extensions to the ibmdbR package, including in-database geospatial functions. For more information about this package, its prerequisites, and how to install it, see https://github.com/ibmdbanalytics/ibmdbRXt/.
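If you need to repeat the build, you can collect the download, extract, and build steps above into one script. The following is a minimal sketch, not an IBM-provided tool. It makes these assumptions:
- It runs under bash inside the Db2 Warehouse container.
- The names build_r, r_source_url, run, and DRY_RUN are hypothetical.
- It relies on the CRAN source layout src/base/R-<major>/R-<version>.tar.gz that the download URL above follows.
- It relies on the configure --prefix option alone to set the installation folder, instead of passing prefix variables to make.

```shell
#!/usr/bin/env bash
# Minimal sketch (not IBM-provided): build R from source into a prefix folder.
# build_r, r_source_url, run, and DRY_RUN are hypothetical names.

# Assumed CRAN layout: <mirror>/src/base/R-<major>/R-<version>.tar.gz
r_source_url() {
  local mirror="${1%/}"   # drop one trailing slash from the mirror URL
  local version="$2"
  echo "${mirror}/src/base/R-${version%%.*}/R-${version}.tar.gz"
}

# Print a command, then execute it unless DRY_RUN is set to 1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Usage: build_r VERSION MIRROR PREFIX
# The body runs in a subshell, so `set -e` and `cd` do not affect the caller.
build_r() (
  set -e
  version="$1"
  mirror="$2"
  prefix="${3%/}"
  run mkdir -p "$prefix"
  run wget -P /tmp "$(r_source_url "$mirror" "$version")"
  run tar zx -C /tmp -f "/tmp/R-${version}.tar.gz"
  run cd "/tmp/R-${version}"
  # Build with an empty LD_LIBRARY_PATH, as in the manual steps.
  run env LD_LIBRARY_PATH= ./configure --with-x=no --prefix="$prefix" --exec-prefix="$prefix"
  run env LD_LIBRARY_PATH= make all install
  # Confirm that the installed R binary starts.
  run "$prefix/bin/R" --version
)
```

For example, build_r 3.6.1 https://cran.uni-muenster.de /mnt/blumeta0/R-Install performs the build. Prefixing the call with DRY_RUN=1 only prints the commands, so you can review them before you run the build. You must still install the R packages from the interactive R shell as described above.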
To verify the R installation:
- Load the Spark sample files.
- Issue the following command to submit the verification R script:
spark-submit.sh ClusterVerify.R
- Note the submission ID returned by the previous command.
- Issue the following command repeatedly to check the status of the verification R script:
spark-submit.sh --list-apps
Continue issuing this command until the Status column for the job with the corresponding submission ID indicates that the application has ended.
- Issue the following command to display the application log (replace xxxxxxxxxxxxxxxxxxx with the job's submission ID):
spark-submit.sh --display-app-log out xxxxxxxxxxxxxxxxxxx
If R is installed correctly, the log contains informational messages about the Spark cluster.
Method 2: Creating and deploying an RStudio Docker container
Note: You cannot use this method when your Db2 Warehouse system runs on POWER® LE hardware. However, you can use your own locally installed R development environment instead. For more information about how to use your own environment, see Connecting an R development environment to a Db2 database.
RStudio is a development environment that you can use to develop and run R scripts. If you want to set up an integrated RStudio environment for use with Db2 Warehouse, create an RStudio Docker container and deploy it to your Db2 Warehouse host node.