Deploying R to IBM Db2 Warehouse

Before you can use R with Db2 Warehouse, you must first deploy R to your Db2 Warehouse system. Use one or both of the following methods for deploying R, depending on which capabilities you require.

Note: If you are on IAS with a Podman container (version 1.0.25.0 or higher), you need to run the following command before you can install R or R packages.
export TMPDIR=/scratch/tmp
Method 1
Create an R instance in your Db2 Warehouse container or your Integrated Analytics System container. Use this method if you want to use R with Spark.
Method 2
Create an RStudio® Docker container and deploy it to your Db2 Warehouse host node. Use this method if you want to set up an integrated RStudio development environment with which to develop R scripts.

Method 1: Creating an R instance in your Db2 Warehouse container

If you want to use SparkR, you must create an R instance in your Db2 Warehouse container.

  1. If a version of R is already installed, remove this version and all the included libraries.
  2. Select a location from https://cran.r-project.org/mirrors.html.
  3. In the section Source Code for all Platforms, identify the version of R to be deployed.
    Note: R version 3.6.1 is tested and compiles correctly.
  4. Open a shell inside your Db2 Warehouse container. On the Docker host node, from outside your Db2 Warehouse container, enter the following command.
    docker exec -it Db2wh /bin/bash
    Alternatively, connect to the Db2 Warehouse node by using the following command.
    ssh user@host -p 50022
    
    Where user is the name of the user, and host is the name of the Docker host. After you enter your password, you are logged in to the corresponding Db2 Warehouse Docker container.
  5. Download a copy of the R source into the /tmp folder. For example, the following command downloads R version 3.6.1 from the location https://cran.uni-muenster.de/.
    wget -P /tmp http://cran.uni-muenster.de/src/base/R-3/R-3.6.1.tar.gz
  6. Extract the source files.
    tar zx -C /tmp -f /tmp/R-3.6.1.tar.gz
  7. Enter the following commands to compile and install the R environment in the /mnt/blumeta0/R-Install folder.
    cd /tmp/R-3.6.1
    export LD_LIBRARY_PATH=
    ./configure --with-x=no --prefix=/mnt/blumeta0/R-Install/ --exec-prefix=/mnt/blumeta0/R-Install/ 
    make prefix=/mnt/blumeta0/R-Install/ exec-prefix=/mnt/blumeta0/R-Install/R/ all install
  8. Start the interactive R shell by entering the following command:
    R --vanilla
  9. To run an R script by using Apache Spark, Db2 Warehouse requires the RJSONIO package, the RODBC version 1.3-16 package (which is compatible with R-3.6.1), the ibmdbR package, and the ggplot2 package. To install these packages, from within the interactive R shell, enter the following commands, which also install version 1.6-8 of the arules package from the CRAN archive:
    install.packages("https://cran.r-project.org/src/contrib/Archive/RODBC/RODBC_1.3-16.tar.gz", repos=NULL, type="source")
    install.packages("https://cran.r-project.org/src/contrib/Archive/arules/arules_1.6-8.tar.gz", repos=NULL, type="source")
    install.packages(c('RJSONIO', 'ibmdbR', 'ggplot2'))
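Steps 5 to 7 above can be collected into a small script if you need to repeat the build. The following is a minimal sketch, not an IBM-supplied tool. It reuses the version, mirror, and folder from the example commands. The variable names, the DRY_RUN switch, and the run helper are illustrative assumptions. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of steps 5 to 7: download, extract, and build R from source.
# The default values repeat the example values used in the steps above.
# DRY_RUN and run() are illustrative additions, not part of Db2 Warehouse.
R_VERSION="${R_VERSION:-3.6.1}"
CRAN_MIRROR="${CRAN_MIRROR:-http://cran.uni-muenster.de}"
PREFIX="${PREFIX:-/mnt/blumeta0/R-Install/}"
DRY_RUN="${DRY_RUN:-1}"   # set DRY_RUN=0 to execute the commands for real

R_MAJOR="${R_VERSION%%.*}"            # "3" for version 3.6.1
TARBALL="R-${R_VERSION}.tar.gz"
R_URL="${CRAN_MIRROR}/src/base/R-${R_MAJOR}/${TARBALL}"

# In dry-run mode, print the command; otherwise run it and stop on failure.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@" || exit 1
    fi
}

run wget -P /tmp "$R_URL"
run tar zx -C /tmp -f "/tmp/${TARBALL}"
run cd "/tmp/R-${R_VERSION}"
run export LD_LIBRARY_PATH=
run ./configure --with-x=no --prefix="$PREFIX" --exec-prefix="$PREFIX"
# The exec-prefix value below repeats the one given in step 7.
run make prefix="$PREFIX" exec-prefix="${PREFIX}R/" all install
```

Review the printed commands first, then rerun the script with DRY_RUN=0 inside the container to perform the build.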
To install another R package, from within the interactive R shell, enter a command of the following form.
install.packages('package name', dependencies=TRUE)
For example, IBM® offers the following R package for use with Db2 Warehouse:
ibmdbRXt
This package contains extensions to the ibmdbR package, including in-database geospatial functions. For more information about this package, its prerequisites, and how to install it, see https://github.com/ibmdbanalytics/ibmdbRXt/.
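Before you run the verification, it can help to confirm that the packages named in step 9 are actually installed. The following is a minimal sketch that assumes Rscript from the new R installation is on your PATH. The REQUIRED_PKGS variable and the build_r_check function are illustrative names, not part of Db2 Warehouse or R.

```shell
#!/bin/sh
# Packages named in step 9; edit the list to match your installation.
REQUIRED_PKGS="RJSONIO RODBC ibmdbR ggplot2"

# Build an R expression that reports any required package that is not installed.
build_r_check() {
    quoted=""
    for pkg in $REQUIRED_PKGS; do
        quoted="${quoted:+$quoted, }\"$pkg\""
    done
    printf '%s\n' "miss <- setdiff(c($quoted), rownames(installed.packages())); if (length(miss)) { cat('missing:', miss, '\n'); quit(status = 1) } else cat('all required packages are installed\n')"
}

R_CHECK="$(build_r_check)"

if command -v Rscript >/dev/null 2>&1; then
    # Run the check without loading user profiles or saved workspaces.
    Rscript --vanilla -e "$R_CHECK" || echo "Package check failed."
else
    echo "Rscript not found on PATH; the check would run this expression:"
    printf '%s\n' "$R_CHECK"
fi
```

If a package is reported as missing, install it from the interactive R shell as described above and run the check again.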
To verify the R installation:
  1. Load the Spark sample files.
  2. Issue the following command to submit the verification R script:
    spark-submit.sh ClusterVerify.R
  3. Note the submission ID returned by the previous command.
  4. Issue the following command repeatedly to check the status of the verification R script:
    spark-submit.sh --list-apps
    Continue issuing this command until the Status column for the job with the corresponding submission ID indicates that the application has ended.
  5. Issue the following command to display the application log (replace xxxxxxxxxxxxxxxxxxx with the job's submission ID):
    spark-submit.sh --display-app-log out xxxxxxxxxxxxxxxxxxx
    If R is installed correctly, the log contains informational messages about the Spark cluster.
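
The repeated status check in step 4 can be automated with a polling loop. The following is a minimal sketch under stated assumptions: the status strings in DONE_PATTERN are a guess at what the Status column shows when an application has ended, so compare them with your own spark-submit.sh --list-apps output and adjust them. The function name and the variables are illustrative.

```shell
#!/bin/sh
# Command from step 4 that lists the submitted applications.
LIST_CMD="${LIST_CMD:-spark-submit.sh --list-apps}"
# ASSUMPTION: status values that mean the application has ended.
# Check the Status column on your system and adjust this pattern.
DONE_PATTERN="${DONE_PATTERN:-FINISHED|FAILED|KILLED}"
POLL_SECONDS="${POLL_SECONDS:-10}"
MAX_POLLS="${MAX_POLLS:-60}"

# Poll the application list until the row for the given submission ID
# matches DONE_PATTERN, then print that row. Return 1 if it never matches.
wait_for_app() {
    submission_id="$1"
    polls=0
    while [ "$polls" -lt "$MAX_POLLS" ]; do
        row="$($LIST_CMD | grep -F -- "$submission_id")"
        if printf '%s\n' "$row" | grep -Eq -- "$DONE_PATTERN"; then
            printf '%s\n' "$row"
            return 0
        fi
        polls=$((polls + 1))
        sleep "$POLL_SECONDS"
    done
    echo "Gave up waiting for $submission_id" >&2
    return 1
}

# Usage (replace the placeholder with the submission ID noted in step 3):
#   wait_for_app xxxxxxxxxxxxxxxxxxx
#   spark-submit.sh --display-app-log out xxxxxxxxxxxxxxxxxxx
```

The script only defines the function. Call it with your submission ID, then display the application log as described in step 5.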

Method 2: Creating and deploying an RStudio Docker container

Note: You cannot use this method when your Db2 Warehouse system runs on POWER® LE hardware. However, you can use your own locally installed R development environment instead. For more information about how to use your own environment, see Connecting an R development environment to a Db2 database.

RStudio is a development environment that you can use to develop and run R scripts. If you want to set up an integrated RStudio environment for use with Db2 Warehouse, create an RStudio Docker container and deploy it to your Db2 Warehouse host node.