IBM Support

Setting up a local repository server for R and R packages

Question & Answer


Question

How do I install R and R packages if the Apache Hadoop cluster does not have internet access?

Answer

The following example is based on Red Hat Enterprise Linux 7 on an x86_64 or Power platform, unless otherwise stated. It assumes that one system has internet access and can perform the following operations. This system should have the same level of OS packages installed as all of the cluster nodes on which the IBM BigInsights Big R service will be installed. This system acts as the proxy (the local repository server) for every node in the cluster.

1. Install RPM repositories.

a) Install EPEL repository

EPEL (Extra Packages for Enterprise Linux) is a popular repository that hosts additional software RPMs for Red Hat Enterprise Linux. For more information about the EPEL repository, see https://fedoraproject.org/wiki/EPEL.

The software packages for R are hosted in the EPEL repository. If the EPEL repository is already configured on the system, skip this step. Otherwise, download the latest epel-release RPM from EPEL: https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm.

Then install by using yum as shown in the following example:

yum install epel-release-latest-7.noarch.rpm -y

After the installation, the epel.repo file should exist in the /etc/yum.repos.d directory.
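The check in this step (skip when EPEL is already configured, otherwise download and install) can be scripted. The following is a minimal sketch, assuming it runs as root on the internet-connected system and that curl is installed; has_repo_file and install_epel are hypothetical helper names, not part of any product.

```shell
#!/bin/sh
# has_repo_file DIR FILE: succeed if the repository definition DIR/FILE exists.
# (hypothetical helper)
has_repo_file() {
    [ -f "$1/$2" ]
}

# install_epel: skip the download when epel.repo is already configured;
# otherwise fetch the latest EPEL 7 release RPM and install it with yum.
# Assumption: run as root, with curl available. (hypothetical helper)
install_epel() {
    if has_repo_file /etc/yum.repos.d epel.repo; then
        echo "EPEL repository already configured; skipping."
        return 0
    fi
    rpm_url=https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
    # -L follows redirects, -O keeps the remote file name
    curl -fsSLO "$rpm_url" &&
        yum install -y epel-release-latest-7.noarch.rpm
}

# Usage: install_epel
```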

b) Install CentOS repository (Optional)

This step is optional. Installing R may require additional dependent packages. They may already be available from one of the repositories configured on the system: if the Red Hat Optional and Supplementary repositories are enabled, yum can resolve these dependencies and you can skip this step. Otherwise, installing the CentOS repository may provide the missing dependent packages. For information about CentOS repositories, see http://wiki.centos.org/AdditionalResources/Repositories.

To install the CentOS repository, download the latest centos-release RPM from the CentOS repository, for example:
http://mirror.centos.org/centos/7/os/x86_64/Packages/centos-release-7-3.1611.el7.centos.x86_64.rpm

Then install by using yum as shown in the following example:

yum install centos-release-7-3.1611.el7.centos.x86_64.rpm -y

After the installation, the CentOS repository files (for example, CentOS-Base.repo) should exist in the /etc/yum.repos.d directory.

c) Clean metadata for YUM repositories

Run the following command to clean the metadata for all YUM repositories:

yum clean all -y

2. Find out all the dependent packages for R-devel

The RPM package to install for R is R-devel. Because every system is different, there is no way to know
which packages are already installed on each individual system, so no single list of packages
works for all systems.

We suggest that you run the following command on a system that has exactly the same Linux installation
as every node in the cluster, to download R-devel and all of its dependent RPMs:

yum install --downloadonly --downloaddir=<directory_for_packages> R-devel

Note: Replace <directory_for_packages> with the path where all the RPM packages for R-devel are to be saved. This assumes that all RPMs can be found in the EPEL (and CentOS) repositories set up above. If some dependent RPMs still cannot be found, search for them on other sites.

Then run the following commands to install R-devel from the downloaded RPMs, and verify that it is installed (for example, with rpm -q R-devel):

cd <directory_for_packages>

yum install *.rpm -y

By default, R is installed in the /usr/lib64/R directory.
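If the download step finds nothing (for example, because the repositories are not reachable), the pattern *.rpm matches no files and the shell passes it to yum literally, which produces a confusing error. The following minimal sketch checks the download directory before installing and then verifies the result with rpm -q. It assumes root access; count_rpms and install_r_devel are hypothetical helper names, and the exit status of the download-only run is deliberately not relied on.

```shell
#!/bin/sh
# count_rpms DIR: print how many *.rpm files DIR contains. (hypothetical helper)
count_rpms() {
    n=0
    for f in "$1"/*.rpm; do
        # When nothing matches, the pattern stays unexpanded; -e filters it out.
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
}

# install_r_devel DIR: download R-devel and its dependencies into DIR,
# install them, and confirm the installation with rpm -q.
# Assumption: run as root. (hypothetical helper)
install_r_devel() {
    dir=$1
    mkdir -p "$dir" || return 1
    # The exit status of a download-only run is not relied on; the RPM count is checked instead.
    yum install -y --downloadonly --downloaddir="$dir" R-devel
    if [ "$(count_rpms "$dir")" -eq 0 ]; then
        echo "No RPMs were downloaded to $dir" >&2
        return 1
    fi
    (cd "$dir" && yum install -y ./*.rpm) || return 1
    rpm -q R-devel
}

# Usage: install_r_devel <directory_for_packages>
```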

3. Set up an httpd server

This step is optional. If an httpd server is already installed on the system, skip this step.

The httpd package is available from the CentOS repository.

Run the following command to install the httpd server:

yum install httpd -y

After the installation, see
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Cluster_Administration/s1-apache-inshttpd-CA.html
for information about configuring the httpd server.
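On Red Hat Enterprise Linux 7, httpd runs as a systemd service, and the directory it serves is set by the DocumentRoot directive. The following minimal sketch starts the service and reads DocumentRoot, so that you can confirm which directory the repository files must be copied into. It assumes root access and the default configuration file /etc/httpd/conf/httpd.conf; start_httpd and httpd_docroot are hypothetical helper names.

```shell
#!/bin/sh
# httpd_docroot CONF: print the first DocumentRoot configured in CONF,
# with surrounding double quotes removed. (hypothetical helper)
httpd_docroot() {
    awk '$1 == "DocumentRoot" { gsub(/"/, "", $2); print $2; exit }' "$1"
}

# start_httpd: start httpd now and enable it at every boot.
# Assumption: RHEL 7 with systemd, run as root. (hypothetical helper)
start_httpd() {
    systemctl start httpd && systemctl enable httpd
}

# Usage:
#   start_httpd
#   httpd_docroot /etc/httpd/conf/httpd.conf
```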

4. Create a local YUM repo.

Assume that the httpd server above has the URL repo.example.com and that its root directory is /www/uploads. Copy (or transfer by FTP) all of the files from <directory_for_packages> in step 2 to /www/uploads.

Then run the following commands to create the YUM repository:

yum install createrepo -y

createrepo --database /www/uploads

For more information on how to create a YUM repo, see
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Deployment_Guide/sec-Yum_Repository.html
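The copy and createrepo commands of this step can be combined so that they are easy to rerun whenever new RPMs are added. This is a minimal sketch, assuming root access and that createrepo is installed; stage_rpms and publish_repo are hypothetical helper names, and only *.rpm files are copied.

```shell
#!/bin/sh
# stage_rpms SRC DEST: copy every *.rpm file from SRC into DEST. (hypothetical helper)
stage_rpms() {
    mkdir -p "$2" || return 1
    for f in "$1"/*.rpm; do
        [ -e "$f" ] || continue   # skip the unexpanded pattern when SRC has no RPMs
        cp -p "$f" "$2"/ || return 1
    done
}

# publish_repo SRC DEST: stage the RPMs, then (re)build the repository metadata in DEST.
# Assumption: createrepo is installed and the script runs as root. (hypothetical helper)
publish_repo() {
    stage_rpms "$1" "$2" || return 1
    createrepo --database "$2"
}

# Usage: publish_repo <directory_for_packages> /www/uploads
```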

When you install through the Ambari R service, enter http://repo.example.com in the user.R.repository field on the configuration page. If you install R manually instead, create an r.repo file in the /etc/yum.repos.d directory on each node that points to this YUM repository, with the following content:

[local]
name=local
baseurl=http://repo.example.com/
enabled=1
gpgcheck=0
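yum reads repository definitions from the .repo files in /etc/yum.repos.d, so on each node the file above can be generated and used as in the following sketch. It assumes root access and uses the placeholder host repo.example.com from this step; write_r_repo is a hypothetical helper name.

```shell
#!/bin/sh
# write_r_repo DIR URL: write DIR/r.repo with the [local] repository definition
# shown above, pointing at URL. (hypothetical helper)
write_r_repo() {
    cat > "$1/r.repo" <<EOF
[local]
name=local
baseurl=$2
enabled=1
gpgcheck=0
EOF
}

# Usage on each node (as root):
#   write_r_repo /etc/yum.repos.d http://repo.example.com/
#   yum clean all
#   yum install -y R-devel
```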


5. Find the dependent packages for base64enc, data.table, and rJava.

Run R, and then from the R environment, run the following:

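# Assumption: any reachable CRAN mirror can be used; setting one avoids the mirror prompt
options(repos = c(CRAN = 'https://cloud.r-project.org'))
pkgs <- c('base64enc', 'data.table', 'rJava')
pdb <- available.packages()
# Resolve Depends, Imports, and LinkingTo dependencies recursively
deps <- tools::package_dependencies(pkgs, pdb, recursive = TRUE)
deps <- unique(c(as.character(unlist(deps)), pkgs))
# Skip packages that are already installed, including base R packages
installed <- as.character(installed.packages()[, 'Package'])
tbd <- setdiff(deps, installed)
dest <- '/www/uploads/src/contrib'  # the path to save the packages
dir.create(dest, recursive = TRUE, showWarnings = FALSE)
download.packages(tbd, destdir = dest, available = pdb)
# Write the PACKAGES index that install.packages() reads
tools::write_PACKAGES(dest, type = 'source')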

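Note: The R packages are saved to /www/uploads/src/contrib, the src/contrib subdirectory of the httpd server’s root directory /www/uploads. install.packages() looks for source packages under src/contrib of the repository URL, so the repository URL to use is the server root, not the src/contrib directory.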
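Before pointing the nodes at the server, you can check that the src/contrib directory actually contains the PACKAGES index written by tools::write_PACKAGES() and at least one source package. This is a minimal sketch; check_r_repo is a hypothetical helper name, and it only counts *.tar.gz files, the format of the source packages that download.packages() saves on Linux.

```shell
#!/bin/sh
# check_r_repo ROOT: succeed if ROOT/src/contrib has a PACKAGES index and at least
# one source package tarball; print the number of tarballs found. (hypothetical helper)
check_r_repo() {
    contrib=$1/src/contrib
    if [ ! -f "$contrib/PACKAGES" ]; then
        echo "missing $contrib/PACKAGES; run tools::write_PACKAGES() first" >&2
        return 1
    fi
    n=0
    for f in "$contrib"/*.tar.gz; do
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
    [ "$n" -gt 0 ]
}

# Usage: check_r_repo /www/uploads
```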

6. Install the R packages from the local repository at http://repo.example.com.

When you install through the Ambari R service, enter this URL in the user.RPackages.repository field on the configuration page. If you install the R packages manually, run the following from the R environment:

options(repos=c('http://repo.example.com') ) # points to the local repo
install.packages(c('base64enc', 'data.table', 'rJava'))
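To run the same two R statements on many nodes without an interactive R session, they can be passed to Rscript -e. The following minimal sketch builds that expression from the repository URL and a list of packages. It assumes Rscript is on the PATH of each node; r_install_expr is a hypothetical helper name.

```shell
#!/bin/sh
# r_install_expr URL PKG...: print an R expression that sets the package repository
# to URL and installs the given packages from it. (hypothetical helper)
r_install_expr() {
    url=$1
    shift
    pkgs=""
    for p in "$@"; do
        # Quote each package name for R and join with ", "
        pkgs="${pkgs:+$pkgs, }'$p'"
    done
    printf "options(repos = c('%s')); install.packages(c(%s))\n" "$url" "$pkgs"
}

# Usage on each node:
#   Rscript -e "$(r_install_expr http://repo.example.com base64enc data.table rJava)"
```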

Troubleshooting for rJava package

If the installation of the rJava package fails, the most likely cause is that the Java environment is not set up properly for R. To correct the issue, log in as the root user and do the following.

First, make sure that JAVA_HOME is set to a Java installation that is supported by the product.

Then run

R CMD javareconf -e
export JAVA_LIBS="$JAVA_LIBS -ldl"
R CMD javareconf

After this, /usr/lib64/R/etc/Makeconf is updated with the correct Java settings (such as JAVA_HOME, JAVA_LIBS, and JAVA_CPPFLAGS).

Then, retry the rJava package installation.
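rJava is compiled against the JNI headers, so JAVA_HOME needs to point to a full JDK rather than a JRE. The following minimal sketch checks this before you rerun the javareconf commands above. check_java_home is a hypothetical helper name, and looking for bin/java and include/jni.h is a heuristic, not the exact test that javareconf performs.

```shell
#!/bin/sh
# check_java_home DIR: succeed if DIR looks like a JDK, that is, it has an executable
# bin/java and the JNI header include/jni.h. (hypothetical helper, heuristic check)
check_java_home() {
    if [ ! -x "$1/bin/java" ]; then
        echo "$1/bin/java not found or not executable" >&2
        return 1
    fi
    if [ ! -f "$1/include/jni.h" ]; then
        echo "$1/include/jni.h not found; JAVA_HOME may point to a JRE" >&2
        return 1
    fi
}

# Usage (as root, before the javareconf commands above):
#   check_java_home "$JAVA_HOME" && echo "JAVA_HOME looks like a JDK"
```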

Troubleshooting for CentOS repo installation

In step 1b, the CentOS RPM might fail to install.

The cause is that the system’s release version (the yum $releasever value) is set to 7Server or 6Workstation instead of 7. To work around this issue, manually create a centos.repo file in /etc/yum.repos.d with the following content:

[centos]
name=CentOS
baseurl=http://mirror.centos.org/centos/7/os/$basearch/
enabled=1
gpgcheck=0
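The fallback file can also be written by a script. The point to watch is the quoting: the heredoc delimiter is quoted so that $basearch stays literal and is expanded by yum rather than by the shell. This is a minimal sketch, assuming root access; write_centos_repo is a hypothetical helper name.

```shell
#!/bin/sh
# write_centos_repo DIR: create DIR/centos.repo with the fixed "7" release path
# shown above. The quoted 'EOF' keeps $basearch literal for yum. (hypothetical helper)
write_centos_repo() {
    cat > "$1/centos.repo" <<'EOF'
[centos]
name=CentOS
baseurl=http://mirror.centos.org/centos/7/os/$basearch/
enabled=1
gpgcheck=0
EOF
}

# Usage (as root):
#   write_centos_repo /etc/yum.repos.d
#   yum clean all
```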


Document Information

Modified date:
18 July 2020

UID

swg21964992