This is a follow-up post to Working with R on a Cluster. Previously, we discussed ways to work with R in a purely distributed command line interface (CLI) environment. In this post, we’ll detail how to set up your own private installation of R on a cluster that supports modulefiles.
Motivation
Lately, I’ve needed more cutting-edge versions of R than the campus cluster staff have made available. Staying current, for lack of a better word, lets me keep abreast of new developments in the R community and take advantage of feature-rich R packages. Therefore, I’ve had to resort to building and using my own installation of R on a cluster that runs the CentOS 6.8 operating system (Red Hat Linux). Unfortunately, the guide below is a bit long because the traditional compilation procedure for R is not well suited to high performance computing (HPC) environments, which rely on modulefiles and do not grant root access (e.g. no /usr/local/...).
Finding a list of available modules
Before beginning, make sure that you actually need to set up your own version of R by checking which versions are already available on the cluster. To do this, we invoke module avail, which lists all available modulefiles on the system.
module avail
------------------------------------------ /usr/share/Modules/modulefiles -------------------------------------------
dot module-git module-info modules null use.own
---------------------------------------------- /usr/local/modulefiles -----------------------------------------------
BerkeleyDB/5.0(default) java/1.6.0(default) openmpi/1.6.4-intel-13.1
Macaulay2/1.4-r12617 java/1.7 openmpi/1.6.5-gcc-4.7.1
R/2.13.2 java/1.7.75 openmpi/1.6.5-intel-14.0
R/2.15.1 java/1.8 openmpi/1.8.4-gcc-4.9.2
R/2.15.3 lapack openmpi/1.8.4-intel-15.0
R/3.0.1 libpwquality/1.2.4 openmpi/2.0.1-gcc-6.2.0
R/3.1.0 libuuid/1.0.2(default) openssl/1.0.1
R/3.1.2 libxml2/2.9.1(default) p7zip/9.20.1
R/3.2.2(default) libxslt/1.1.28(default) p7zip/9.38.1
R/3.2.5 mathematica/10 papi/5.4.1
authconfig/6.2.9 mathematica/11 petsc/3.3-p6(default)
blas mathematica/8.0 php/5.5.11
boost/1.51.0 matlab/7.11 python/2(default)
boost/1.58.0 matlab/7.14 python/2.7.3
bzip2/1.0.6 matlab/8.3 python/2.7.8
cfd/Ansys-14.5 matlab/8.4 python/3
cfd/Ansys-15.0.7 matlab/8.5 python/3.4.0
cfd/Ansys-16.0 matlab/8.6 pythonmod/2.6(default)
cifs-utils/6.4 matlab/9.0 pythonmod/2.7.2
cmake/2.8(default) mc/4.8.13 samba/4.1.11
cmake/3.0.2 mercurial/1.8(default) scilab/5.4.0(default)
cmake/3.6.2 moab/7.2.4 sssd/1.11.2
cracklib/2.9.0 moab/7.2.5 sssd/1.11.6(default)
cuda/5.5 moab/7.2.6 sssd/1.12.0
cuda/6.0 moab/7.2.7 sssd/1.12.1
cuda/6.5 moab/7.2.8 svn/1.6(default)
cuda/7.0 moab/7.2.9 svn/1.8.5
ding-libs/0.4.0 moab/8.0.0 svn/1.9.0
dos2unix/7.3.2 moab/8.0.1 svn/1.9.2
emacs/23.2(default) moab/8.1.0 szip/2.1(default)
env/Physics moab/8.1.1 texlive/2010(default)
env/cse moab/9.0.1 texlive/2015
env/inv-catchenlab moab/9.0.2(default) torque/4.2.3.h4
env/inv-cse mpi/mpich/3.1.3-gcc-4.7.1 torque/4.2.5
env/ncsa mpi/openmpi/1.4-intel torque/4.2.5.h2
env/taub mpiexec/0.84 torque/4.2.6
fftw-3.3.3/mvapich2-2.0b_intel-14.0 mvapich/1.2-gcc+ifort torque/4.2.7
fftw-3.3.3/openmpi-1.6.5_intel-14.0 mvapich2/1.6-gcc(default) torque/4.2.8
fuse/2.9.3 mvapich2/1.6-gcc+ifort torque/4.2.9
gcc/4.7.1(default) mvapich2/1.6-gccdebug torque/5.0.0
gcc/4.9.2 mvapich2/1.6-intel torque/5.0.1
gcc/6.2.0 mvapich2/1.9b-intel-13.1 torque/5.0.1p
gdb/7.11.1(default) mvapich2/2.0b-gcc-4.7.1 torque/5.1.0p
gettext/0.19.4 mvapich2/2.0b-intel-14.0 torque/5.1.1
git/1.7(default) mvapich2/2.1rc1-gcc-4.9.2 torque/5.1.2.h5
grace/5.1(default) mvapich2/2.1rc1-intel-15.0 torque/6.0.1
gsl/1.16 mvapich2/2.2-gcc-6.2.0 torque/6.0.1h3
h5utils/1.12 mvapich2/2.2-intel-17.0 torque/6.0.2(default)
hwloc/1.7.2 mvapich2/mpiexec unzip/unzip60
intel/11.1(default) mysql/5.6.23 utils/makedepend/1.0.5
intel/13.1 octave/3.4(default) valgrind/3.10.1
intel/14.0 openblas/0.2.8-gcc(default) valgrind/3.9.0
intel/15.0 openldap/2.4.40 vim/7.3(default)
intel/15.0.3 openmpi/1.4-gcc visit/2.2.1(default)
intel/16.0.0 openmpi/1.4-gcc+ifort vnc/4.1.1
intel/17.0 openmpi/1.4-intel wine/1.6.2
intltool/0.50.2 openmpi/1.6.4-gcc-4.7.1
From the above, we note that the R versions available are:
- 2.13.2, 2.15.1, 2.15.3, 3.0.1, 3.1.0, 3.1.2, 3.2.2 (default), 3.2.5
Thus, at the time of this writing, we cannot use any version in the 3.3.x line!
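Filtering the listing is easier than reading it by eye if you only care about R. The sketch below uses a hypothetical helper, list_r_modules, which is not part of the module system. It reads the listing on stdin and assumes, as with classic Environment Modules, that module avail prints to stderr, hence the 2>&1 in the usage line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the R modulefiles from a `module avail` listing.
# It reads the listing on stdin, so it can be tried without the module command.
list_r_modules() {
  # Split the multi-column listing into one token per line,
  # then keep the tokens that start with "R/" (e.g. R/3.2.2(default)).
  tr -s '[:space:]' '\n' | grep -E '^R/'
}

# On the cluster (module avail writes its listing to stderr):
#   module avail 2>&1 | list_r_modules
```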
Loading a modulefile
Hypothetically speaking, let’s say that you did have a version of R that you wanted to use. In that case, you would load it in your environment using:
module load R
module load R/3.2.2 # equivalent since default
This loads the build of R compiled by the campus cluster staff into your environment. From there, you can start the R CLI by typing the following into the shell:
R
Peeking at the modulefile recipe for R
From the module avail output, we can see that the cluster’s application modulefiles are stored in /usr/local/modulefiles. To see what is required to compile R, let’s peek at the contents of the latest R modulefile.
cat /usr/local/modulefiles/R/3.2.5
#%Module1.0####################################################################
proc ModulesHelp { } {
global _module_name
puts stderr "\tThis module sets up the environment for R, version 3.2.5"
}
set _module_name [module-info name]
module-whatis "R-3.2.5 built with gcc-4.9.2, MKL, java-1.8 and texlive"
module load gcc/4.9.2
module load intel/15.0.3
module load java/1.8
module load texlive/2015
set approot /usr/local/R/R-3.2.5
prepend-path PATH $approot/bin
prepend-path LD_LIBRARY_PATH $approot/lib64:$approot/lib64/R/lib
prepend-path MANPATH $approot/share/man
From the R modulefile, we note that the following modules are loaded into the environment:
module load gcc/4.9.2
module load intel/15.0.3
module load java/1.8
module load texlive/2015
Thus, when we compile from source, we will need to make sure the above modules are loaded into the environment.
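For longer modulefiles, the dependency list can be extracted instead of copied by hand. The following is a minimal sketch using a hypothetical helper, module_deps. It assumes that each dependency is declared on its own module load line, as in the R-3.2.5 modulefile above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the modules a modulefile loads via `module load`.
module_deps() {
  # Match lines whose first two fields are "module" and "load"
  # (module-whatis lines are skipped) and print the module name/version.
  awk '$1 == "module" && $2 == "load" { print $3 }' "$1"
}

# Usage on the cluster:
#   module_deps /usr/local/modulefiles/R/3.2.5
#   module load $(module_deps /usr/local/modulefiles/R/3.2.5)
```

Because module load accepts several modules at once, the second usage line loads the whole list in one command.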
Compiling R from Source
Preparing to Compile R from Source
Before we can install R from source, we must prepare the installation environment.
The first task is to unload any active modules using module purge.
module purge # Remove all active modules from the environment
Then, we need to load the suggested modules we gleaned from looking at the latest R-3.2.5 modulefile.
module load gcc/4.9.2
module load intel/15.0.3
module load java/1.8
module load texlive/2015
However, this list is not exhaustive, because quite a few libraries were missing on our cluster. This may not be the case on yours. You may therefore wish to skip this section and try to compile R first. If that fails, come back and work through each of the steps.
Having said this, we will now very quickly walk through how to compile the additional dependencies.
To do so, we will define a local library to hold the dependencies:
# Setup a location to store dependencies
local_lib=$HOME/local_lib
# Create the directory
mkdir -p $local_lib
# Export the bash variable (used in modulefiles)
export local_lib
# Append the following to bash profile
echo "export local_lib=$HOME/local_lib" >> ~/.bash_profile
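Running the echo line above more than once appends a duplicate export each time. The following is a minimal sketch of an idempotent alternative, using a hypothetical helper, persist_local_lib. It writes the same line, with $HOME expanded at write time, only if that exact line is not already in the profile.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add the local_lib export to a profile (default ~/.bash_profile)
# only once, so re-running the setup does not pile up duplicate lines.
persist_local_lib() {
  local profile="${1:-$HOME/.bash_profile}"
  local line="export local_lib=$HOME/local_lib"
  touch "$profile"
  # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
  grep -qxF "$line" "$profile" || echo "$line" >> "$profile"
}

# Usage:
#   persist_local_lib
```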
Custom modulefiles
To use our own modulefiles, we must always first load the use.own module.
module load use.own
This module searches for custom modulefiles that have been installed in the ~/privatemodules directory (e.g. /home/username/privatemodules/module/version).
zlib
Zlib is a compression library.
Note: Zlib-1.2.11 appears to trigger the following error:
checking if zlib version >= 1.2.5… no
checking whether zlib support suffices… configure: error: zlib library and headers are required
Update (February 27th): This issue has been resolved in the configure script that ships with R version 3.3.3. It turns out that the zlib version string was truncated to at most five characters during the check. As a result, 1.2.11 was read as 1.2.1, which failed the requirement of a version of at least 1.2.5.
Thus, I’ve opted to use zlib 1.2.9 instead.
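To see why the truncation matters, here is a small sketch that mimics the effect in the shell. It is not R’s configure code. It cuts the version string to five characters and then compares it with 1.2.5 using sort -V from GNU coreutils, which is available on CentOS. The function name truncated_check is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the buggy check: NOT R's configure code, only an imitation of its effect.
truncated_check() {
  local have="$1" need="1.2.5"
  local short="${have:0:5}"   # keep at most five characters: "1.2.11" -> "1.2.1"
  # sort -V orders version strings; the check passes when $need sorts first (or ties).
  if [ "$(printf '%s\n%s\n' "$need" "$short" | sort -V | head -n1)" = "$need" ]; then
    echo "zlib $have (read as $short): ok"
  else
    echo "zlib $have (read as $short): too old"
  fi
}

truncated_check 1.2.9    # prints: zlib 1.2.9 (read as 1.2.9): ok
truncated_check 1.2.11   # prints: zlib 1.2.11 (read as 1.2.1): too old
```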
zlib_ver=1.2.9
install_path=$HOME/local_lib/zlib/$zlib_ver
mkdir -p $install_path
wget "https://downloads.sourceforge.net/project/libpng/zlib/$zlib_ver/zlib-$zlib_ver.tar.gz?r=https%3A%2F%2Fsourceforge.net%2Fprojects%2Flibpng%2Ffiles%2Fzlib%2F$zlib_ver%2F" -O zlib-$zlib_ver.tar.gz
#http://zlib.net/zlib-$zlib_ver.tar.gz # only for the latest version...
tar -xvzf ./zlib-$zlib_ver.tar.gz && cd zlib-$zlib_ver
./configure --prefix=$install_path
make && make install
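Next, we create a modulefile that adds zlib’s headers, libraries, and man pages to the relevant paths when it is loaded. Following the use.own layout, it goes in ~/privatemodules/zlib/1.2.9.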
#%Module1.0####################################################################
proc ModulesHelp { } {
global _module_name
puts stderr "\tThis module sets up the environment for zlib, version 1.2.9"
}
set _module_name [module-info name]
module-whatis "zlib 1.2.9 built with gcc-4.9.2"
set approot $::env(local_lib)/zlib/1.2.9
prepend-path CPATH $approot/include
prepend-path LD_LIBRARY_PATH $approot/lib
prepend-path MANPATH $approot/share/man
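Each modulefile in this guide must end up at ~/privatemodules/name/version for use.own to find it. The following is a minimal sketch of a hypothetical helper, install_modulefile, which copies a modulefile you have saved locally into that layout. The PRIVATE_MODULES override is also hypothetical and exists only to make the sketch easy to try elsewhere.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a modulefile into <privatemodules>/<name>/<version>,
# the layout that use.own searches.
install_modulefile() {
  local name="$1" version="$2" src="$3"
  # PRIVATE_MODULES is a hypothetical override; it defaults to ~/privatemodules.
  local root="${PRIVATE_MODULES:-$HOME/privatemodules}"
  # install -D creates the missing <name>/ directory; -m 644 keeps the file readable.
  install -D -m 644 "$src" "$root/$name/$version"
}

# Usage, assuming the zlib modulefile text above was saved as zlib.module:
#   install_modulefile zlib 1.2.9 zlib.module
#   module load use.own && module load zlib/1.2.9
```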
bzip2
bzip2 is a high-quality data compressor.
Our cluster does provide a bzip2 module. However, in case yours does not, the instructions to install it from source follow.
Unfortunately, building this library is not the usual carefree make invocation. When I compiled R, the error message told me to modify the CFLAGS in bzip2’s Makefile to include -fPIC. However, a simpler solution I found was to also build the shared library (via Makefile-libbz2_so) and move the resulting .so files into the $install_path/lib folder.
Failing to do one or the other will result in an R compilation error later.
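For the first option, the following sketch prepends -fPIC to the CFLAGS line of the bzip2 Makefile. It assumes the variable is defined on a line beginning with CFLAGS=, as in the bzip2 1.0.6 sources, and uses GNU sed’s -i flag. The helper name add_fpic is hypothetical. Run it inside the extracted bzip2-1.0.6 directory before make.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add -fPIC to the CFLAGS line of a Makefile (default ./Makefile).
# Assumes CFLAGS is defined on a line that starts with "CFLAGS=".
add_fpic() {
  local makefile="${1:-Makefile}"
  # Do nothing if the CFLAGS line already has -fPIC, so the edit is safe to repeat.
  grep -q '^CFLAGS=.*-fPIC' "$makefile" && return 0
  sed -i 's/^CFLAGS=/CFLAGS=-fPIC /' "$makefile"
}

# Usage inside bzip2-1.0.6:
#   add_fpic && make && make install PREFIX=$install_path
```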
bzip_version=1.0.6
install_path=$HOME/local_lib/bzip2/$bzip_version
mkdir -p $install_path
wget http://www.bzip.org/$bzip_version/bzip2-$bzip_version.tar.gz
tar -xvzf ./bzip2-$bzip_version.tar.gz && cd bzip2-$bzip_version
make -f Makefile-libbz2_so
make && make install PREFIX=$install_path
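mv *.so* $install_path/lib/
The R build below loads this modulefile as bzip2-custom/1.0.6. Save it as ~/privatemodules/bzip2-custom/1.0.6, which also keeps it distinct from the cluster’s own bzip2/1.0.6 module.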
#%Module1.0####################################################################
proc ModulesHelp { } {
global _module_name
puts stderr "\tThis module sets up the environment for bzip2, version 1.0.6"
}
set _module_name [module-info name]
module-whatis "bzip2 1.0.6 built with gcc-4.9.2"
set approot $::env(local_lib)/bzip2/1.0.6
prepend-path PATH $approot/bin
prepend-path CPATH $approot/include
prepend-path LD_LIBRARY_PATH $approot/lib
prepend-path MANPATH $approot/share/man
xzutils
xzutils is yet another compression library that contains the infamous liblzma
header.
xzutils_version=5.2.3
install_path=$HOME/local_lib/xzutils/$xzutils_version
mkdir -p $install_path
wget http://tukaani.org/xz/xz-$xzutils_version.tar.gz
tar -xvzf ./xz-$xzutils_version.tar.gz && cd xz-$xzutils_version
./configure --prefix=$install_path
make && make install
#%Module1.0####################################################################
proc ModulesHelp { } {
global _module_name
puts stderr "\tThis module sets up the environment for xzutils, version 5.2.3"
}
set _module_name [module-info name]
module-whatis "xzutils 5.2.3 built with gcc-4.9.2"
set approot $::env(local_lib)/xzutils/5.2.3
prepend-path PATH $approot/bin
prepend-path CPATH $approot/include
prepend-path CPLUS_INCLUDE_PATH $approot/include
prepend-path LD_LIBRARY_PATH $approot/lib
prepend-path LIBRARY_PATH $approot/lib
prepend-path MANPATH $approot/share/man
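The modulefiles expose headers through CPATH. A quick way to confirm that a header such as lzma.h is actually visible after loading a module is to search the CPATH entries. The sketch below does this with a hypothetical helper, find_header. It only checks CPATH, not CPLUS_INCLUDE_PATH or the compiler’s built-in search directories.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first CPATH directory that contains a given header.
# Returns non-zero (with a message on stderr) when the header is not found.
find_header() {
  local header="$1" dir
  local IFS=':'            # split CPATH on colons
  for dir in $CPATH; do
    if [ -n "$dir" ] && [ -f "$dir/$header" ]; then
      echo "$dir/$header"
      return 0
    fi
  done
  echo "$header not found on CPATH" >&2
  return 1
}

# Usage after loading the module:
#   module load use.own && module load xzutils/5.2.3
#   find_header lzma.h
```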
PCRE
PCRE (Perl Compatible Regular Expressions) is a library of functions that implement regular expression pattern matching in a manner similar to Perl 5. (Surprise, not a compression library!)
pcre_version=8.40
install_path=$HOME/local_lib/pcre/$pcre_version
mkdir -p $install_path
wget ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-$pcre_version.tar.gz
tar -xvzf pcre-$pcre_version.tar.gz && cd pcre-$pcre_version
./configure --prefix=$install_path --enable-utf8
make && make install
#%Module1.0####################################################################
proc ModulesHelp { } {
global _module_name
puts stderr "\tThis module sets up the environment for PCRE, version 8.40"
}
set _module_name [module-info name]
module-whatis "PCRE 8.40 built with gcc-4.9.2"
set approot $::env(local_lib)/pcre/8.40
prepend-path PATH $approot/bin
prepend-path CPATH $approot/include
prepend-path LD_LIBRARY_PATH $approot/lib
curl
curl is a command line tool and library for transferring data with URLs.
curl_version=7.52.1
install_path=$HOME/local_lib/curl/$curl_version
mkdir -p $install_path
wget --no-check-certificate https://curl.haxx.se/download/curl-$curl_version.tar.gz
tar xzvf curl-$curl_version.tar.gz && cd curl-$curl_version
./configure --prefix=$install_path
make && make install
#%Module1.0####################################################################
proc ModulesHelp { } {
global _module_name
puts stderr "\tThis module sets up the environment for curl, version 7.52.1"
}
set _module_name [module-info name]
module-whatis "curl 7.52.1 built with gcc-4.9.2"
set approot $::env(local_lib)/curl/7.52.1
prepend-path PATH $approot/bin
prepend-path CPLUS_INCLUDE_PATH $approot/include
prepend-path LD_LIBRARY_PATH $approot/lib
prepend-path LIBRARY_PATH $approot/lib
prepend-path MANPATH $approot/share/man
tcltk
Tcl/Tk (the Tool Command Language and its Tk toolkit) is required by some R packages, in particular geoR, which is used in some spatial calculations. It may seem odd that I’m installing from source instead of using a system library. However, when I tried to point the build at what was available on the cluster, I was never able to compile, most likely because a development header was missing.
tcltk_version=8.6.6
install_path=$HOME/local_lib/tcltk/$tcltk_version
mkdir -p $install_path
wget http://prdownloads.sourceforge.net/tcl/tcl$tcltk_version-src.tar.gz
tar xzvf tcl$tcltk_version-src.tar.gz && cd tcl$tcltk_version/unix
./configure --prefix=$install_path
make && make install
#%Module1.0####################################################################
proc ModulesHelp { } {
global _module_name
puts stderr "\tThis module sets up the environment for tcltk, version 8.6.6"
}
set _module_name [module-info name]
module-whatis "tcltk 8.6.6 built with gcc-4.9.2"
set approot $::env(local_lib)/tcltk/8.6.6
prepend-path PATH $approot/bin
prepend-path CPLUS_INCLUDE_PATH $approot/include
prepend-path LD_LIBRARY_PATH $approot/lib
prepend-path LIBRARY_PATH $approot/lib
prepend-path MANPATH $approot/share/man
Compiling R from Source
From here, it’s a clear shot to installing R from source by following the recipe in the R Installation and Administration manual.
There are a few differences between the traditional install from source and the one necessitated by the cluster environment. Most notably, the installation must be done without root access. As a result, there are a few configuration options that I suggest using:
- Supply a local installation directory via --prefix=, e.g. --prefix=$HOME/R
- Disable the X Window System (--with-x=no), as R will not be rendering any graphics to a UI.
- Build R as a shared library (libR.so) via --enable-R-shlib.
# Unload modules
module purge
# Load system modules
module load gcc/4.9.2
module load intel/15.0.3
module load java/1.8
module load texlive/2015
# Load own modules
module load use.own
module load zlib/1.2.9
module load bzip2-custom/1.0.6
module load xzutils/5.2.3
module load pcre/8.40
module load curl/7.52.1
module load tcltk/8.6.6
# Required to avoid loading the system version of bzip2
export LDFLAGS="-L$local_lib/bzip2/1.0.6/lib"
# R version
r_version=3.3.2
# Grab the latest version of R
wget https://cran.r-project.org/src/base/R-3/R-$r_version.tar.gz
tar xvf R-$r_version.tar.gz && cd R-$r_version
# Configure R to be installed into ~/R
./configure --prefix=$HOME/R/$r_version --with-x=no --enable-R-shlib
make && make install
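LDFLAGS was set above so that the build picks up the custom bzip2 rather than the system copy. One way to check that this worked is to inspect the ldd output for libR.so. The following sketch uses a hypothetical helper, check_bz2_link. It assumes libR.so sits under lib64/R/lib, matching the paths in the R modulefile below, and that local_lib is set as earlier in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ldd` output on stdin, print where libbz2 resolved,
# and succeed only if that path is under $local_lib (the custom build).
check_bz2_link() {
  local resolved
  # ldd lines look like: "libbz2.so.1.0 => /path/to/libbz2.so.1.0 (0x...)"
  resolved=$(awk '$1 ~ /^libbz2\.so/ { print $3 }')
  echo "libbz2 -> ${resolved:-not linked}"
  [[ "$resolved" == "$local_lib"/* ]]
}

# Usage once R is installed:
#   ldd "$HOME/R/$r_version/lib64/R/lib/libR.so" | check_bz2_link
```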
And now, we must create a modulefile for R:
#%Module1.0####################################################################
proc ModulesHelp { } {
global _module_name
puts stderr "\tThis module sets up the environment for R, version 3.3.2"
}
set _module_name [module-info name]
module-whatis "R-3.3.2 built with gcc-4.9.2, MKL, java-1.8, texlive, zlib, bzip2, xzutils, pcre, curl, and tcltk"
# Load required modules
module load gcc/4.9.2
module load intel/15.0.3
module load java/1.8
module load texlive/2015
# Load custom modules (make sure use.own is loaded first, e.g. in your profile)
module load zlib/1.2.9
module load bzip2-custom/1.0.6
module load xzutils/5.2.3
module load pcre/8.40
module load curl/7.52.1
module load tcltk/8.6.6
set approot $::env(HOME)/R/3.3.2
prepend-path PATH $approot/bin
prepend-path LD_LIBRARY_PATH $approot/lib64:$approot/lib64/R/lib
prepend-path MANPATH $approot/share/man
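Once the modulefile is in place, you can confirm that the private build shadows the cluster’s R by checking the version string. The sketch below assumes the modulefile above was saved as ~/privatemodules/R/3.3.2, which is a hypothetical location. It also uses a hypothetical helper, r_version_of, that parses the first line of R --version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the version number from `R --version` output on stdin.
r_version_of() {
  # The first line looks like: R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch"
  awk 'NR == 1 && $1 == "R" && $2 == "version" { print $3 }'
}

# Usage (assuming the modulefile is saved as ~/privatemodules/R/3.3.2):
#   module load use.own && module load R/3.3.2
#   R --version | r_version_of    # expect 3.3.2
#   which R                       # expect a path under ~/R/3.3.2/bin
```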