R Packages in the Cluster

On the CRAN system for R (http://www.r-project.org), one can find more than 12000 packages.  When the CRMDA started in early 2010, we noted there were 2400 packages. CRAN is a rapidly growing collection. 

At one time, we would install every package and keep all of them up to date every day. That is no longer possible. Instead, the system administrators install R but no longer maintain a package collection; that is something we try to do for CRMDA users. We keep a list of the packages that we believe are most heavily used, and we leave users to install additional packages into their own home directories. The latest list is in a blog post prepared in 2017, "R Packages available for CRMDA cluster members". It appears that if you want to run jobs across several nodes, it will be necessary to fiddle with your user environment, as described in another blog post, "R modules: Super Exciting New Updates". (In retrospect, the title for that post should have been "announcement about frustrating changes in the cluster and how you can cope with them".) We'll be integrating these instructions at some point.

It may be that a package is urgently needed by many users. If so, we will install it in the system-wide R library. To make a request, contact either Paul Johnson <pauljohn@ku.edu> or write to <clusterhelp@ittc.ku.edu>.

A request is not strictly necessary, however. When an ordinary user runs install.packages(), R notices that the system library is not writable and asks, in effect, "we notice you are not an administrator, do you want to install this in your personal library?" The exchange looks like this:

> install.packages("zipfR", dep = TRUE, repos = "http://rweb.crmda.ku.edu/cran")
Warning in install.packages("zipfR") :
  argument 'lib' is missing: using '/panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/site-library'
Warning in install.packages("zipfR") :
  'lib = "/panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/site-library"' is not writable
Would you like to create a personal library
to install packages into?  (y/n) y

Once the user answers yes, R installs the package into the personal library. Each time the user starts R afterward, it looks for packages in that personal library first and then in the system libraries.
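In a batch job there is nobody to answer that prompt, so it can help to set up the personal library ahead of time. The following is a minimal sketch, not an official CRMDA procedure: the helper name ensure_user_lib is hypothetical, and it assumes that R's standard R_LIBS_USER environment variable names the personal library location.

```r
# Hypothetical helper: create a personal library directory if needed and
# put it first on R's library search path for the current session.
ensure_user_lib <- function(path = Sys.getenv("R_LIBS_USER")) {
  path <- path.expand(path)
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE, showWarnings = FALSE)
  }
  # .libPaths() silently ignores directories that do not exist
  .libPaths(c(path, .libPaths()))
  invisible(path)
}

user_lib <- ensure_user_lib()

# With the library in place, an explicit 'lib' argument avoids the prompt.
# Left commented out here because it needs network access:
# install.packages("zipfR", lib = user_lib, dependencies = TRUE,
#                  repos = "http://rweb.crmda.ku.edu/cran")
```

Running .libPaths() afterward should list the personal library first, provided the directory could be created.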

There may be some trouble in the installation process.  If so, write an email to clusterhelp@ittc.ku.edu and the technicians will check into it. Be sure to include the input & output from the attempted install as well as the return from running sessionInfo().
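One way to gather the session details for such an email is to write them to a text file and attach it. This is only a sketch; the file name sessionInfo.txt is an arbitrary choice.

```r
# Capture what sessionInfo() prints as a character vector, one line each
info <- capture.output(print(sessionInfo()))

# Save it in the current working directory so it can be attached to an email
report_file <- "sessionInfo.txt"
writeLines(info, report_file)

# Show where the file ended up
cat("Wrote", length(info), "lines to", normalizePath(report_file), "\n")
```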

Common R package functions:

To find out which packages are currently installed, run:

> library()

This combines the system-wide R package collection and the user's collection.
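When the printed listing is too long to scan, the same information is available as data. Here is a small sketch using installed.packages() from base R, which returns one row per installed package together with the library it was found in:

```r
# One row per installed package, across every library on the search path
pkgs <- installed.packages()

# Name, library location, and version for the first few packages
print(head(pkgs[, c("Package", "LibPath", "Version"), drop = FALSE]))

# How many packages live in each library (system-wide vs. personal)
print(table(pkgs[, "LibPath"]))

# Check whether one particular package is installed ("zipfR" is the example
# package used earlier on this page)
"zipfR" %in% rownames(pkgs)
```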

To find out where R is currently searching for packages, run this function:

> .libPaths()

The function's name begins with a period; don't forget that part. This function is important because it lets you confirm that your personal R folder is on the search path.

Users who have not yet installed any packages in their home folders see this:

> .libPaths()
[1] "/panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/3.1/site-library"
[2] "/panfs/pfs.acf.ku.edu/cluster/6.2/R/3.1.0/lib64/R/library"

Users who have installed packages see this:

> .libPaths()
[1] "/panfs/pfs.acf.ku.edu/home/your-name-here/R/x86_64-unknown-linux-gnu-library/3.1"
[2] "/panfs/pfs.acf.ku.edu/crmda/tools/lib64/R/3.1/site-library"
[3] "/panfs/pfs.acf.ku.edu/cluster/6.2/R/3.1.0/lib64/R/library"
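To check whether the personal library is on the search path without reading the output by eye, something like the following may help. It is a sketch that assumes the personal library lives at the location named by R's R_LIBS_USER environment variable, which is the default but can be overridden.

```r
# Where R expects the personal library to be
user_lib <- path.expand(Sys.getenv("R_LIBS_USER"))

# TRUE only if that directory exists and is on the search path
on_path <- dir.exists(user_lib) &&
  normalizePath(user_lib, "/") %in% .libPaths()

cat("Personal library:", user_lib, "\n")
cat("On the search path:", on_path, "\n")

# The library in which a given package is found
stats_lib <- dirname(find.package("stats"))
print(stats_lib)
```

If the second line reports FALSE, the personal library either does not exist yet or was not picked up when R started.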

Additional R Resources

We are learning ways to make R models run faster on the cluster. I believe, at this time, it is not possible to provide a simple 1-2-3 step approach to this, but I have seen some tips.

Valuable Resources

Dirk Eddelbuettel, "Introduction to High-Performance Computing with R: UseR! 2009 Tutorial", Université Rennes II, Agrocampus Ouest, Laboratoire de Mathématiques Appliquées, 7 July 2009. Put simply, this one is about as good as it gets.

Dirk Eddelbuettel, "Introduction to High-Performance Computing with R: UseR! 2008 Tutorial", TU Dortmund.

Mark Huberty, course notes, "Parallel Programming in R", PS236b, Spring 2010.

Hana Sevcikova, "snowFT: Generic Framework for Parallel Statistical Computing".

"The snowFT package eliminates a few drawbacks of snow and makes it much easier to use (one needs only one function). It can be used on a cluster as well as on a multi-core machines."
