DRMAA XMLRPC: XML RPC Protocol Wrapper Around DRMAA Calls

The DRMAA XMLRPC project describes itself as:

"drmaa-xmlrpc is a quick XMLRPC wrapper around the DRMAA API implemented by most popular cluster schedulers. It is written in C and depends on xmlrpc-c and your scheduler's DRMAA implementation library. The end daemon is an abyss webserver serving out standard XMLRPC calls. API follows DRMAAc bindings - same function names and same arguments (omit any buffers meant to return values)."

http://code.google.com/p/drmaa-xmlrpc/

SunGrid Graphical Accounting Engine

Another interface for job data access: SunGrid Graphical Accounting Engine

The description from http://rdlab.lsi.upc.edu/index.php/serveis/s-gae.html:

"s-gae is a web application designed to display accounting information generated by Oracle Grid Engine (formerly SunGrid Engine) or its free forks such as Open Grid SchedulerSon of Grid Engine, etc. as well as non free forks such as  Univa Grid Engine. This gathered data is stored in a database in order to display eye-candy charts grouped by user, queue or full cluster. Moreover, you can use several filter options to customize the results."

UBMoD: Collecting Statistical Data of Grid Engine Jobs

The open source UBMoD is a tool for retrieving historical job data and computing statistics on it.

Their description from http://ubmod.sourceforge.net/:

"UBMoD (UB Metrics on Demand) is an open source tool for collecting and mining statistical data from cluster resource managers (such as TORQUE, OpenPBS, and SGE) commonly found in high-performance computing environments. It has been developed by the Center for Computational Research at the University at Buffalo, SUNY and presents resource utilization including CPU cycles consumed, total jobs, average wait time, etc. for individual users, research groups, departments, and decanal units. The web-based user interface provides a dashboard for displaying resource consumption along with fine-grained control over the time period and resources displayed...."

 

Statistics with R-Project, Grid Engine, and Open MPI (2012-07-13)

This article demonstrates the usage of the statistical R package with Grid Engine.

Since I recently generated some plots using R, I took a closer look at what R supports in terms of cluster and parallel computing, especially what can be done with Grid Engine.

There is a really good technical paper about parallel computing packages for R from the LMU (State-of-the-art in Parallel Computing with R).

So let's install the Rmpi (for connectivity with Open MPI), snow (simple network of workstations), and Rsge packages. We assume you have a running (Univa) Grid Engine cluster. If you don't have one and you only need a small cluster, you can simply download a free, 48-core limited version from Univa (www.univa.com). When using the GUI installer your cluster will be set up in just a few minutes. I was using a Univa Grid Engine 8.1 pre-release for doing this.

In order to also exploit MPI capabilities in R you first need an MPI installation. Just download Open MPI from www.openmpi.org (I took version 1.6). Assuming you have a compiler (gcc/g++/...), installing Open MPI is pretty simple. Untar the package and run configure. For built-in Grid Engine support you must add --with-sge, and in order to work with Rmpi you need to pass --enable-shared and --enable-static. The prefix is where it is going to be installed.

./configure --prefix=/usr/local --enable-shared --enable-static --with-sge

Build and install.

make all install

Now you need to download the R package (R-base or similar) from the repository of your Linux distribution. Once it is installed, set LD_LIBRARY_PATH to /usr/local/lib, otherwise the MPI libraries are not found.

export LD_LIBRARY_PATH=/usr/local/lib

Call R on the command line and install the Rmpi package.

(within R)
> install.packages("Rmpi")

In order to load Rmpi when it is not already loaded (e.g. after an R restart) you have to type:

>    if (!is.loaded("mpi_initialize")){
library("Rmpi")
}

Spawn the slaves:

> mpi.spawn.Rslaves()

And run the common hello world of Rmpi:

> mpi.remote.exec(paste("I am",mpi.comm.rank(),"of",mpi.comm.size()))

That's working! You can now make use of the Rmpi package.
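For example, here is a minimal sketch of distributing a small computation to the spawned slaves and shutting them down again afterwards (the input vector is just an illustration; both functions are part of the Rmpi package):

> mpi.parSapply(1:8, function(i) i*i)   # run the function on the spawned slaves
> mpi.close.Rslaves()                   # shut the slaves down again when done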

In order to install the snow package I had to download an older version because my R is a little outdated.

http://cran.r-project.org/src/contrib/Archive/snow/

Download the package:

wget http://cran.r-project.org/src/contrib/Archive/snow/snow_0.3-3.tar.gz

Install it on command line:

R CMD INSTALL snow_0.3-3.tar.gz
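Once snow is installed you can start an MPI-backed cluster of R workers from within R and distribute work to it. The following is just a minimal sketch (the worker count of 4 is an arbitrary example; the MPI cluster type requires Rmpi to be loaded):

> library("Rmpi")
> library("snow")
> cl <- makeCluster(4, type = "MPI")   # spawn 4 R workers via Rmpi
> parSapply(cl, 1:10, sqrt)            # run sqrt on the workers
> stopCluster(cl)                      # shut the workers down again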

Now it is time to install the Rsge package:

Be sure that you can do a qsub before starting R. If that is not possible, source your $SGE_ROOT/default/common/settings.sh file. Start R and load the Rmpi and snow libraries first (library("Rmpi") and library("snow"), see above). Now install the Rsge package:

> install.packages("Rsge")

(load the lib when necessary)

After this is done on each compute host in your cluster you can submit Grid Engine jobs from within R. The following example generates the numbers from 0.1 to 2.5 (c(1:25)/10), applies the exp function to each of them, and returns the result as an array (parSapply). This is done by sending the task as a job to Grid Engine.

> sge.parSapply(c(1:25)/10, function(x) exp(x))
Completed storing environment to disk
Submitting  1 jobs...
All jobs completed
[1]  1.105171  1.221403  1.349859  1.491825  1.648721  1.822119  2.013753
[8]  2.225541  2.459603  2.718282  3.004166  3.320117  3.669297  4.055200
[15]  4.481689  4.953032  5.473947  6.049647  6.685894  7.389056  8.166170
[22]  9.025013  9.974182 11.023176 12.182494


If you want to execute the same as above, but as 10 different Grid Engine job tasks, then try the following:

> sge.parSapply(c(1:25)/10, function(x) exp(x), njobs=10)
Completed storing environment to disk
Submitting  10 jobs...
All jobs completed
[1]  1.105171  1.221403  1.349859  1.491825  1.648721  1.822119  2.013753
[8]  2.225541  2.459603  2.718282  3.004166  3.320117  3.669297  4.055200
[15]  4.481689  4.953032  5.473947  6.049647  6.685894  7.389056  8.166170
[22]  9.025013  9.974182 11.023176 12.182494

 

Now you have an MPI and Univa Grid Engine enabled R environment!

If you have a Grid Engine installation where not all hosts have R installed, your jobs might not get executed successfully. In that case you can change the internal submission parameters of Rsge. This is done by changing the options.

> getOption("sge.qsub.options")
[1] "-cwd

You can see that the internal "qsub" has just -cwd as a command-line parameter. You can add a specific host or whatever submission parameter the "qsub" of your Univa Grid Engine installation supports.

In order to force all jobs to go to host "u1010" you can set sge.qsub.options in R as follows:

> options(sge.qsub.options=...)

where ... is "-cwd -l h=u1010"

Of course you can also request a specific queue by adding -q <yourqueue>, or request a specific core binding with, for example, -binding linear:1. Changing the submission parameters allows you to direct your R computations to a specific subset of your cluster with powerful compute hosts, while you start the interactive R session itself on highly responsive interactive nodes with "qrsh".
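A hypothetical combination of such options (the queue name r.q is just a placeholder) could look like this:

> options(sge.qsub.options="-cwd -q r.q -binding linear:1")
> getOption("sge.qsub.options")
[1] "-cwd -q r.q -binding linear:1"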


Job Dependency Visualization with dot

Hervé shows in his blog a small shell script that visualizes jobs and their dependencies. This is done by parsing qstat output and generating a dot graph, which is then rendered into a picture.
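The same idea can be sketched in a few lines of R (this is not Hervé's script, just an illustration; it assumes that qstat -j reports dependencies in a jid_predecessor_list line and that Graphviz's dot is installed):

# collect the job ids currently known to Grid Engine (skip the two header lines)
ids <- system("qstat | awk 'NR>2 {print $1}'", intern = TRUE)
edges <- character(0)
for (id in ids) {
  info <- system(paste("qstat -j", id), intern = TRUE)
  pred <- grep("^jid_predecessor_list:", info, value = TRUE)
  if (length(pred) > 0) {
    preds <- trimws(strsplit(sub("^jid_predecessor_list:\\s*", "", pred[1]), ",")[[1]])
    edges <- c(edges, paste0("  \"", preds, "\" -> \"", id, "\";"))
  }
}
# write the dot description and render it into a picture
writeLines(c("digraph jobs {", edges, "}"), "jobs.dot")
system("dot -Tpng jobs.dot -o jobs.png")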

Eclipse PTP

The Eclipse Parallel Tools Platform describes itself as:

"The PTP project provides an integrated development environment to support the development of parallel applications written in C, C++, and Fortran. Eclipse PTP provides:

  • Support for the MPI, OpenMP and UPC programming models
  • Support for a wide range of batch systems and runtime systems, including PBS/Torque, LoadLeveler, GridEngine, Parallel Environment, Open MPI, and MPICH2
  • A scalable parallel debugger
  • Support for the integration of a wide range of parallel tools"

 

qstat in your browser: xml-qstat

XML-qstat is a free project hosted on GitHub which allows you to watch your cluster status in a more convenient way in your browser: XML-qstat for Grid Engine can be found here.

BLCR Checkpointing Grid Engine Integration Scripts

Grid Engine integration scripts for checkpointing and restart with BLCR can be found in the GitHub project here: https://github.com/HPCKP/BLCR-GridEngine-Integration