simplifies the task of job submission and file-staging over a secure
connection from a submission portal (or just your local computer) to your
compute farm. It began as an internal project at RDLAB and was recently
published under the open-source GPLv3 license. The scripts are downloadable
as a tarball
or accessible via SVN from
http://svn-rdlab.lsi.upc.edu/subversion/ajo/public. The username and
password are both public_ajo.
All it requires is an SSH client and a Ruby installation. AJO is a set of Ruby
scripts including a configuration file (config.rb), which must be adapted to
your environment: set the hostname of the Grid Engine submission host, the path
to your remote Grid Engine installation ($SGE_ROOT), and, for encryption, the
cipher salt and keys.

Next, configure the local directories and files with data you need on your
remote Univa Grid Engine cluster, as well as the output directories and files
you want back on your local host. These files and directories are copied
transparently to the cluster during job launch. Finally, there is a section
where you can insert the contents of your Grid Engine job scripts (the job
scripts are generated on the fly); each of them is started as a separate Grid
Engine job.
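AJO itself is written in Ruby and its internals are not reproduced here, but the general idea of generating a Grid Engine job script on the fly from configured contents can be sketched as follows (hypothetical function and option names, shown in Python purely for illustration):

```python
def make_job_script(job_name, script_body, output_dir="$HOME"):
    """Build the text of a Grid Engine job script from configured pieces.

    Hypothetical sketch of on-the-fly script generation -- not AJO's code.
    """
    header = "\n".join([
        "#!/bin/sh",
        f"#$ -N {job_name}",    # job name shown in qstat
        f"#$ -o {output_dir}",  # where the stdout file should land
        "#$ -j y",              # merge stderr into stdout
        "#$ -cwd",              # run in the submission directory
    ])
    return header + "\n" + script_body + "\n"

script = make_job_script("demo", "echo hello from the cluster")
print(script)
```

Each generated script is then handed to qsub on the submission host, one Grid Engine job per configured script section.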
After you have configured your template, you can launch it with:
./ajo -c config.rb -s
Job submitted correctly. The job identifier is da558c9bf39fd052806e69d6afbc36a7e0718a53604eaff47bf6efd081fe40a239aa6572...
You can track the status of your jobs remotely with a secure token generated during submission.
./ajo -q da558c9bf39fd052806e69d6afbc36a7e0718a53604eaff47bf6efd081fe40a239aa...
Your job has finished running on Sat Feb 9 09:36:58 2013. You can now do './ajo --retrieve ID' to download the output files and folders.
This token-based system makes AJO an ideal candidate for use within a web-based job submission portal.
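AJO's actual token scheme is not documented here, but the general pattern of handing back an opaque, hard-to-guess identifier at submission time can be sketched like this (a hypothetical illustration, not AJO's implementation):

```python
import hashlib
import os
import time

def make_submission_token(job_name):
    """Return an opaque hex token for a submission.

    Hypothetical sketch: hashes the job name, submission time, and a
    random secret so the token is unguessable -- not AJO's actual scheme.
    """
    material = f"{job_name}:{time.time()}:{os.urandom(16).hex()}"
    return hashlib.sha256(material.encode()).hexdigest()

token = make_submission_token("demo_job")
print(token)  # 64 hex characters
```

Because the token embeds random material, knowing one token reveals nothing about other jobs, which is what makes it safe to pass around in a web portal.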
Finally, you retrieve the output back to your local machine:
./ajo --retrieve da558c9bf39fd052806e69d6afbc36a7e0718a53604eaff47bf....
Downloaded the output to /tmp/tmp.ZpiQWNfD..
More detailed information about how it works can be found on the RDLAB web page.
Wok describes itself as follows:
"Wok is a workflow management system implemented in Python that makes very easy to structure the workflows, parallelize their execution and monitor its progress among other things. It is designed in a modular way allowing to adapt it to different infrastructures.
For the time being it is strongly focused on clusters implementing any DRMAA compatible resource manager (i.e. Oracle Grid Engine) which working nodes have a shared folder in common. Other, more flexible infrastructures (such as the Amazon EC2) are considered for future implementations..."
Read more at the Wok project on GitHub.
KNIME is a graphical
compute workflow tool based on the Eclipse framework. It is available for
free as well as with commercial support. What you basically have is a
drawing board where you visually design your compute workflow (similar to WEKA).
You can drag, drop, and connect different nodes, each of which represents
a piece of your workflow's functionality. There are nodes for reading data
from databases or files, nodes for calculations such as data clustering,
and nodes for handling output or visualizing the results. The
enterprise edition has capabilities to exploit a compute cluster by
submitting jobs to Univa Grid Engine.
"XtalOpt is a free and truly open source evolutionary algorithm designed to predict crystal structures. It is implemented as an extension to the Avogadro molecular editor."
It also supports Grid Engine as a job scheduler. More information is provided in this blog entry.
Pythongrid is freely available (GNU GPL v2) at github.com and offers job submission and job monitoring capabilities for Grid Engine. The developers describe the project as follows (excerpt from http://code.google.com/p/pythongrid/):
"This module provides high level functionality for cluster computing in python using the Sun Grid Engine. As some cluster environments are notoriously unreliable, pythongrid attempts to handle job monitoring and resubmission (in case of sudden death of nodes) under the hood, while providing the user with a simple map-reduce like interface.
- Uses ZMQ-based heart-beat to monitor job status
- Robust error detection (out-of-memory, node failure)
- Automated resubmission in case of unexpected failure
- Error emails, including CPU/MEM statistics
- Optional web-interface to monitor jobs
- Lets you easily switch between local multiprocessing and cluster computing"
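The automated-resubmission behavior listed above can be illustrated with a generic retry wrapper. This is a standalone sketch of the resubmit-on-failure idea, not pythongrid's actual code or API; the job here is an ordinary callable standing in for a cluster job:

```python
def run_with_resubmission(job, max_attempts=3):
    """Re-run a callable until it succeeds or attempts are exhausted.

    Generic sketch of resubmitting a job after a sudden failure --
    not pythongrid's API.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except RuntimeError as exc:  # stands in for a lost or crashed job
            print(f"attempt {attempt} failed: {exc}")
    raise RuntimeError(f"job failed after {max_attempts} attempts")

# A flaky "job" that fails twice before succeeding, to exercise the wrapper.
attempts = {"count": 0}
def flaky_job():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("node died")
    return 42

result = run_with_resubmission(flaky_job)
print(result)  # → 42
```

pythongrid layers the same idea over real Grid Engine jobs, using heart-beat messages rather than exceptions to detect that a node has died.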
- DRMAA XMLRPC: XML RPC Protocol Wrapper Around DRMAA Calls
- SunGrid Graphical Accounting Engine
- UBMoD: Collecting Statistical Data of Grid Engine Jobs
- Statistics with R-Project, Grid Engine, and Open MPI (2012-07-13)
- Job Dependency Visualization with dot
- Eclipse PTP
- qstat in your browser: xml-qstat
- BLCR Check Pointing Grid Engine Integration Scripts
- Cloudera Slides about Univa Grid Engine / UGE Hadoop Integration
- Video about Unisight 2.0: A nice interface for Grid Engine job statistics
- Video: Grid Engine in Amazon EC2 in 10 Minutes
- Flex-grid (qlicserver)
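As a small illustration of the job-dependency visualization item above: Grid Engine job dependencies (expressed at submission time with qsub's -hold_jid option) form a directed graph, which can be emitted in Graphviz dot format. This is a generic sketch with hypothetical job names, not the linked tool:

```python
def dependencies_to_dot(deps):
    """Render {job: [jobs it waits on]} as a Graphviz dot digraph."""
    lines = ["digraph jobs {"]
    for job, holds in deps.items():
        for held_on in holds:
            # arrow points from prerequisite to the job that waits on it
            lines.append(f'    "{held_on}" -> "{job}";')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical job chain: postprocess waits on two compute jobs.
deps = {"compute_1": [], "compute_2": [], "postprocess": ["compute_1", "compute_2"]}
out = dependencies_to_dot(deps)
print(out)
```

Piping the output through the dot command (dot -Tpng) yields a picture of the workflow's dependency structure.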