Using Open Containers with runc in a Univa Grid Engine Compute Cluster (2015-06-28)
runc is a tool written in Go that creates and starts Linux containers according to the OCF specification. Its source code repository can be found on GitHub under the opencontainers organization.
If you have a Go development environment, building it is very simple: just follow the instructions in the README (it uses a Makefile which internally calls godep, the standard tool for handling package dependencies in Go at the time; you probably need to install godep as well).
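A rough sketch of the build steps (the exact Makefile targets and install location may differ between versions, so check the README of the version you are using):

$ go get github.com/tools/godep                      # dependency tool used by the Makefile
$ git clone https://github.com/opencontainers/runc.git $GOPATH/src/github.com/opencontainers/runc
$ cd $GOPATH/src/github.com/opencontainers/runc
$ make
$ sudo cp runc /usr/local/bin/                       # the single binary mentioned below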
After installing the single runc binary you can start up containers right on the command line by pointing runc to a JSON description of the container. The container's root filesystem obviously also needs to be on the file system, since runc chroots into it. One major difference from Docker is that runc does not do any kind of image management, but that is probably not needed if you have a good shared filesystem.
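To give an idea what such a container looks like on disk, here is a minimal sketch of preparing the busybox example from the runc README on a shared filesystem (assuming a host with Docker is available for exporting the root filesystem; the paths are just examples):

$ docker export $(docker create busybox) > busybox.tar
$ mkdir rootfs && tar -C rootfs -xf busybox.tar
$ cp /path/to/container.json .                 # the busybox example spec from the runc README

The resulting directory with rootfs and container.json is what the examples below operate on.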
How to use runc in Univa Grid Engine
After runc is verified to run on the command line it is time to use it under the control of Univa Grid Engine in order to exploit your compute cluster's resources.
The integration can be very straightforward, depending on what you want to achieve. I keep it as simple as possible here.
First of all you want to submit the container, described as an Open Container Format (OCF) JSON file, to Univa Grid Engine, and probably also use Grid Engine's resource management for handling cgroups and other limits. This is possible since all container processes are children of runc; no daemon is in play here.
In order to set up running runc you can override the starter_method in the Univa Grid Engine queue configuration (qconf -mq all.q, for example). The starter method is executed on the compute node in order to start up the job. Unfortunately runc requires root privileges while the starter method is started as the job user, therefore a sudo is required. Note that running a privileged process is always a security risk, but I’m always fearless on my private cluster on my laptop!!!
Point the starter_method to the path of the following script:
#!/bin/sh
# Start the job command (the container JSON description) through runc as root,
# using the Grid Engine job id as the container id.
sudo /usr/local/bin/runc --id "$JOB_ID" "$@"
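With the script installed on the execution hosts (the path below is just an example), the relevant line in the queue configuration (qconf -mq all.q) then looks like this:

starter_method        /usr/local/bin/runc_starter.sh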
Depending on your use case you need to allow runc to run as root without requiring a password. This is required in any case when running in batch mode.
Example using visudo:
daniel ALL=(ALL) NOPASSWD: /usr/local/bin/runc
You can also use user groups instead of specifying users.
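A group-based entry could look like the following (hpcusers is just a placeholder for whatever group your cluster users belong to):

%hpcusers ALL=(ALL) NOPASSWD: /usr/local/bin/runc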
A quick check:
$ sudo runc
JSON specification file for container.json not found
Switching to a directory where I have a container.json:
$ sudo runc
/ $ exit
Now, let's submit an interactive job. The -pty switch is required when submitting a JSON file but still wanting an interactive shell.
I’m using the busybox image, which is the example from the runc README on GitHub.
$ ls
busybox.tar container.json rootfs runc
Now I want to run the container on my cluster using my shared filesystem.
$ qrsh -pty y -cwd container.json
/ $ ps -ef
PID USER COMMAND
1 daemon sh
7 daemon ps -ef
/ $ ls -lsiah
total 56
1999621 4 drwxr-xr-x 17 default default 4.0K Jun 23 05:41 .
1999621 4 drwxr-xr-x 17 default default 4.0K Jun 23 05:41 ..
1992545 0 -rwxr-xr-x 1 default default 0 Jun 27 15:31 .dockerenv
1992546 0 -rwxr-xr-x 1 default default 0 Jun 27 15:31 .dockerinit
2007813 4 drwxr-xr-x 2 default default 4.0K May 22 2014 bin
846097 0 drwxr-xr-x 5 root root 360 Jun 28 06:42 dev
2024173 4 drwxr-xr-x 6 default default 4.0K Jun 27 15:31 etc
2024184 4 drwxr-xr-x 4 default default 4.0K May 22 2014 home
2024187 4 drwxr-xr-x 2 default default 4.0K May 22 2014 lib
1992672 0 lrwxrwxrwx 1 default default 3 May 22 2014 lib64 -> lib
1992673 0 lrwxrwxrwx 1 default default 11 May 22 2014 linuxrc -> bin/busybox
…
/ $ exit
The PTY request is required, otherwise we don’t get an interactive shell together with the command (which in our case is the JSON file). The -cwd switch makes the job, and hence runc, execute in the current working directory, which removes the need for specifying the full path to the JSON file.
Now we want to run a batch job using this busybox container. Let’s assume our task is executing whoami in the container. You need to create a new JSON file with a different processes section:
5 "processes": [
6 {
7 "tty": false,
8 "user": "root",
9 "args": [
10 "whoami"
11 ],
12 "env": [
13 "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
14 "TERM=xterm"
15 ],
16 "cwd": ""
17 }
So, tty is set to false. Saving this modified copy as run.json, execution works like the following:
$ qsub -cwd -b y ./run.json
Your job 3000000178 ("run.json") has been submitted
$ cat run.json.o3000000178
root
This is just a start. There are many features you can exploit.
This approach works well with Univa Grid Engine's cgroups integration. You can, for example, specify the number of cores allocated in the cpuset, a main memory limit, and a virtual memory limit:
$ qsub -binding linear:1 -l m_mem_free=2G,h_vmem=3G -cwd -b y ./run.json
Now the container is under the complete control of Univa Grid Engine’s cgroups when it comes to CPU and memory usage.
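If you want to double check that on the execution host, you can look at the cgroup membership of the container processes (a rough check only; the exact group names depend on your cgroups configuration in Univa Grid Engine):

$ for pid in $(pgrep -f runc); do cat /proc/$pid/cgroup; done

The cpuset and memory lines should point to the groups set up by Grid Engine for the job.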
Other possibilities include checkpointing integration with CRIU. You also need to align the Grid Engine resource requests with the JSON file, which can certainly be done with a JSV script. Also, with the sudoers file you can limit which containers are allowed to be executed by which user or user group. This can easily be translated to Univa Grid Engine’s job class functionality, making the usage of containers safe and very simple for the user.
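To illustrate the JSV idea, here is a minimal sketch of a shell JSV (the policy, the whitelist path, and the check itself are made up for the example; the location of jsv_include.sh depends on your installation):

#!/bin/sh
# Minimal JSV sketch: only accept jobs whose command is a container
# description from a known location on the shared filesystem.

jsv_on_start()
{
    return
}

jsv_on_verify()
{
    cmd=`jsv_get_param CMDNAME`
    case "$cmd" in
        /shared/containers/*.json)
            jsv_accept "container job accepted"
            ;;
        *)
            jsv_reject "only container descriptions below /shared/containers are allowed"
            ;;
    esac
}

. ${SGE_ROOT}/util/resources/jsv/jsv_include.sh
jsv_main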
Daniel