Grid Engine Complex Configuration: The thing with FORCED resources (2012-09-20)

Grid Engine has so many configuration options that one complex configuration parameter is often overlooked: FORCED complexes.

Grid Engine manages resources in the complex configuration, where each complex represents one resource. When you open the configuration with qconf -mc, each resource is shown with several columns. There is a name and a shortcut (either can be used to request the resource), a column stating whether the resource is a consumable (YES, NO, JOB), and so on. There is also a "requestable" column. For self-defined complexes it is usually set to YES, but it can also be set to FORCED.

This means the following: wherever you set such a FORCED resource in the complex_values field (on queue level, queue instance level, host level, or global level), those entities are not just treated by the scheduler as valid candidates for jobs requesting the resource; they accept only jobs that request it. In other words: jobs which do not request the forced complex are not able to run on those hosts or queue instances (wherever you set the complex).
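The rule above can be written down as a tiny toy model. This is only a sketch of the rule as described here, not Grid Engine code: the function name may_run and its arguments are made up for illustration, and ordinary request matching (a hard request can only be satisfied where the resource is offered) is deliberately left out.

```python
# Toy model of the FORCED rule; not part of Grid Engine.
# may_run and its parameters are hypothetical names for illustration.

def may_run(forced_on_entity, job_requests):
    """Return True if a job passes the FORCED check on one entity.

    forced_on_entity: names of FORCED complexes set in the entity's
                      complex_values (host, queue, queue instance, global)
    job_requests:     names of the resources the job requests with -l
    """
    # Every FORCED complex set on the entity must be requested by the job.
    return set(forced_on_entity) <= set(job_requests)


print(may_run({"bound_job"}, {"bound_job"}))  # True: job requests the forced complex
print(may_run({"bound_job"}, set()))          # False: job does not request it
print(may_run(set(), set()))                  # True: nothing is forced here
```

The subset test is the whole point: an entity without any FORCED complex accepts every job, while a single FORCED complex is enough to keep out all jobs that do not ask for it.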

The following example demonstrates the use of a FORCED complex:

Let's say you have different users; some use core binding, some do not (yes, I love this topic ;). You can argue that this leads to an unfair situation on hosts where some jobs are pinned to cores while others (bad jobs which create more processes than the slots granted to them by the scheduler) are allowed to run on all cores. Such a situation can be relaxed in the following way: create a complex "bound_job" which is just a BOOL (non-consumable). This complex has to be set to FORCED in the "requestable" column (not to YES). Then set the "bound_job=1" resource on the hosts of your cluster which should be used exclusively for core bound jobs (qconf -me <hostname> and edit complex_values, or qconf -aattr exechost complex_values bound_job=1 <hostname>). When a user wants to submit a job with core binding, all they have to do is add -l bound_job=1 to the core binding request of the job. (If you don't trust the users, you can also create a JSV script which adds the request automatically, and only for jobs requesting core binding.)

What you get is the following: only jobs with core binding now run on the hosts which have bound_job=1 in their complex_values configuration (qconf -me <hostname>). Jobs that do not request core binding (i.e. jobs without the bound_job request) are not allowed to run on those hosts. If you want core bound jobs to be able to run on the shared hosts as well, you only need to add -soft in front of the -l bound_job=1 request.
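To see which hosts each kind of job can end up on, here is a small toy model of the example. It is only a sketch under assumptions: the host names bound01 and shared01 and the function eligible_hosts are made up, and the soft request is treated as satisfying the FORCED requirement, which is how the behaviour is described above rather than something derived from the scheduler code.

```python
# Toy model of the bound_job example; not Grid Engine code.
# Host names and eligible_hosts are hypothetical.
HOSTS = {
    "bound01": {"bound_job"},  # complex_values bound_job=1 (FORCED complex)
    "shared01": set(),         # no bound_job in complex_values
}


def eligible_hosts(hard=(), soft=()):
    """Return the hosts a job may run on, given its hard and soft requests."""
    hard, soft = set(hard), set(soft)
    result = []
    for name, forced in sorted(HOSTS.items()):
        # A hard request can only be satisfied where the resource is set.
        offers_hard = hard <= forced
        # A FORCED complex must be requested by the job
        # (assumption: a soft request counts as well).
        forced_ok = forced <= (hard | soft)
        if offers_hard and forced_ok:
            result.append(name)
    return result


print(eligible_hosts(hard={"bound_job"}))  # ['bound01']
print(eligible_hosts())                    # ['shared01']
print(eligible_hosts(soft={"bound_job"}))  # ['bound01', 'shared01']
```

The three calls correspond to a job submitted with -l bound_job=1, a job without the request, and a job submitted with -soft -l bound_job=1.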