Univa Grid Engine 8.1 Enhancement (Part 6) - Configure the Parallel Environment (PE) Selection Order (2012-06-09)
This enhancement was already implemented in the patch release 8.0.1p5. But since it wasn't in the 8.0.1 FCS, I list it here as an 8.1 enhancement (the real truth is that I don't want to mix up 8.1 and 8.0.1 features in my blog order... ;-).
One common problem Grid Engine administrators face is that they have parallel applications which run optimally only with a specific number of slots (ranks / cores) per host. Let's assume you have one that can work with either 64 or 32 slots per host, with a preference for 64. How can this be configured in Grid Engine?
For parallel jobs (i.e. jobs which need more than one slot / core) you need to configure a so-called parallel environment (PE). This PE can then be selected at job submission time with the qsub -pe <your_PE> <total_amount_of_slots_you_need> switch. If you want a fixed number of slots per host, you define it in the allocation_rule field of the PE configuration (qconf -mp <yourpe>, or qconf -ap <yourpe> to create a new one). Just enter the number of slots you need per host there, for example 64. When requesting this PE, the total number of slots must be a multiple of 64. The Grid Engine scheduler then tries to find hosts which each offer enough resources for 64 slots. If there aren't enough, the job stays waiting in the queue (state qw). In that case you would like the scheduler to try placing the job with just 32 slots per host instead. This is usually done by creating a second PE with a fixed allocation rule of 32. Let's assume you named them my_pe_064 and my_pe_032 respectively. Then all you have to do is request both PEs at submission time with the wildcard PE selection: qsub -pe my_pe_\* ...
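To make the arithmetic behind a fixed allocation rule concrete, here is a small Python sketch. It is only a toy model, not Grid Engine code: the function names, the host names and the free-slot numbers are made up for illustration, and the real scheduler considers many more resources than free slots.

```python
from typing import Dict, Optional


def hosts_needed(total_slots: int, slots_per_host: int) -> Optional[int]:
    """Number of hosts a request occupies under a fixed allocation rule.

    Returns None if the total is not a multiple of the per-host value,
    because such a request can never be satisfied by this rule.
    """
    if total_slots <= 0 or slots_per_host <= 0:
        raise ValueError("slot counts must be positive")
    if total_slots % slots_per_host != 0:
        return None
    return total_slots // slots_per_host


def can_place(total_slots: int, slots_per_host: int,
              free_slots_by_host: Dict[str, int]) -> bool:
    """True if enough hosts each have slots_per_host free slots (toy model)."""
    needed = hosts_needed(total_slots, slots_per_host)
    if needed is None:
        return False
    # Only hosts that can hold the full per-host amount are usable.
    eligible = [host for host, free in free_slots_by_host.items()
                if free >= slots_per_host]
    return len(eligible) >= needed


if __name__ == "__main__":
    free = {"node01": 64, "node02": 32}  # hypothetical cluster state
    print(can_place(128, 64, free))  # rule 64 needs two hosts with 64 free slots
    print(can_place(64, 32, free))   # rule 32 needs two hosts with 32 free slots
```

In this model a 128-slot request under the 64-slots rule needs exactly two hosts with 64 free slots each; if only one such host exists, the job would stay in qw, which is the situation the second PE is meant to handle.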
But now the scheduler picks an arbitrary one (from the admin's and user's point of view), i.e. the job could run with 32 slots per host even though it could also have used 64 slots per host!
To control the PE selection order in the scheduler, Univa Grid Engine comes with a scheduler parameter which lets you configure whether the matching PEs are tried in (alphabetically) ASCENDING or DESCENDING order (or, like before, with NONE), depending on the naming scheme of your PEs.
In our example all you have to do is add PE_SORT_ORDER=DESCENDING to the scheduler configuration (section: params) with qconf -msconf. Then the scheduler first checks whether the 64-slots-per-host PE can be fulfilled; if not, it tries the 32-slots PE, guaranteed. That's it!
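The effect of the sort order on a wildcard request can be sketched in a few lines of Python. Again this is a simplified model under stated assumptions, not the scheduler's actual implementation: the helper names candidate_pes and select_pe, the host names and the free-slot numbers are hypothetical, and only free slots per host are considered.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional


def candidate_pes(pattern: str, pe_names: List[str],
                  sort_order: str = "NONE") -> List[str]:
    """PEs matching the wildcard, in the order they would be tried."""
    matches = [name for name in pe_names if fnmatchcase(name, pattern)]
    if sort_order == "ASCENDING":
        return sorted(matches)
    if sort_order == "DESCENDING":
        return sorted(matches, reverse=True)
    if sort_order == "NONE":
        return matches  # no defined order; here simply the input order
    raise ValueError("unknown sort order: " + sort_order)


def select_pe(pattern: str, pe_rules: Dict[str, int], total_slots: int,
              free_slots_by_host: Dict[str, int],
              sort_order: str = "NONE") -> Optional[str]:
    """First PE in sort order whose fixed allocation rule can be satisfied."""
    for name in candidate_pes(pattern, list(pe_rules), sort_order):
        per_host = pe_rules[name]
        if total_slots % per_host != 0:
            continue  # total must be a multiple of the per-host value
        eligible = sum(1 for free in free_slots_by_host.values()
                       if free >= per_host)
        if eligible >= total_slots // per_host:
            return name  # one PE for the whole job, never a mix
    return None  # nothing fits; in this model the job keeps waiting


if __name__ == "__main__":
    rules = {"my_pe_032": 32, "my_pe_064": 64}
    free = {"node01": 64, "node02": 64, "node03": 32, "node04": 32}
    print(select_pe("my_pe_*", rules, 128, free, "DESCENDING"))
```

With DESCENDING, my_pe_064 sorts before my_pe_032, so the 64-slots rule is tried first and the 32-slots rule is only used as a fallback. This is also why the zero-padded names matter: the order is alphabetical, not numerical.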
And by the way: the scheduler never mixes different PEs for a single job (i.e. in our example a job never gets 32 slots on one host and 64 slots on another at the same time; it always has either 64 slots or 32 slots on all hosts, but now with a preference for 64 slots).