Univa Grid Engine 8.1.4 Released (2013-03-20)

Some weeks ago we released the Univa Grid Engine (UGE) 8.0.1p16 maintenance release for our 8.0.1 users, and today we are happy to ship the next version of our 8.1 branch: 8.1.4. Overall it resolves about 50 more issues since 8.1.3.

As usual, it comes with fixes and smaller enhancements. In particular, our Intel Xeon Phi support was updated. The documentation now contains detailed example scripts showing how to start MIC native binaries directly on the Intel Xeon Phi boards. An additional helper tool (mic_load_check), which outputs load values of the co-processor board, is shipped as well; it shares its code base with our Intel Xeon Phi load sensor, introduced in UGE 8.1.3. The loadcheck utility (located in /utilbin/) was extended with the ability to translate between different processor ID representations (logical, OS-internal, socket/core pairs), so the CPU cores listed in the $SGE_BINDING job environment variable can easily be converted into the CPU ID representation your parallel application needs.
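As a minimal illustration of where this helps, a job script can turn the space-separated core list in $SGE_BINDING into the comma-separated form that tools such as taskset expect (the example value and the application name are illustrative; for conversions between logical IDs, OS-internal IDs and socket/core pairs, see the loadcheck documentation):

```shell
#!/bin/sh
# Sketch of a job-script snippet. $SGE_BINDING holds the IDs of the CPU
# cores granted to the job, separated by spaces. The default below is an
# example value so the snippet also runs outside a Grid Engine job.
SGE_BINDING="${SGE_BINDING:-0 1 2 3}"

# Convert "0 1 2 3" into "0,1,2,3" for tools that expect a comma list.
CORE_LIST=$(printf '%s' "$SGE_BINDING" | tr ' ' ',')
echo "$CORE_LIST"

# The parallel application could then be pinned to exactly these cores:
# taskset -c "$CORE_LIST" ./my_parallel_app
```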

Several improvements for memory management also made it into this release. The m_mem_free memory granted to a job by UGE can now be lowered at job run-time with qalter -l m_mem_free. On NUMA systems the free memory on the affected NUMA node (m_mem_free_n) is adjusted automatically as well, depending on how the job allocates its memory (interleaved, node-local, …). JSV scripts can now modify the mbind submission parameter, and the interactive qrsh supports the -mbind parameter as well. Qmon was enhanced so that it can also create RSMAP consumable complexes. With newer Linux kernels, UGE 8.1.4 can report more detailed memory values such as the proportional set size (pss), rss, smem and pmem, which makes memory usage reporting more accurate.
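For instance, the run-time adjustment and the interactive memory binding could be used as follows (a sketch: the job ID, memory values and binding strategy are illustrative):

```shell
# Lower the m_mem_free amount granted to the running job 4711 to 2 GB
qalter -l m_mem_free=2G 4711

# Start an interactive session with a strict core-local memory binding
qrsh -mbind cores:strict -l m_mem_free=1G bash
```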

Performance enhancements were also implemented: under some circumstances qmaster moved too many jobs from one scheduling round to the next. This was improved, so that overall cluster utilization can be higher, especially in bigger clusters. The scheduler is now also able to stop a scheduling run after a pre-configured time limit is reached within the scheduling loop, or after a specific number of jobs has been dispatched. Both limits are enabled by the new scheduler parameters MAX_SCHEDULING_TIME and MAX_DISPATCHED_JOBS.
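Assuming these parameters are set like other scheduler parameters in the params field of the scheduler configuration (the values below are illustrative), enabling them could look like this:

```shell
# Open the scheduler configuration for editing
qconf -msconf

# ... and add the new parameters to the "params" line, for example:
# params   MAX_SCHEDULING_TIME=10 MAX_DISPATCHED_JOBS=2000
```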

For the other issues addressed in this release, please have a look at the list of fixes in the release notes.