Open Cluster Scheduler: The Future of Open Source Workload Management (2024-06-10)

See also our announcement at HPC Gridware

Dear Community,

We are thrilled to announce that the source code of the Open Cluster Scheduler is now officially open source and available at github.com/hpc-gridware/clusterscheduler.

The Open Cluster Scheduler is the cutting-edge successor to renowned open-source workload management systems such as "Sun Grid Engine", "Univa Grid Engine Open Core", "Son of Grid Engine", and others. With a development history spanning over three decades, its origins can be traced back to the Distributed Queueing System (DQS), and it achieved widespread adoption under the name "Sun Grid Engine".

A Solution for the AI Era

As the world pivots towards artificial intelligence and high-performance computing, the need for an efficient, open-source cluster scheduler has never been more urgent. In today's GPU cluster environments, keeping the hardware fully utilized is not only economically beneficial; it also delivers results faster, enables more inference tasks per hour, and facilitates the creation of more intricate AI models.

Why Open Cluster Scheduler?

There is a real gap in the market for open-source workload managers, and Open Cluster Scheduler is here to fill it with a whole host of remarkable features:

  • Dynamic, On-Demand Cluster Configuration: Make changes without the need to restart services or daemons.
  • Standard-Compliant Interfaces and APIs: Enjoy compatibility with standard command-line interfaces (qsub, qstat, …) and standard APIs like DRMAA.
  • High Throughput: Efficiently handle millions of independent compute jobs daily.
  • Mixed Job Support: Run large MPI jobs alongside small, short single-node tasks seamlessly without altering configurations.
  • Rapid Submission: Submit thousands of different jobs within seconds.
  • High Availability: Ensure reliability and continuous operation.
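
To make the command-line compatibility and mixed job support above more concrete, here is a minimal Python sketch that only builds qsub command lines, without running them. It assumes the long-standing Grid Engine qsub options -N (job name), -cwd (run in the current directory), -t (array job task range), and -pe (parallel environment and slot count). The helper name build_qsub_command, the script names, and the parallel environment name "mpi" are hypothetical; parallel environments are defined per site.

```python
import shlex
from typing import List, Optional


def build_qsub_command(
    script: str,
    job_name: str,
    tasks: Optional[int] = None,
    pe: Optional[str] = None,
    slots: int = 1,
) -> List[str]:
    """Build (but do not execute) a qsub argument list.

    Hypothetical helper for illustration only; it is not part of
    Open Cluster Scheduler.
    """
    cmd = ["qsub", "-N", job_name, "-cwd"]
    if tasks is not None:
        # Array job: one submission that fans out into `tasks` independent tasks.
        cmd += ["-t", f"1-{tasks}"]
    if pe is not None:
        # Parallel job: request `slots` slots in a site-defined parallel environment.
        cmd += ["-pe", pe, str(slots)]
    cmd.append(script)
    return cmd


if __name__ == "__main__":
    # Many small independent tasks (script name is an assumption).
    print(shlex.join(build_qsub_command("render.sh", "render", tasks=1000)))
    # One large parallel job (PE name "mpi" is an assumption).
    print(shlex.join(build_qsub_command("solver.sh", "solver", pe="mpi", slots=128)))
```

On a host with the scheduler installed, the returned list could be passed to subprocess.run to submit the job. Building the list separately keeps the sketch usable without a cluster.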

Optimized for Performance

Open Cluster Scheduler is meticulously optimized across all dimensions:

  • Binary Protocol Between Daemons: Enhances communication efficiency.
  • Multi-threaded Scheduler: Ensures optimal performance.
  • Written in C++/C: Compiled to native code for robustness and speed.
  • Multi-OS and Architecture Support: Compatible with architectures including AMD64, ARM64, RISC-V, and more.

Looking Forward

We are committed to evolving Open Cluster Scheduler into a modern solution that can manage highly demanding compute workloads across diverse computational environments, whether on-premises or in the cloud.

We invite you to explore, contribute, and join us in this exciting new chapter. Together, we can shape the future of high-performance computing.

Visit our repository: github.com/hpc-gridware/clusterscheduler

Thank you for your continued support and enthusiasm.

Sincerely,
Daniel, Ernst, Joachim