Gridware Cluster Scheduler 9.0.7: Enhanced Stability and Performance (2025-07-08)
Release Date: 2025-07-08
We're pleased to announce the release of Gridware Cluster Scheduler 9.0.7, built on Open Cluster Scheduler 9.0.7 (formerly known as "Sun Grid Engine"). This release continues our commitment to delivering reliable, high-performance workload management for HPC environments across diverse computing architectures.
Key Improvements in 9.0.7
Enhanced Stability and Reliability
Version 9.0.7 addresses several important areas to improve system reliability:
Thread Safety Improvements: The accounting and reporting code has been made fully thread-safe, eliminating potential race conditions in high-throughput environments.
Core Binding Fixes: Resolved issues with both striding and explicit core binding strategies that could prevent optimal core allocation even when cores were available.
Error Reporting: Fixed truncation issues in error messages displayed by qstat -j, ensuring administrators receive complete diagnostic information (see the example after this list).
Installation Improvements: Addressed installer issues and corrected documentation references to ensure smooth deployment experiences.
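As a quick illustration, administrators can pull the complete diagnostic record for a job with the standard qstat switch; the job ID below is just a placeholder.

# show the full job information, including any error reason, for job 4711
qstat -j 4711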
Seamless Binary Replacement Upgrades
For existing 9.0.x deployments, upgrading to 9.0.7 remains straightforward with our binary replacement approach:
- Stop current services
- Replace binaries with 9.0.7 versions
- Restart services
No configuration changes or extended maintenance windows are required, making this an ideal upgrade for production environments.
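For illustration only, here is a rough sketch of what a binary replacement might look like on a tarball-based installation using the classic sgemaster/sgeexecd scripts; the archive name is a placeholder, and the exact paths and service handling depend on how your cluster was installed, so follow the release notes for the authoritative procedure.

# stop services on the qmaster host and on every execution host
$SGE_ROOT/$SGE_CELL/common/sgemaster stop
$SGE_ROOT/$SGE_CELL/common/sgeexecd stop

# back up the existing installation, then unpack the 9.0.7 binaries
# (the archive name below is a placeholder)
tar -C $SGE_ROOT -xzf ocs-9.0.7-bin-lx-amd64.tar.gz

# restart services
$SGE_ROOT/$SGE_CELL/common/sgemaster start
$SGE_ROOT/$SGE_CELL/common/sgeexecd start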
Comprehensive Architecture Support
Open Cluster Scheduler 9.0.7 maintains extensive support for modern computing architectures:
- x86-64: Full support across major Linux distributions (RHEL, Rocky, Ubuntu, SUSE)
- ARM64: Comprehensive support including NVIDIA Grace Hopper platforms
- Specialized Architectures: Support for PowerPC (ppc64le), s390x, and RISC-V platforms
- Operating Systems: Linux distributions, FreeBSD, Solaris, and macOS (client tools)
This broad compatibility ensures organizations can deploy consistent workload management across heterogeneous computing environments.
Notable Features from the 9.0.x Series
Since many users may be upgrading from earlier versions, it's worth highlighting key capabilities introduced throughout the 9.0.x series:
qtelemetry (Developer Preview)
Integrated metrics exporter for Prometheus and Grafana, providing detailed cluster monitoring including host metrics, job statistics, and qmaster performance data.
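To check the exporter from a shell, you can probe its Prometheus-style metrics endpoint; the host name and port below are placeholders that depend on your qtelemetry configuration.

# fetch the first exported metrics lines (host and port are placeholders)
curl -s http://qmaster.example.com:9464/metrics | head -n 20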
Enhanced NVIDIA GPU Support
The qgpu command simplifies GPU resource management with automatic setup, per-job accounting, and support for Grace Hopper architectures.
MPI Integration Templates
Out-of-the-box support for major MPI distributions (Intel MPI, OpenMPI, MPICH, MVAPICH) with ready-to-use parallel environment configurations.
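As an example, once one of these parallel environment templates is registered in the cluster, an MPI job can be submitted with the standard -pe switch; the PE name, slot count, and script below are placeholders for your own setup.

# request 64 slots from a parallel environment named "openmpi" (placeholder name)
qsub -pe openmpi 64 -cwd ./run_simulation.sh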
Advanced Resource Management
- RSMAP (Resource Map) complex type for managing specialized resources like GPU devices (see the sketch after this list)
- Per-host consumable resources
- Resource and queue requests per scope for parallel jobs
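The following sketch shows the general idea under assumed names: a host-level RSMAP complex called gpu holding four device IDs, and a job requesting one instance of it; consult the release documentation for the exact complex definition syntax in your version.

# attach four GPU IDs to an execution host (complex name and IDs are placeholders);
# qconf -me opens the host configuration in an editor, where you would set:
#   complex_values   gpu=4(0 1 2 3)
qconf -me node01

# request a single GPU instance for a job
qsub -l gpu=1 -cwd ./gpu_job.sh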
Performance and Scalability
The 9.0.x series represents significant performance improvements over previous versions through:
- Multi-threaded Architecture: Separate thread pools for different request types
- Enhanced Data Stores: Multiple data stores reducing internal contention
- Automatic Session Management: Ensures data consistency while maintaining performance
- Optimized Scheduling: Improved algorithms for large-scale deployments
Continued 9.0.x Support
We remain committed to supporting the entire 9.0.x series with ongoing maintenance, security updates, and technical support. This provides organizations with confidence in their long-term deployment strategy while allowing flexibility in upgrade timing.
Getting Started
Quick Evaluation
For testing Open Cluster Scheduler 9.0.7 (the most feature-rich and modern open-source "Sun Grid Engine" successor) on major Linux distributions:
# Review the script before running
curl -s https://raw.githubusercontent.com/hpc-gridware/quickinstall/refs/heads/main/ocs.sh | OCS_VERSION=9.0.7 sh
If you are interested in our commercially supported Gridware Cluster Scheduler, please get in touch with us.
Production Deployment
Production environments should follow the comprehensive installation guide included with the release to ensure the configuration matches your specific requirements and environment.
Resources
- Source Code & Documentation: GitHub Repository
- Release Notes: Complete technical details and full changelog
- Community Support: Active development and user community
Looking Forward
Version 9.0.7 reflects our ongoing dedication to providing robust, high-performance workload management solutions. Whether you're running traditional HPC simulations, modern AI workloads, or mixed computing environments, Gridware Cluster Scheduler delivers the reliability and performance your critical applications require.
The combination of enhanced stability, seamless upgrade paths, and broad architecture support makes 9.0.7 an excellent foundation for both current and future computing needs.
For technical questions or deployment assistance, please connect with our community through GitHub or contact our support team. We're committed to helping you maximize the value of your HPC infrastructure.
HPC Gridware Unveils Gridware Cluster Scheduler 9.0.2, Adding NVIDIA Grace Hopper Support (2025-01-23)
A Leap Forward for AI and HPC Workloads
HPC Gridware has unveiled the latest iteration of its Gridware Cluster Scheduler (GCS), version 9.0.2, now featuring native support for NVIDIA's Grace Hopper Superchip. This update, highlighted by HPCwire, marks a significant stride in optimizing high-performance computing (HPC) and AI infrastructure.
Podcast: Open Cluster Scheduler vs. Gridware Cluster Scheduler (2024-10-21)
I couldn't resist using NotebookLM to create a podcast about our first releases: the Open Cluster Scheduler and the Gridware Cluster Scheduler. NotebookLM is gaining viral attention, thanks to its remarkable capabilities tailored for such tasks.
Creating this podcast was a five-minute task—simply uploading the blog posts about the Open Cluster Scheduler and the Gridware Cluster Scheduler and letting the conversation be generated. Most of the time was spent double-checking the content, and I must admit, it got it right on the first try!
As one of the co-founders of HPC Gridware, I absolutely agree with what the AI says about it. Hear for yourself! :-)
Open Cluster Scheduler: The Future of Open Source Workload Management (2024-06-10)
See also our announcement at HPC Gridware
Dear Community,
We are thrilled to announce that the source code repository for the Open Cluster Scheduler is now officially open-sourced and available at github.com/hpc-gridware/clusterscheduler.
The Open Cluster Scheduler is the cutting-edge successor to renowned open-source workload management systems such as "Sun Grid Engine", "Univa Grid Engine Open Core", "Son of Grid Engine," and others. With a development history spanning over three decades, its origins can be traced back to the Distributed Queueing System (DQS), and it achieved widespread adoption under the name "Sun Grid Engine".
A Solution for the AI Era
As the world pivots towards artificial intelligence and high-performance computing, the necessity for an efficient and open-source cluster scheduler has never been more urgent. In today's GPU cluster environments, harnessing full hardware utilization is not only economically beneficial but also accelerates results, enables more inference tasks per hour, and facilitates the creation of more intricate AI models.
Why Open Cluster Scheduler?
There is a real gap in the market for open-source workload managers, and Open Cluster Scheduler is here to fill it with a whole host of remarkable features:
- Dynamic, On-Demand Cluster Configuration: Make changes without the need to restart services or daemons.
- Standard-Compliant Interfaces and APIs: Enjoy compatibility with standard command-line interfaces (qsub, qstat, …) and standard APIs like DRMAA (see the short example after this list).
- High Throughput: Efficiently handle millions of independent compute jobs daily.
- Mixed Job Support: Run large MPI jobs alongside small, short single-node tasks seamlessly without altering configurations.
- Rapid Submission: Submit thousands of different jobs within seconds.
- High Availability: Ensure reliability and continuous operation.
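As a small taste of those standard interfaces, the familiar Grid Engine command line works unchanged; the job name and script below are only examples.

# submit a simple batch job from the current directory and watch its state
qsub -N hello -cwd ./hello_world.sh
qstat -u $USER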
Optimized for Performance
Open Cluster Scheduler is meticulously optimized across all dimensions:
- Binary Protocol Between Daemons: Enhances communication efficiency.
- Multi-threaded Scheduler: Ensures optimal performance.
- Written in C++/C: Delivers robust and high-speed computing.
- Multi-OS and Architecture Support: Compatible with architectures including AMD64, ARM64, RISC-V, and more.
Looking Forward
We are committed to evolving Open Cluster Scheduler into a modern solution that will be capable of managing highly demanding compute workloads across diverse computational environments, whether on-premises or in the cloud.
We invite you to explore, contribute, and join us in this exciting new chapter. Together, we can shape the future of high-performance computing.
Visit our repository: github.com/hpc-gridware/clusterscheduler
Thank you for your continued support and enthusiasm.
Sincerely,
Daniel, Ernst, Joachim
UberCloud Releases Multi-Cloud, Hybrid-Cloud HPC Application Platform (2020/11/06)
The way enterprises run High Performance Computing (HPC) applications has changed. With Cloud providers offering improved security, better cost/performance, and seemingly endless compute capacity, more enterprises are turning to Cloud for their HPC workloads.
However, many companies are finding that replicating an existing on-premise HPC architecture in the cloud does not lead to the desired breakthrough improvements. With this in mind, the UberCloud HPC Application Platform was designed for cloud computing from day one, resulting in greatly increased productivity for HPC engineers, significantly improved IT security, cloud costs and administrative overhead reduced to a minimum, and full control for engineers and corporate IT over their HPC cloud environment. Today, we are announcing UberCloud's next-generation HPC Application Platform.
Building blocks of the UberCloud Platform, including HPC, Cloud, Containers, and Kubernetes, have been previously discussed on HPCwire: Kubernetes, Containers and HPC, and Kubernetes and HPC Applications in Hybrid Cloud Environments.
Key Stakeholders when Driving HPC Cloud Adoption
When we started designing the UberCloud HPC Application Platform, we recognized that three major stakeholders are crucial for the overall success of a company's HPC cloud journey: HPC engineers, Enterprise IT, and the HPC IT team.
HPC application engineers are the driving force behind innovation. To excel in (and enjoy) their job, they require a frictionless, self-service user portal for allocating computational resources when they need them. They don't necessarily need to understand how compute nodes, GPUs, storage, or fast network interconnects have to be configured. They expect to be able to allocate and shut down fully configured HPC application environments.
Enterprise IT demands the software tools and pre-configured containerized HPC applications needed to create fully automated, thoroughly tested environments. These environments must be able to interact with HPC applications and accommodate their special requirements for resources and license servers. The platform needs to plug into modern IT environments and support technologies like CI/CD pipelines and Kubernetes orchestration.
The HPC IT team (often quite independent of Enterprise IT) requires a hybrid cloud strategy for enhancing its existing on-premise HPC infrastructure with cloud resources for bursting and hybrid cloud scenarios. This team demands control over software versions and puts emphasis on the entire engineering lifecycle, from design to manufacturing.
Introducing the UberCloud HPC Application Platform
The UberCloud HPC Application Platform aims to support each of these three key stakeholders during their HPC cloud adoption journey. How is that achieved?
For HPC application engineers, UberCloud provides a self-service HPC user interface where they select their application(s) along with the hardware parameters they need. With a single click, the fully automated UberCloud HPC Application Platform allocates the dedicated computing infrastructure, deploys the application, and configures access for the engineer for instant productivity. Similarly, the HPC application infrastructure can be resized at any point in time to run distributed-memory simulations, parameter studies, or a design of experiments. After the work is done, the application and the simulation platform can be safely shut down.
Enterprise IT operations often have their own way of managing cloud-based resources. Infrastructure as Code, GitOps, and DevOps are some of the paradigms found in those organizations. The UberCloud HPC Application Platform contains a management tool that can be integrated into any kind of automation or CI/CD pipeline tool chain. UberCloud's application platform management tool takes care of all aspects of managing containerized HPC applications using Kubernetes-based container orchestrators like GKE, AKS, and EKS.
HPC IT teams require integration points for allocating cloud resources and distributing HPC jobs between their on-premise HPC clusters and dynamically allocated cloud resources. The UberCloud HPC Cloud Dispatcher provides batch job interfaces for hybrid cloud, cloud bursting, and high-throughput computing. It relies on open standards throughout the whole application stack to provide stable integration interfaces.
Putting UberCloud’s HPC Application Platform into Practice
Our first customer to enjoy the benefits of the UberCloud HPC Application Platform is FLSmidth, a Danish multinational engineering company providing the global cement and mineral industries with factories, machinery, services, and know-how. The proof-of-concept implementation at the end of last year has recently been summarized here, and the extended case study (including a description of the hybrid cloud architecture) is freely available on request by e-mail.
More Articles...
- UberCloud Webinar with Microsoft (2020/10/26)
- Hybrid Cloud interactive HPC Applications on Kubernetes (2020-03-19)
- Univa Grid Engine 8.4.1 Released (2016-07-19)
- Update of the Univa Grid Engine Vagrant integration (2016-06-15)
- Univa Tech Days and Upcoming Univa Grid Engine Webinar (2016-06-14)
- Univa Grid Engine 8.4 Release (2016-06-14)
- Free Univa Webinars in April (2016-4-4)
- Webinar: High Performance Computing in the Cloud? (2016/02/01)
- Univa is Founding Member of Cloud Native Computing Foundation (2015-07-21)
- Univa Tech Day - Gothenburg - 17th of March (2015-03-12)
- Univa Tech Days 2015 (2015-02-06)
- Univa Grid Engine 8.2.1 Available (2014-12-15)
- Webinar About Using Coprocessors in Univa Grid Engine (2014-11-24)
- New Grid Engine Trainings (2014-07-21)
- Univa Interview at ISC 2014 in Leipzig (2014-07-02)
- Schlumberger's ECLIPSE Integrates Univa Grid Engine (2014-06-20)
- Webcast about Workload and Resource Management (2014-06-12)
- Univa Grid Engine meets Sahara Force India Formula 1 (2014-06-04)
- SoGE 8.1.7 released (2014-06-03)
- UniCloud Explained in May (2014-04-28)
- The cgroups Grid Engine Webinar Available for Download (2014-04-12)
- Free Webinar about Univa UniSight on April, 16th 2014 (2014-04-12)
- Univa Grid Engine Forums 2014 (2014-02-18)
- This Wednesday: Webinar about Upgrading to Univa Grid Engine (2014-03-03)
- Tomorrow: Free Webinar about cgroup integration in Univa Grid Engine (2014-02-24)
- Grid Engine Training Locations for 2014 (2014-02-05)
- Univa Grid Engine 8.1.7 Released (2014-01-15)
- Article about 20th Anniversary of Grid Engine (2013-11-21)
- Univa got Editors' Choice Award for Top 5 Vendors to Watch (2013-11-20)
- Visit Univa at Supercomputing 2013 in Denver (2013-11-16)
- Slidecast about Univa and Acquisition of Grid Engine Assets from Oracle (2013-11-06)
- Great Day for Grid Engine and Univa - Univa got Copyrights of Grid Engine Code from Oracle (2013-10-23)
- Univa at HEPiX Fall 2013 Workshop (2013-10-17)
- Grid Engine Forum 2013 and Grid Engine Training (2013-10-10)
- Univa Grid Engine 8.1.6 is out! (2013-10-9)
- Grid Engine Training and Forum Series Continues 2 (2013-10-03)
- Grid Engine Forum Series Continues (2013-09-25)
- Grid Engine Forum 2013 and Grid Engine Training (2013-08-01)
- Son of Grid Engine 8.1.4 Released (2013-09-06)
- License Orchestrator - Why License Management is important (2013-08-28)
- Univa Announces Partnership with MapR (2013-07-23)
- insideHPC Technical Computing Survey (2013-03-25)
- Grid Engine in 2013 (2013-03-20)
- Son of Grid Engine 8.1.3 released (2013-02-27)
- Free Archimedes / Univa Webinar in March about Shared Hadoop Infrastructures (2013-02-25)
- Interview with Univa CEO about Grid Engine's ARM, License Orchestrator, and Hadoop Support (2012-02-16)
- Univa Announces Grid Engine Support for ARM-Servers - Partnership with Calxeda (2013-02-14)
- A First Outlook on the Univa Grid Engine License Orchestrator (2012-02-13)
- Univa Announces Grid Engine 8.1.3 at SC 2012 (2012-11-13)
- Grid Engine in the News: "4 Ways to Create Business Value in a Bad Economy With Infrastructure Transformation" (2012-10-26)
- Grid Engine in the News: "Big data projects: Is the hardware infrastructure overlooked?" (2012-10-18)
- Interested in Grid Engine? Join us on October 1–2, 2012 in Regensburg (Germany)
- Univa is Hiring!
- Grid Engine in the News: "Managing MapReduce Applications in a Shared Infrastructure" (2012-09-26)
- Grid Engine in the News: "Grid Engine: Running on All Four Cylinders" (2012-09-25)
- Son of Grid Engine 8.1.2 released
- Univa Grid Engine 8.1 Available for Public Download (2012-08-20)
- Grid Engine Evolution Summit 2012 (2012-07-25)
- Univa Grid Engine 8.1 Enhancement (Part 7): Univa Grid Engine Job Classes (2012-06-29)
- Univa is filling Sun gap in EDA industry (2012-06-29)
- ISC '12 - Univa Booth (2012-06-18)
- Interview With Univa CEO (2012-06-12)
- Son of Grid Engine Released Version SoG 8.1.0 with Security Fix and 5 New Bug Fixes since 8.0e (2012-06-12)
- Univa Grid Engine 8.1.0 in the News (2012-05-02)
- Univa Releases Results of HPC Survey (2012-03-15)
- DRMAA Version 2 Final Publication (2012-01-27)
- Univa Grid Engine Man Pages Added (2012-01-02)
- New Grid Engine Survey (2011-11-04)
- Univa Grid Engine 8.0.1 released
- Univa Grid Engine 8.0.1 reached beta state
- Son of Grid Engine publishes binaries
- Univa offers free Univa Grid Engine Trial version!
- Univa Grid Engine On Demand RightScale Webinar
- Univa Grid Engine Summer Summit 2011