Running Simple PyTorch Jobs on Gridware Cluster Scheduler: Generating 1000 Cat Images (2025-01-26)
The most valuable digital assets in human history are undoubtedly cat pictures :). While they once dominated the web, today they have become a focal point for AI-generated imagery. So why not demonstrate the Gridware Cluster Scheduler by generating 1000 high-resolution cat images?
Key components include:
- Gridware Cluster Scheduler installation
- Optional use of Open Cluster Scheduler
- Python
- PyTorch
- Stable Diffusion
- GPUs
System Setup
After installing the Gridware Cluster Scheduler and enabling GPU support according to the Admin Guide, NVIDIA_GPUS resources become automatically available. The GPU integration sets the NVIDIA-related environment variables selected by the scheduler for each job and ensures accurate per-job GPU accounting.
When using the free Open Cluster Scheduler, additional manual configuration is necessary (see this blog post).
In this example, I assume PyTorch and the required Python libraries are available on the nodes. To minimize machine dependencies, containers can be used, although that is not the focus of this post. HPC-compatible container runtimes such as Apptainer (formerly Singularity) and Podman work with the Gridware Cluster Scheduler out of the box.
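As a sketch of the container route: the same job shown later in this post could be wrapped in an Apptainer call and submitted as a binary job. The image name cats.sif and the paths are hypothetical placeholders; Apptainer's --nv flag passes the NVIDIA devices into the container.

```python
# Sketch: submitting the job through Apptainer instead of a bare python3.
# The image name (cats.sif) and paths are hypothetical placeholders.
cmd = [
    "qsub", "-j", "y", "-b", "y",
    "-q", "gpu.q",
    "-l", "NVIDIA_GPUS=1",
    "apptainer", "exec", "--nv",          # --nv makes the NVIDIA GPUs visible inside
    "/home/nvidia/containers/cats.sif",   # container with PyTorch + diffusers
    "python3", "/home/nvidia/genai1/run.py",
]
# subprocess.run(cmd, check=True)  # uncomment on a cluster where qsub is available
print(" ".join(cmd))
```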
Submitting Cat Image Creation Jobs
To create cat images, we need input prompts, and lots of them. They can be stored in a list like this:
prompts = [
    "A playful cat stretches its ears while rubbing against soft fur...",
    "A curious cat leaps gracefully to catch a tiny mouse...",
    # ... (other prompts)
]
To generate many prompts, we can automate their creation:
import random

def generate_pro_prompts():
    cameras = [
        "Canon EOS R5", "Sony α1", "Nikon Z9", "Phase One XT",
        "Hasselblad X2D", "Fujifilm GFX100 II", "Leica M11",
        "Pentax 645Z", "Panasonic S1R", "Olympus OM-1"
    ]
    lenses = [
        "85mm f/1.2", "24-70mm f/2.8", "100mm Macro f/2.8",
        "400mm f/2.8", "50mm f/1.0", "12-24mm f/4",
        "135mm f/1.8", "Tilt-Shift 24mm f/3.5", "8-15mm Fisheye",
        "70-200mm f/2.8"
    ]
    styles = [
        "award-winning wildlife", "editorial cover", "cinematic still",
        "fine art gallery", "commercial product", "documentary",
        "fashion editorial", "scientific macro", "sports action",
        "architectural interior"
    ]
    lighting = [
        "golden hour backlight", "softbox Rembrandt", "dappled forest",
        "blue hour ambient", "studio butterfly", "silhouette contrast",
        "LED ring light", "candlelit warm", "neon urban", "moonlit"
    ]
    cats = [
        "Maine Coon", "Siamese", "Bengal", "Sphynx", "Ragdoll",
        "British Shorthair", "Abyssinian", "Persian", "Scottish Fold",
        "Norwegian Forest"
    ]
    actions = [
        "mid-leap", "grooming", "playing", "sleeping", "stretching",
        "climbing", "hunting", "yawning", "curious gaze", "pouncing"
    ]

    # Generate random combinations
    prompts = []
    for _ in range(1000):
        style = random.choice(styles)
        cat = random.choice(cats)
        action = random.choice(actions)
        camera = random.choice(cameras)
        lens = random.choice(lenses)
        light = random.choice(lighting)
        aperture = round(1.4 + random.random() * 7.6, 1)  # random f-stop between f/1.4 and f/9.0
        shutter = random.randint(1, 4000)                 # random shutter speed denominator
        iso = random.choice([100, 200, 400, 800, 1600, 3200, 6400])  # random ISO
        prompt = (f"{style} photo of {cat} cat {action} | "
                  f"{camera} with {lens} | {light} lighting | "
                  f"f/{aperture} 1/{shutter}s ISO {iso} | "
                  f"Technical excellence award composition")
        prompts.append(prompt)
    return prompts
Once we have a list of prompts, we can divide it into chunks so that each batch job generates more than one image. This approach can be optimized further later (not part of this article).
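The chunking itself is a one-liner; here is a quick illustration with a toy list (the helper name chunked is just for this sketch):

```python
def chunked(items, size):
    # Slice the list into consecutive chunks of at most `size` elements.
    return [items[i:i + size] for i in range(0, len(items), size)]

demo = [f"prompt {n}" for n in range(12)]
print([len(c) for c in chunked(demo, 5)])  # → [5, 5, 2]
```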
The critical aspect here is job submission. The prompts are supplied to the job as environment variables using the -v switch. The -q switch assigns the gpu.q queue to the job, which is assumed to be configured across the GPU nodes. The -l switch requests one GPU device per job, ensuring the GPU integration sets the appropriate NVIDIA environment variables so that jobs don't conflict. This is accomplished through the new qgpu utility called in the gpu.q prolog. For the Open Cluster Scheduler, you need to configure this manually: set up the NVIDIA_GPUS resource as an RSMAP with the GPU ID range, and have the job itself convert the SGE_HGR_NVIDIA_GPUS environment variable set in the job into the NVIDIA environment variables (see this blog post).
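For the Open Cluster Scheduler case, that conversion can be as small as the following sketch. It assumes SGE_HGR_NVIDIA_GPUS contains the granted GPU IDs as a whitespace-separated list; the exact format on your cluster may differ.

```python
import os

# SGE_HGR_NVIDIA_GPUS holds the granted GPU IDs, e.g. "0 1" (assumed format).
# CUDA applications, PyTorch included, honor CUDA_VISIBLE_DEVICES, so the
# job (or a queue prolog) can translate one into the other before starting work.
granted = os.environ.get("SGE_HGR_NVIDIA_GPUS", "")
if granted:
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(granted.split())
```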
The job itself, executed on the compute node, is a python3 script located on shared storage.
Here's the submit.py script:
import argparse
import json
import subprocess

# generate_pro_prompts() is the prompt generator defined above

def main():
    prompts = generate_pro_prompts()
    parser = argparse.ArgumentParser()
    parser.add_argument('--chunk-size', type=int, required=True,
                        help='Number of prompts per job')
    args = parser.parse_args()

    # Split prompts into chunks
    chunks = [prompts[i:i+args.chunk_size]
              for i in range(0, len(prompts), args.chunk_size)]

    # Submit jobs
    for i, chunk in enumerate(chunks):
        try:
            # Serialize chunk to JSON for safe transmission
            prompts_json = json.dumps(chunk)
            subprocess.run(
                [
                    "qsub", "-j", "y", "-b", "y",
                    "-v", f"INPUT_PROMPT={prompts_json}",
                    "-q", "gpu.q",
                    "-l", "NVIDIA_GPUS=1",
                    "python3", "/home/nvidia/genai1/run.py"
                ],
                check=True
            )
            print(f"Submitted job {i+1} with {len(chunk)} prompts")
        except subprocess.CalledProcessError as e:
            print(f"Failed to submit job {i+1}. Error: {e}")

if __name__ == "__main__":
    main()
The run.py script carries out the compute-intensive work for the cat-related prompts. Note that there is plenty of room for obvious improvements, which is not the focus of this article. It retrieves the prompts from the INPUT_PROMPT environment variable together with the unique JOB_ID assigned by the Gridware Cluster Scheduler, exactly as in SGE. The images are stored in the job's working directory, which is assumed to be shared. Learn more about the DiffusionPipeline utilizing Stable Diffusion at Hugging Face.
import json
import logging
import os

import torch
from diffusers import DiffusionPipeline

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s - %(levelname)s - %(message)s')

def generate_image(prompt: str, output_path: str):
    try:
        logging.info("Loading model...")
        pipe = DiffusionPipeline.from_pretrained(
            "stabilityai/stable-diffusion-xl-base-1.0",
            torch_dtype=torch.float16,
            use_safetensors=True,
            variant="fp16"
        )
        pipe.to("cuda")
    except Exception as e:
        logging.error("Pipeline init error: %s", str(e))
        return
    try:
        logging.info("Generating: %s", prompt)
        image = pipe(prompt=prompt).images[0]
        image.save(output_path)
        logging.info("Saved: %s", output_path)
    except Exception as e:
        logging.error("Generation error: %s", str(e))

if __name__ == "__main__":
    # Get job context
    job_id = os.getenv("JOB_ID", "unknown_job")

    # Parse prompts from JSON
    try:
        prompt_list = json.loads(os.getenv("INPUT_PROMPT", "[]"))
    except json.JSONDecodeError:
        logging.error("Invalid prompt format!")
        prompt_list = []

    # Process all prompts in chunk
    for idx, prompt in enumerate(prompt_list):
        if not prompt.strip():
            continue  # skip empty prompts
        output_path = f"image_{job_id}_{idx}.png"
        generate_image(prompt, output_path)
To submit the AI inference jobs to the system, simply execute:
python3 submit.py --chunk-size 5
This results in 200 queued jobs, each capable of creating 5 images.
Supervising AI Job Execution
When correctly configured, the Gridware Cluster Scheduler executes jobs on available GPUs at the appropriate time. You can check the status with qstat or get job-related information with qstat -j <jobid>. After a while, you will have your 1000 cat images. Once completed, you can also view per-job GPU usage with qacct -j <jobid>, including metrics like nvidia_energy_consumed and nvidia_power_usage_avg, as well as the submission command line with the prompts, for example:
qacct -j 2900
==============================================================
qname gpu.q
hostname XXXXXX.server.com
group nvidia
owner nvidia
project NONE
department defaultdepartment
jobname python3
jobnumber 2900
taskid undefined
pe_taskid NONE
account sge
priority 0
qsub_time 2025-01-26 11:02:12.650259
submit_cmd_line qsub -j y -b y -v 'INPUT_PROMPT=["award-winning wildlife photo of Maine Coon cat mid-leap | Canon EOS R5 with 85mm f/1.2 | golden hour backlight lighting | f/2.4 1/1500s ISO 300 | Technical excellence award composition", "editorial cover photo of Siamese cat grooming | Sony α1 with 24-70mm f/2.8 | softbox Rembrandt lighting | f/2.9 1/2000s ISO 400 | Technical excellence award composition", "cinematic still photo of Bengal cat playing | Nikon Z9 with 100mm Macro f/2.8 | dappled forest lighting | f/3.4 1/2500s ISO 500 | Technical excellence award composition", "fine art gallery photo of Sphynx cat sleeping | Phase One XT with 400mm f/2.8 | blue hour ambient lighting | f/3.9 1/3000s ISO 600 | Technical excellence award composition", "commercial product photo of Ragdoll cat stretching | Hasselblad X2D with 50mm f/1.0 | studio butterfly lighting | f/4.4 1/3500s ISO 100 | Technical excellence award composition"]' -q gpu.q -l NVIDIA_GPUS=1 python3 /home/nvidia/genai1/run.py
start_time 2025-01-26 13:13:06.172316
end_time 2025-01-26 13:15:19.362819
granted_pe NONE
slots 1
failed 0
exit_status 0
ru_wallclock 133
ru_utime 135.470
ru_stime 1.319
ru_maxrss 7142080
ru_ixrss 0
ru_ismrss 0
ru_idrss 0
ru_isrss 0
ru_minflt 74728
ru_majflt 0
ru_nswap 0
ru_inblock 0
ru_oublock 14208
ru_msgsnd 0
ru_msgrcv 0
ru_nsignals 0
ru_nvcsw 2488
ru_nivcsw 846
wallclock 134.238
cpu 136.789
mem 6587.407
io 0.096
iow 0.000
maxvmem 61648994304
maxrss 7112032256
arid undefined
nvidia_energy_consumed 26058.000
nvidia_power_usage_avg 158.000
nvidia_power_usage_max 158.000
nvidia_power_usage_min 0.000
nvidia_max_gpu_memory_used 0.000
nvidia_sm_clock_avg 1980.000
nvidia_sm_clock_max 1980.000
nvidia_sm_clock_min 1980.000
nvidia_mem_clock_avg 2619.000
nvidia_mem_clock_max 2619.000
nvidia_mem_clock_min 2619.000
nvidia_sm_utilization_avg 0.000
nvidia_sm_utilization_max 0.000
nvidia_sm_utilization_min 0.000
nvidia_mem_utilization_avg 0.000
nvidia_mem_utilization_max 0.000
nvidia_mem_utilization_min 0.000
nvidia_pcie_rx_bandwidth_avg 0.000
nvidia_pcie_rx_bandwidth_max 0.000
nvidia_pcie_rx_bandwidth_min 0.000
nvidia_pcie_tx_bandwidth_avg 0.000
nvidia_pcie_tx_bandwidth_max 0.000
nvidia_pcie_tx_bandwidth_min 0.000
nvidia_single_bit_ecc_count 0.000
nvidia_double_bit_ecc_count 0.000
nvidia_pcie_replay_warning_count 0.000
nvidia_critical_xid_errors 0.000
nvidia_slowdown_thermal_count 0.000
nvidia_slowdown_power_count 0.000
nvidia_slowdown_sync_boost 0.000
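Since qacct emits simple key/value lines like those above, aggregating a metric such as nvidia_energy_consumed across many jobs is straightforward. A minimal parser sketch (field names taken from the output above; the helper name parse_qacct is just for illustration):

```python
def parse_qacct(text):
    # qacct prints one "key   value" pair per line; split on the first whitespace run
    # and skip the "=====" separator lines between job records.
    record = {}
    for line in text.splitlines():
        if line.startswith("="):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            record[parts[0]] = parts[1].strip()
    return record

sample = """qname gpu.q
jobnumber 2900
nvidia_energy_consumed 26058.000
nvidia_power_usage_avg 158.000"""

rec = parse_qacct(sample)
print(float(rec["nvidia_energy_consumed"]))  # → 26058.0
```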
This example demonstrates how easily you can use the Gridware Cluster Scheduler to keep your GPUs continuously busy, regardless of the frameworks, models, or input data your cluster users employ, and whether they run single-GPU jobs, multi-GPU jobs, or multi-node multi-GPU jobs using MPI, inside containers or as plain command-line applications.
Below you can find some output examples...enjoy :)