Skip to content

feat: cpu jobs skip gpu partitions#454

Open
GaspardR wants to merge 1 commit intosnakemake:mainfrom
GaspardR:feat-cpu-jobs-skip-gpu-partitions
Open

feat: cpu jobs skip gpu partitions#454
GaspardR wants to merge 1 commit intosnakemake:mainfrom
GaspardR:feat-cpu-jobs-skip-gpu-partitions

Conversation

@GaspardR
Copy link
Copy Markdown

@GaspardR GaspardR commented Apr 22, 2026

fix: #446

When jobs are submitted with the automatic selection from the config file, the partition with the greatest score for that job is used. This can be a waste of compute resources when gpu partitions are selected for cpu-only jobs.
Here, gpu partitions are given a penalty for cpu-only jobs.

I slightly changed the basic_partition_config to take in account the penalty in existent tests. Let me know if it's preferable to create a completely new test.

Summary by CodeRabbit

  • Bug Fixes

    • Improved partition selection logic to evaluate all scored partitions as valid candidates.
    • Enhanced GPU job scheduling to better assess partition suitability for jobs without GPU requirements.
  • Tests

    • Updated test configuration with additional resource limit constraints.

fix: snakemake#446

When jobs are submitted with the automatic selection from
the config file, the partition with the greatest score for
that job is used.
This can be a waste of compute resources when gpu partitions
are selected for cpu-only jobs.
Here, gpu partitions are given a penalty for cpu-only jobs.
@GaspardR GaspardR requested a review from cmeesters April 22, 2026 01:06
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 22, 2026

Walkthrough

Modified GPU partition scoring logic to penalize GPU-equipped partitions when jobs request zero GPUs, and updated partition selection to consider all non-null scores including zero and negative values. This enables better allocation of CPU-only jobs to CPU-optimized partitions.

Changes

Cohort / File(s) Summary
GPU Partition Scoring
snakemake_executor_plugin_slurm/partitions.py
Refactored score_job_fit to apply -1 penalty to GPU partitions when jobs request 0 GPUs. Updated get_best_partition to evaluate all partitions with non-null scores, including zero and negative scores, rather than filtering to positive scores only.
Test Configuration
tests/test_partition_selection.py
Added max_cpus_per_task: 16 field to GPU partition test fixture configuration.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 When CPUs call for work, no GPUs they need,
Yet GPU partitions still answer with greed.
Now -1 they score, pushed down in the queue,
CPU partitions shine—a much fairer view! 🎯

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: cpu jobs skip gpu partitions' clearly summarizes the main change: adjusting partition selection so CPU-only jobs avoid GPU partitions.
Linked Issues check ✅ Passed The PR implements the core requirement from issue #446: CPU jobs are now penalized when selecting GPU partitions by returning a -1 score penalty, preventing unnecessary GPU allocation.
Out of Scope Changes check ✅ Passed All changes are directly related to the stated objective of penalizing GPU partitions for CPU-only jobs, with no unrelated modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/test_partition_selection.py (1)

29-29: Add a focused regression test for CPU jobs avoiding GPU partitions.

Changing the shared fixture makes the existing CPU selection test pass, but it does not directly cover the issue scenario: a GPU partition that would otherwise score better for a CPU-only job. A separate test would make this behavior explicit and prevent regressions.

🧪 Suggested test shape
+    def test_cpu_only_job_prefers_cpu_partition_over_better_scored_gpu_partition(
+        self, temp_yaml_file, mock_job, mock_logger
+    ):
+        config = {
+            "partitions": {
+                "cpu": {
+                    "max_runtime": 1440,
+                    "max_mem_mb": 512000,
+                    "max_threads": 128,
+                },
+                "gpu": {
+                    "max_runtime": 720,
+                    "max_mem_mb": 64000,
+                    "max_threads": 16,
+                    "max_gpu": 4,
+                },
+            }
+        }
+        temp_path = temp_yaml_file(config)
+
+        try:
+            partitions = read_partition_file(temp_path)
+            job = mock_job(threads=8, mem_mb=32000, runtime=60)
+
+            selected_partition = get_best_partition(partitions, job, mock_logger)
+
+            assert selected_partition == "cpu"
+        finally:
+            temp_path.unlink()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_partition_selection.py` at line 29, Add a focused regression test
named something like test_cpu_job_avoids_gpu_partition_regression in
tests/test_partition_selection.py that constructs two partitions (one
GPU-enabled partition that would otherwise score higher and one CPU-only
partition) and a CPU-only job request (gpus=0 or equivalent), calls the
partition selection function (select_partition / choose_partition or the module
function under test), and asserts the selected partition is the CPU-only
partition; use the same partition/job dataclass/fixture types as existing tests
(e.g., Partition, JobSpec) and explicitly set the GPU count on the job to zero
so the test fails if GPU partitions are chosen for CPU-only jobs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@snakemake_executor_plugin_slurm/partitions.py`:
- Around line 267-268: The current logic appends penalized GPU partitions
(scored_partitions.append((p, score))) allowing negative scores to still make
GPU partitions eligible for CPU-only jobs; change this by detecting CPU-only
requests and filtering out GPU partitions entirely (or setting their score to
None) instead of keeping negative scores—modify the block around
scored_partitions.append((p, score)) (and the analogous handling at the other
site around lines 548-550) so that GPU partitions (identify via the partition
object used as p or its is_gpu flag/property) are excluded when the job requests
no GPUs, ensuring callers never receive GPU partitions for CPU-only jobs.
- Around line 548-550: The CPU-only penalty incorrectly uses gpu_count derived
without considering the 'gpus' resource, so parse 'gpus' as an alias for 'gpu'
wherever job GPU requirements are collected and used (e.g., the
function/variable that computes gpu_count for partition scoring and the
get_job_cpu_requirement helper); update the GPU-requirement extraction to check
both 'gpu' and 'gpus' (and existing 'gres'/'gpu_type' handling) and use that
unified GPU count when applying the "Disadvantage gpu partitions for cpu only
jobs" score adjustment so jobs that request resources: gpus=1 are not treated as
CPU-only.

---

Nitpick comments:
In `@tests/test_partition_selection.py`:
- Line 29: Add a focused regression test named something like
test_cpu_job_avoids_gpu_partition_regression in
tests/test_partition_selection.py that constructs two partitions (one
GPU-enabled partition that would otherwise score higher and one CPU-only
partition) and a CPU-only job request (gpus=0 or equivalent), calls the
partition selection function (select_partition / choose_partition or the module
function under test), and asserts the selected partition is the CPU-only
partition; use the same partition/job dataclass/fixture types as existing tests
(e.g., Partition, JobSpec) and explicitly set the GPU count on the job to zero
so the test fails if GPU partitions are chosen for CPU-only jobs.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: eb3839a5-4ef6-4d90-9a46-385fa03d816b

📥 Commits

Reviewing files that changed from the base of the PR and between a09a027 and f04956c.

📒 Files selected for processing (2)
  • snakemake_executor_plugin_slurm/partitions.py
  • tests/test_partition_selection.py

Comment thread snakemake_executor_plugin_slurm/partitions.py
Comment on lines +548 to +550
if gpu_count == 0 and self.limits.max_gpu > 0:
# Disadvantage gpu partitions for cpu only jobs
score -= 1
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Account for the supported gpus resource before applying the CPU-only penalty.

A job using resources: gpus=1 is currently parsed as gpu_count == 0, so it gets treated as CPU-only and GPU eligibility is not enforced. Include gpus wherever GPU requirements are parsed/scored. Based on learnings, PR #173 added GPU job support via gres, gpu/gpus, and GPU type resources.

🐛 Possible fix: parse `gpus` as an alias for `gpu`
 def parse_gpu_requirements(job: JobExecutorInterface) -> tuple[int, Optional[str]]:
     """Parse GPU requirements from job resources. Returns (count, model)"""
-    gpu_required = job.resources.get("gpu", 0)
+    gpu_required = job.resources.get("gpu", job.resources.get("gpus", 0))
     gres = job.resources.get("gres", "")

Also mirror the same alias in get_job_cpu_requirement:

-    gpu_required = job.resources.get("gpu", 0)
+    gpu_required = job.resources.get("gpu", job.resources.get("gpus", 0))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@snakemake_executor_plugin_slurm/partitions.py` around lines 548 - 550, The
CPU-only penalty incorrectly uses gpu_count derived without considering the
'gpus' resource, so parse 'gpus' as an alias for 'gpu' wherever job GPU
requirements are collected and used (e.g., the function/variable that computes
gpu_count for partition scoring and the get_job_cpu_requirement helper); update
the GPU-requirement extraction to check both 'gpu' and 'gpus' (and existing
'gres'/'gpu_type' handling) and use that unified GPU count when applying the
"Disadvantage gpu partitions for cpu only jobs" score adjustment so jobs that
request resources: gpus=1 are not treated as CPU-only.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gpus resource is not a standard resource in snakemake, so it is not supported.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GaspardR, thanks for the clarification! You're right — since gpus is not a standard Snakemake resource name, the concern doesn't apply. I've noted this for future reviews.

I'd also like to flag that a previously stored learning (ID 937711e3-f289-4580-970e-35752bf03c69) states: "the GPU resource is specified using the 'gpus' (plural) resource name". Based on your clarification, this seems inaccurate or at least misleading — it likely refers to the SLURM CLI --gpus flag rather than a Snakemake resource name. Would you like me to remove that learning?

(̷ ˃̵͈ᴗ˂̵͈)੭ु⁾⁾


✏️ Learnings added
Learnt from: GaspardR
Repo: snakemake/snakemake-executor-plugin-slurm PR: 454
File: snakemake_executor_plugin_slurm/partitions.py:548-550
Timestamp: 2026-04-22T01:31:02.790Z
Learning: In snakemake-executor-plugin-slurm, `gpus` is NOT a standard Snakemake resource name. The supported Snakemake resource keys for GPU jobs are `gpu` (singular, numeric count), `gres` (generic SLURM resource string, e.g. 'gpu:1'), `gpu_model`, and `gpu_manufacturer`. The SLURM CLI flag `--gpus` (plural) is what SLURM itself uses, but that is distinct from the Snakemake resource name.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: GaspardR
Repo: snakemake/snakemake-executor-plugin-slurm PR: 454
File: snakemake_executor_plugin_slurm/partitions.py:267-268
Timestamp: 2026-04-22T01:25:17.194Z
Learning: In snakemake_executor_plugin_slurm/partitions.py, do not hard-exclude GPU partitions when GPU count is unavailable. Instead, preserve the intended fallback behavior by applying a score penalty (e.g., via Partition.score_job_fit using a `-1` penalty when `gpu_count == 0` and `self.limits.max_gpu > 0`) so CPU-only partitions are preferred, but GPU partitions may still be selected when no non-GPU partition is available. Ensure get_best_partition considers all non-None scores (including negative values) when choosing the best partition.

Learnt from: johanneskoester
Repo: snakemake/snakemake-executor-plugin-slurm PR: 173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.829Z
Learning: PR `#173` in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications (e.g., 'gpu:1'), `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources directly rather than having to use slurm_extra.

Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 173
File: tests/tests.py:79-86
Timestamp: 2025-03-10T13:34:59.270Z
Learning: In the Snakemake executor plugin for SLURM, all GPU resources (even simple numeric ones) should be translated to the `--gpus` flag (plural) to match SLURM's expectations, not `--gpu` (singular).

Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 173
File: tests/tests.py:79-86
Timestamp: 2025-03-10T13:34:59.270Z
Learning: In the Snakemake executor plugin for SLURM, all GPU resources (both simple numeric values and model:number specifications) should be translated to the `--gpus` flag (plural form) to match SLURM's command line interface expectations.

Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 173
File: tests/tests.py:79-86
Timestamp: 2025-03-10T13:22:54.081Z
Learning: In the Snakemake executor plugin for SLURM, when using GPU resources, the function returns `--gpu=n` (singular) when only a number is provided, but `--gpus=model:n` (plural) when a model is specified in the implementation.

Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 173
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2025-02-18T14:40:27.064Z
Learning: In the Snakemake executor plugin for SLURM, the GPU resource is specified using the "gpus" (plural) resource name, not "gpu" (singular).

Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 173
File: snakemake_executor_plugin_slurm/utils.py:75-100
Timestamp: 2025-02-18T14:49:42.624Z
Learning: In the Snakemake SLURM executor plugin, users must specify either `gres` or `gpus` for GPU resources, but not both simultaneously, as these are mutually exclusive ways to request GPU resources.

Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 410
File: snakemake_executor_plugin_slurm/cli.py:82-93
Timestamp: 2026-02-19T08:23:44.191Z
Learning: In the snakemake-executor-plugin-slurm CLI tool (generate-slurm-partition-config), the `cluster` field within each partition's limits dictionary is intentional metadata that should be preserved in the YAML output. This field indicates which cluster the partition belongs to, which is essential for multi-cluster setups. The key-stripping logic (removing cluster prefixes from partition names) is separate from this metadata preservation.

Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR `#173` (adding gres resource specification) depends on PR `#28` in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.

Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 174
File: snakemake_executor_plugin_slurm/__init__.py:649-679
Timestamp: 2026-03-10T16:17:56.685Z
Learning: In snakemake-executor-plugin-slurm PR `#174` (feat/jobarrays), the `run_jobs` method groups jobs for array submission by `job.rule.name` only. It is a known, intentionally deferred limitation that jobs with different sbatch-relevant settings (runtime, memory, account, partition, cluster, slurm_extra) within the same rule can be incorrectly array-submitted together, since `run_array_jobs` builds the sbatch call from `jobs[0]`. A future PR is expected to partition `same_rule_jobs` by a full submission signature before calling `run_array_jobs`.

Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 448
File: tests/test_account.py:0-0
Timestamp: 2026-04-15T13:09:14.164Z
Learning: In snakemake-executor-plugin-slurm, support for multiple SLURM accounts (comma/space-delimited) in `slurm_account` is anticipated but not yet implemented. The existing test `test_multiple_accounts_integer` intentionally uses a single integer account value (e.g., `slurm_account=123`) to reflect the current supported behavior, not a bug or oversight.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cmeesters I'm not sure if this comment is true. Is gpus a correct standard resource in snakemake? The documentation says no, but the previous learnings of coderabbit say the contrary.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

skip gpu partitions when cpu job

1 participant