Conversation
fix: snakemake#446

When jobs are submitted with automatic selection from the config file, the partition with the greatest score for that job is used. This can waste compute resources when GPU partitions are selected for CPU-only jobs. Here, GPU partitions are given a penalty for CPU-only jobs.
Walkthrough

Modified the GPU partition scoring logic to penalize GPU-equipped partitions when jobs request zero GPUs, and updated partition selection to consider all non-null scores, including zero and negative values. This enables better allocation of CPU-only jobs to CPU-optimized partitions.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
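As a rough illustration of the change the walkthrough describes, a partition's fit score might be penalized for CPU-only jobs along these lines. This is a hedged sketch, not the plugin's actual code: `Limits`, `score_job_fit`, and all parameter names here are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Limits:
    """Hypothetical partition limits, mirroring the config keys in the thread."""
    max_runtime: int
    max_mem_mb: int
    max_threads: int
    max_gpu: int = 0


def score_job_fit(limits: Limits, threads: int, mem_mb: int,
                  runtime: int, gpu_count: int) -> Optional[int]:
    """Return a fit score for a job on a partition, or None if it cannot run there.

    Sketch only: higher is better, and GPU-equipped partitions are
    penalized for CPU-only jobs so CPU partitions win when both fit.
    """
    if threads > limits.max_threads or mem_mb > limits.max_mem_mb \
            or runtime > limits.max_runtime:
        return None  # job exceeds a hard limit of this partition
    if gpu_count > limits.max_gpu:
        return None  # partition cannot satisfy the GPU request
    score = 0
    if gpu_count == 0 and limits.max_gpu > 0:
        # Disadvantage GPU partitions for CPU-only jobs.
        score -= 1
    return score


cpu = Limits(max_runtime=1440, max_mem_mb=512000, max_threads=128)
gpu = Limits(max_runtime=720, max_mem_mb=64000, max_threads=16, max_gpu=4)

print(score_job_fit(cpu, threads=8, mem_mb=32000, runtime=60, gpu_count=0))  # 0
print(score_job_fit(gpu, threads=8, mem_mb=32000, runtime=60, gpu_count=0))  # -1
```

The key point is that the penalized GPU partition still receives a numeric score rather than `None`, so it remains an eligible fallback.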
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (warning)
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/test_partition_selection.py (1)
29-29: Add a focused regression test for CPU jobs avoiding GPU partitions.

Changing the shared fixture makes the existing CPU selection test pass, but it does not directly cover the issue scenario: a GPU partition that would otherwise score better for a CPU-only job. A separate test would make this behavior explicit and prevent regressions.
🧪 Suggested test shape
```diff
+    def test_cpu_only_job_prefers_cpu_partition_over_better_scored_gpu_partition(
+        self, temp_yaml_file, mock_job, mock_logger
+    ):
+        config = {
+            "partitions": {
+                "cpu": {
+                    "max_runtime": 1440,
+                    "max_mem_mb": 512000,
+                    "max_threads": 128,
+                },
+                "gpu": {
+                    "max_runtime": 720,
+                    "max_mem_mb": 64000,
+                    "max_threads": 16,
+                    "max_gpu": 4,
+                },
+            }
+        }
+        temp_path = temp_yaml_file(config)
+
+        try:
+            partitions = read_partition_file(temp_path)
+            job = mock_job(threads=8, mem_mb=32000, runtime=60)
+
+            selected_partition = get_best_partition(partitions, job, mock_logger)
+
+            assert selected_partition == "cpu"
+        finally:
+            temp_path.unlink()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_partition_selection.py` at line 29, Add a focused regression test named something like test_cpu_job_avoids_gpu_partition_regression in tests/test_partition_selection.py that constructs two partitions (one GPU-enabled partition that would otherwise score higher and one CPU-only partition) and a CPU-only job request (gpus=0 or equivalent), calls the partition selection function (select_partition / choose_partition or the module function under test), and asserts the selected partition is the CPU-only partition; use the same partition/job dataclass/fixture types as existing tests (e.g., Partition, JobSpec) and explicitly set the GPU count on the job to zero so the test fails if GPU partitions are chosen for CPU-only jobs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@snakemake_executor_plugin_slurm/partitions.py`:
- Around line 267-268: The current logic appends penalized GPU partitions
(scored_partitions.append((p, score))) allowing negative scores to still make
GPU partitions eligible for CPU-only jobs; change this by detecting CPU-only
requests and filtering out GPU partitions entirely (or setting their score to
None) instead of keeping negative scores—modify the block around
scored_partitions.append((p, score)) (and the analogous handling at the other
site around lines 548-550) so that GPU partitions (identify via the partition
object used as p or its is_gpu flag/property) are excluded when the job requests
no GPUs, ensuring callers never receive GPU partitions for CPU-only jobs.
- Around line 548-550: The CPU-only penalty incorrectly uses gpu_count derived
without considering the 'gpus' resource, so parse 'gpus' as an alias for 'gpu'
wherever job GPU requirements are collected and used (e.g., the
function/variable that computes gpu_count for partition scoring and the
get_job_cpu_requirement helper); update the GPU-requirement extraction to check
both 'gpu' and 'gpus' (and existing 'gres'/'gpu_type' handling) and use that
unified GPU count when applying the "Disadvantage gpu partitions for cpu only
jobs" score adjustment so jobs that request resources: gpus=1 are not treated as
CPU-only.
---
Nitpick comments:
In `@tests/test_partition_selection.py`:
- Line 29: Add a focused regression test named something like
test_cpu_job_avoids_gpu_partition_regression in
tests/test_partition_selection.py that constructs two partitions (one
GPU-enabled partition that would otherwise score higher and one CPU-only
partition) and a CPU-only job request (gpus=0 or equivalent), calls the
partition selection function (select_partition / choose_partition or the module
function under test), and asserts the selected partition is the CPU-only
partition; use the same partition/job dataclass/fixture types as existing tests
(e.g., Partition, JobSpec) and explicitly set the GPU count on the job to zero
so the test fails if GPU partitions are chosen for CPU-only jobs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: eb3839a5-4ef6-4d90-9a46-385fa03d816b
📒 Files selected for processing (2)
snakemake_executor_plugin_slurm/partitions.py
tests/test_partition_selection.py
```python
if gpu_count == 0 and self.limits.max_gpu > 0:
    # Disadvantage gpu partitions for cpu only jobs
    score -= 1
```
Account for the supported gpus resource before applying the CPU-only penalty.
A job using resources: gpus=1 is currently parsed as gpu_count == 0, so it gets treated as CPU-only and GPU eligibility is not enforced. Include gpus wherever GPU requirements are parsed/scored. Based on learnings, PR #173 added GPU job support via gres, gpu/gpus, and GPU type resources.
🐛 Possible fix: parse `gpus` as an alias for `gpu`
```diff
 def parse_gpu_requirements(job: JobExecutorInterface) -> tuple[int, Optional[str]]:
     """Parse GPU requirements from job resources. Returns (count, model)"""
-    gpu_required = job.resources.get("gpu", 0)
+    gpu_required = job.resources.get("gpu", job.resources.get("gpus", 0))
     gres = job.resources.get("gres", "")
```

Also mirror the same alias in get_job_cpu_requirement:

```diff
-    gpu_required = job.resources.get("gpu", 0)
+    gpu_required = job.resources.get("gpu", job.resources.get("gpus", 0))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@snakemake_executor_plugin_slurm/partitions.py` around lines 548 - 550, The
CPU-only penalty incorrectly uses gpu_count derived without considering the
'gpus' resource, so parse 'gpus' as an alias for 'gpu' wherever job GPU
requirements are collected and used (e.g., the function/variable that computes
gpu_count for partition scoring and the get_job_cpu_requirement helper); update
the GPU-requirement extraction to check both 'gpu' and 'gpus' (and existing
'gres'/'gpu_type' handling) and use that unified GPU count when applying the
"Disadvantage gpu partitions for cpu only jobs" score adjustment so jobs that
request resources: gpus=1 are not treated as CPU-only.
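The alias handling this prompt asks for could be sketched as below. Note that `extract_gpu_count` is a hypothetical helper, not a function from the plugin, and that further down the thread the maintainer clarifies `gpus` is not a standard Snakemake resource name, so the suggestion was ultimately not applied.

```python
def extract_gpu_count(resources: dict) -> int:
    """Return the requested GPU count, treating 'gpus' as an alias for 'gpu'.

    Sketch of the reviewer's suggestion (hypothetical helper); a real
    implementation would handle 'gres' strings more robustly.
    """
    count = resources.get("gpu", resources.get("gpus", 0))
    if not count:
        gres = resources.get("gres", "")
        if gres.startswith("gpu"):
            # 'gpu:1' or 'gpu:a100:2' -> take the trailing number.
            tail = gres.rsplit(":", 1)[-1]
            if tail.isdigit():
                count = int(tail)
    return int(count)


print(extract_gpu_count({"gpus": 1}))        # 1
print(extract_gpu_count({"gres": "gpu:2"}))  # 2
print(extract_gpu_count({"mem_mb": 8000}))   # 0
```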
The `gpus` resource is not a standard resource in Snakemake, so it is not supported.
@GaspardR, thanks for the clarification! You're right — since gpus is not a standard Snakemake resource name, the concern doesn't apply. I've noted this for future reviews.
I'd also like to flag that a previously stored learning (ID 937711e3-f289-4580-970e-35752bf03c69) states: "the GPU resource is specified using the 'gpus' (plural) resource name". Based on your clarification, this seems inaccurate or at least misleading — it likely refers to the SLURM CLI --gpus flag rather than a Snakemake resource name. Would you like me to remove that learning?
✏️ Learnings added
Learnt from: GaspardR
Repo: snakemake/snakemake-executor-plugin-slurm PR: 454
File: snakemake_executor_plugin_slurm/partitions.py:548-550
Timestamp: 2026-04-22T01:31:02.790Z
Learning: In snakemake-executor-plugin-slurm, `gpus` is NOT a standard Snakemake resource name. The supported Snakemake resource keys for GPU jobs are `gpu` (singular, numeric count), `gres` (generic SLURM resource string, e.g. 'gpu:1'), `gpu_model`, and `gpu_manufacturer`. The SLURM CLI flag `--gpus` (plural) is what SLURM itself uses, but that is distinct from the Snakemake resource name.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: GaspardR
Repo: snakemake/snakemake-executor-plugin-slurm PR: 454
File: snakemake_executor_plugin_slurm/partitions.py:267-268
Timestamp: 2026-04-22T01:25:17.194Z
Learning: In snakemake_executor_plugin_slurm/partitions.py, do not hard-exclude GPU partitions when GPU count is unavailable. Instead, preserve the intended fallback behavior by applying a score penalty (e.g., via Partition.score_job_fit using a `-1` penalty when `gpu_count == 0` and `self.limits.max_gpu > 0`) so CPU-only partitions are preferred, but GPU partitions may still be selected when no non-GPU partition is available. Ensure get_best_partition considers all non-None scores (including negative values) when choosing the best partition.
Learnt from: johanneskoester
Repo: snakemake/snakemake-executor-plugin-slurm PR: 173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.829Z
Learning: PR `#173` in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications (e.g., 'gpu:1'), `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources directly rather than having to use slurm_extra.
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 173
File: tests/tests.py:79-86
Timestamp: 2025-03-10T13:34:59.270Z
Learning: In the Snakemake executor plugin for SLURM, all GPU resources (even simple numeric ones) should be translated to the `--gpus` flag (plural) to match SLURM's expectations, not `--gpu` (singular).
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 173
File: tests/tests.py:79-86
Timestamp: 2025-03-10T13:34:59.270Z
Learning: In the Snakemake executor plugin for SLURM, all GPU resources (both simple numeric values and model:number specifications) should be translated to the `--gpus` flag (plural form) to match SLURM's command line interface expectations.
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 173
File: tests/tests.py:79-86
Timestamp: 2025-03-10T13:22:54.081Z
Learning: In the Snakemake executor plugin for SLURM, when using GPU resources, the function returns `--gpu=n` (singular) when only a number is provided, but `--gpus=model:n` (plural) when a model is specified in the implementation.
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 173
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2025-02-18T14:40:27.064Z
Learning: In the Snakemake executor plugin for SLURM, the GPU resource is specified using the "gpus" (plural) resource name, not "gpu" (singular).
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 173
File: snakemake_executor_plugin_slurm/utils.py:75-100
Timestamp: 2025-02-18T14:49:42.624Z
Learning: In the Snakemake SLURM executor plugin, users must specify either `gres` or `gpus` for GPU resources, but not both simultaneously, as these are mutually exclusive ways to request GPU resources.
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 410
File: snakemake_executor_plugin_slurm/cli.py:82-93
Timestamp: 2026-02-19T08:23:44.191Z
Learning: In the snakemake-executor-plugin-slurm CLI tool (generate-slurm-partition-config), the `cluster` field within each partition's limits dictionary is intentional metadata that should be preserved in the YAML output. This field indicates which cluster the partition belongs to, which is essential for multi-cluster setups. The key-stripping logic (removing cluster prefixes from partition names) is separate from this metadata preservation.
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR `#173` (adding gres resource specification) depends on PR `#28` in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 174
File: snakemake_executor_plugin_slurm/__init__.py:649-679
Timestamp: 2026-03-10T16:17:56.685Z
Learning: In snakemake-executor-plugin-slurm PR `#174` (feat/jobarrays), the `run_jobs` method groups jobs for array submission by `job.rule.name` only. It is a known, intentionally deferred limitation that jobs with different sbatch-relevant settings (runtime, memory, account, partition, cluster, slurm_extra) within the same rule can be incorrectly array-submitted together, since `run_array_jobs` builds the sbatch call from `jobs[0]`. A future PR is expected to partition `same_rule_jobs` by a full submission signature before calling `run_array_jobs`.
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 448
File: tests/test_account.py:0-0
Timestamp: 2026-04-15T13:09:14.164Z
Learning: In snakemake-executor-plugin-slurm, support for multiple SLURM accounts (comma/space-delimited) in `slurm_account` is anticipated but not yet implemented. The existing test `test_multiple_accounts_integer` intentionally uses a single integer account value (e.g., `slurm_account=123`) to reflect the current supported behavior, not a bug or oversight.
@cmeesters I'm not sure this comment is true. Is `gpus` a valid standard resource in Snakemake? The documentation says no, but CodeRabbit's previous learnings claim the opposite.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
fix: #446
When jobs are submitted with the automatic selection from the config file, the partition with the greatest score for that job is used. This can be a waste of compute resources when gpu partitions are selected for cpu-only jobs.
Here, gpu partitions are given a penalty for cpu-only jobs.
I slightly changed the `basic_partition_config` to take the penalty into account in the existing tests. Let me know if it's preferable to create a completely new test.

Summary by CodeRabbit
Bug Fixes
Tests