-
Notifications
You must be signed in to change notification settings - Fork 38
feat!: improved GPU job support #173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThe changes simplify the SLURM job submission process within the Snakemake executor plugin by refining the handling of GPU resources and Generic Resource (GRES) specifications. The Changes
Possibly related PRs
Suggested reviewers
Poem
📜 Recent review detailsConfiguration used: .coderabbit.yaml 📒 Files selected for processing (53)
✅ Files skipped from review due to trivial changes (3)
🚧 Files skipped from review as they are similar to previous changes (36)
🧰 Additional context used🧠 Learnings (4)📓 Common learnings
snakemake_executor_plugin_slurm/utils.py (3)
docs/further.md (2)
snakemake_executor_plugin_slurm/__init__.py (2)
🪛 Ruff (0.8.2)snakemake_executor_plugin_slurm/__init__.py655-658: Within an (B904) 655-658: Within an (B904) 🪛 LanguageTooldocs/further.md[grammar] ~93-~93: Did you mean “specifying”? Or maybe you should add a pronoun? In active voice, ‘allow’ + ‘to’ takes an object, usually a pronoun. (ALLOW_TO) [uncategorized] ~93-~93: Possible missing comma found. (AI_HYDRA_LEO_MISSING_COMMA) [uncategorized] ~96-~96: Possible missing preposition found. (AI_HYDRA_LEO_MISSING_IN) [typographical] ~97-~97: Two consecutive dots (DOUBLE_PUNCTUATION) ⏰ Context from checks skipped due to timeout of 90000ms (1)
🔇 Additional comments (2)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (3)
docs/further.md (2)
121-121
: Enhance the GRES resource documentation with more details and examplesWhile the current description is accurate, it would be helpful to expand it with:
- Common GRES types (e.g., gpu, mps, fpga)
- Multiple resource syntax (e.g.,
gpu:tesla:2
)- Resource property specifications
- Links to SLURM's GRES documentation
Consider updating the table entry to something like:
-| `--gres` | `gres` | generic resource specification, e.g. `gres=gpu:1`, allows for arbitray strings after the `=` | +| `--gres` | `gres` | Generic Resource Scheduling specification. Examples: `gres=gpu:1` (1 GPU), `gres=gpu:tesla:2` (2 Tesla GPUs), `gres=mps:100` (100% MPS). See [SLURM GRES Guide](https://slurm.schedmd.com/gres.html) for details. |
Line range hint
123-130
: Add a dedicated GRES example for AI/ML workloadsGiven the PR's focus on GRES support for AI applications, it would be valuable to add a specific example showing how to request GPU resources.
Consider adding this example after the current generic example:
rule: input: ... output: ... resources: partition=<partition name> runtime=<some number> + +# Example: GPU-accelerated deep learning training +rule train_model: + input: "data/training_set.h5" + output: "models/trained_model.pt" + resources: + # Request 2 NVIDIA A100 GPUs + gres="gpu:a100:2", + # Ensure sufficient CPU cores and memory for data preprocessing + cpus_per_task=8, + mem_mb=32000snakemake_executor_plugin_slurm/__init__.py (1)
Line range hint
313-315
: Consider making status_attempts configurableThe number of status check attempts is hardcoded. Consider making it configurable through the ExecutorSettings class to allow users to adjust based on their cluster's characteristics.
+ status_check_attempts: int = field( + default=5, + metadata={ + "help": "Number of attempts to check job status before giving up", + "env_var": False, + "required": False, + }, + ) - status_attempts = 5 + status_attempts = self.workflow.executor_settings.status_check_attempts
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (2)
docs/further.md
(1 hunks)snakemake_executor_plugin_slurm/__init__.py
(1 hunks)
🔇 Additional comments (2)
snakemake_executor_plugin_slurm/__init__.py (2)
Line range hint 238-251
: LGTM: Robust error handling implementation
The error handling for job submission is comprehensive, covering both return code errors and submission failure messages, with clear error reporting.
Line range hint 174-198
: LGTM: Comprehensive resource management
The implementation handles various SLURM resources effectively with appropriate warnings and fallback mechanisms. The code properly manages:
- Runtime constraints
- Memory allocation (both per-job and per-CPU)
- Constraint specifications
@CodeRabbit no, this PR only adds |
Tip For best results, initiate chat on the files or code changes. @cmeesters: Understood, thank you for the clarification. I acknowledge that this PR only adds ✏️ Learnings added
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
snakemake_executor_plugin_slurm/__init__.py (1)
161-170
: LGTM! Well-implemented GRES support.The implementation correctly validates the GRES format and provides clear error messages. The regex pattern accurately matches SLURM's GRES syntax requirements.
Consider adding code comments with common GRES examples to help users:
if job.resources.get("gres"): + # Common GRES examples: + # gpu:1 - Request 1 GPU of any type + # gpu:tesla:2 - Request 2 Tesla GPUs + # mps:100 - Request 100% of MPS (CUDA Multi-Process Service) # Validate GRES format (e.g., "gpu:1", "gpu:tesla:2") gres = job.resources.gres
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds reasonable. In addition, I would however suggest to further support two resources gpu: int
and gpu_model: str
, which can be internally translated into a gres parametrization (while an explicit gres specification takes always precedence). This way, we are consistent with an emerging common practice, which is currently also being implemented for kubernetes here: snakemake/snakemake-executor-plugin-kubernetes#31
@johanneskoester SLURM's generic approach is to write I'm fine with providing two more resources for the sake of consistency. However, we need to keep an eye on ease, too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
snakemake_executor_plugin_slurm/__init__.py (2)
161-169
: Enhance GRES format validation and resource specification flexibilityThe current regex enforces
[a-zA-Z0-9]+:([a-zA-Z0-9]+:)?\d+
, disallowing underscores or hyphens in resource names. SLURM setups sometimes include underscores (e.g.,gpu:tesla_v100:2
). Consider extending the pattern to capture these valid resource naming conventions. Also, use caution if resource values could come from untrusted sources; adding shell quoting enhances safety.- if not re.match(r"^[a-zA-Z0-9]+:([a-zA-Z0-9]+:)?\d+$", gres): + if not re.match(r"^[a-zA-Z0-9_]+(:[a-zA-Z0-9_]+){0,1}:\d+$", gres): raise WorkflowError( f"Invalid GRES format: {gres}. Expected format: " "'<name>:<number>' or '<name>:<type>:<number>'" )
170-183
: Consider supporting GPU without specifying a modelRight now, if
gpu
is set butgpu_model
is missing, this block does nothing. You might allow a simplergpu:<number>
usage for users who don’t need a specific GPU model. If that’s not intended, it might help to clarify via error or docstring that both fields are required.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
snakemake_executor_plugin_slurm/__init__.py
(1 hunks)
🧰 Additional context used
📓 Learnings (1)
📓 Common learnings
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2024-11-25T21:10:22.280Z
Learning: PR #173 adds `gres` as a possible resource specification in the Snakemake executor plugin for SLURM; other features are addressed in separate PRs.
@coderabbitai jolly good remark in your last comment. I hope the last commit is an improvement. Now, the code base is growing too big, IMO, and needs to be outsourced to a different module. |
Tip For best results, initiate chat on the files or code changes. @cmeesters: Thank you for the update! Your latest commit does appear to address the prior remark, and it’s great to see improvement. Regarding the code base growing too large, would you like to open a separate PR or GitHub issue to track refactoring to a dedicated module? I’m happy to assist as needed. Let me know how you’d like to proceed. 🧠 Learnings used
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
snakemake_executor_plugin_slurm/__init__.py (3)
86-87
: Consider using a more strict regex pattern for GRES validation.The current GRES regex pattern allows consecutive underscores and potentially invalid combinations. Consider using a more restrictive pattern:
- gres_re = re.compile(r"^[a-zA-Z0-9_]+:([a-zA-Z0-9_]+:)?\d+$") + gres_re = re.compile(r"^[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)*:(?:[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)*:)?\d+$")
173-184
: Consider improving error handling and variable scope.The implementation is correct but could be improved:
- Initialize
gres_string
at the beginning of the method to avoid potential undefined variable issues- Add more context to the GPU integer validation error message
+ gres_string = "" # Initialize at the beginning if job.resources.get("gpu"): if job.resources.get("gres"): raise WorkflowError( "GRES and GPU are set. Please only set one of them." ) if not isinstance(job.resources.gpu, int): raise WorkflowError( - "The 'gpu' resource must be an integer. " - f"Got: {job.resources.gpu} ({type(job.resources.gpu)})." + f"The 'gpu' resource must be an integer specifying the number of GPUs. " + f"Got: {job.resources.gpu} of type {type(job.resources.gpu).__name__}." )
186-198
: Enhance GPU model validation error message.The GPU model validation is correct, but the error message could be more helpful.
if not Executor.gpu_model_re.match(job.resources.gpu_model): raise WorkflowError( - f"Invalid GPU model format: {job.resources.gpu_model}. " - "Expected format: '<name>'" + f"Invalid GPU model format: {job.resources.gpu_model}. " + "GPU model must contain only alphanumeric characters and underscores. " + "Example: 'tesla' or 'a100'" )
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
snakemake_executor_plugin_slurm/__init__.py
(2 hunks)
🧰 Additional context used
📓 Learnings (1)
📓 Common learnings
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2024-11-25T21:10:22.280Z
Learning: PR #173 adds `gres` as a possible resource specification in the Snakemake executor plugin for SLURM; other features are addressed in separate PRs.
🔇 Additional comments (2)
snakemake_executor_plugin_slurm/__init__.py (2)
164-172
: LGTM! Well-implemented GRES validation.The GRES validation is thorough and provides clear error messages.
199-201
: LGTM! Clean conditional assignment of GRES string.The final GRES string assignment is well-structured and handles all cases correctly.
@johanneskoester the code ought to be ok-ish. As mentioned, I rather refactor in the future to avoid an extended single file. That aside, please review. I would prefer to merge the main into this PR after PR #159 is done, because of the various changes in the docs and the help strings, which will only lead to merge conflicts, if done twice. (And I have plenty of other things on my mind, too.) |
…-gpu - needs adjustment in jobstep executor code, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
snakemake_executor_plugin_slurm/__init__.py (3)
117-118
: LGTM! Consider adding pattern documentation.The regex patterns are well-defined and efficiently compiled at class level.
Consider adding docstrings to explain the expected formats:
class Executor(RemoteExecutor): + # Validates GRES format (e.g., "gpu:1", "gpu:tesla:2") gres_re = re.compile(r"^[a-zA-Z0-9_]+:([a-zA-Z0-9_]+:)?\d+$") + # Validates GPU model names (e.g., "tesla", "a100") gpu_model_re = re.compile(r"^[a-zA-Z0-9_]+$")
234-262
: Fix variable naming inconsistency.The code uses both
gpus
andgpu
in different contexts. On line 294, it checks forgpus
but the resource is defined asgpu
.Apply this change for consistency:
- gpu_job = job.resources.get("gpus") or "gpu" in job.resources.get("gres", "") + gpu_job = job.resources.get("gpu") or "gpu" in job.resources.get("gres", "")
309-311
: Enhance comment clarity for CPU allocation.The current comment could be more descriptive about the CPU allocation strategy.
- # we need to set cpus-per-task OR cpus-per-gpu, the function - # will return a string with the corresponding value + # Determine CPU allocation strategy: + # - For GPU jobs: sets --cpus-per-gpu based on job resources + # - For non-GPU jobs: sets --cpus-per-task based on job resources + # The get_cpus_per_task function returns the appropriate SLURM parameter call += f" {get_cpus_per_task(job, gpu_job)}"
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
snakemake_executor_plugin_slurm/__init__.py
(4 hunks)
🧰 Additional context used
📓 Learnings (1)
📓 Common learnings
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2024-11-25T21:10:22.280Z
Learning: PR #173 adds `gres` as a possible resource specification in the Snakemake executor plugin for SLURM; other features are addressed in separate PRs.
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: formatting
- GitHub Check: testing
🔇 Additional comments (1)
snakemake_executor_plugin_slurm/__init__.py (1)
225-233
: LGTM! Robust GRES validation implementation.The GRES validation logic is thorough with clear error messages and follows SLURM's specifications.
@johanneskoester and @coderabbitai - this PR is now depended on snakemake/snakemake-executor-plugin-slurm-jobstep#28 . A change in the cpu-settings function of the jobstep-Executor module was required, too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (2)
docs/further.md (2)
93-93
: Fix grammar in SLURM GPU request description.The sentence structure "SLURM allows to specify" is grammatically incorrect.
-SLURM allows to specify GPU request with the `--gres` or `--gpus` flags, Snakemake takes a similar approach. Resources can be asked for with +SLURM allows specifying GPU requests with the `--gres` or `--gpus` flags, and Snakemake takes a similar approach. Resources can be asked for with🧰 Tools
🪛 LanguageTool
[grammar] ~93-~93: Did you mean “specifying”? Or maybe you should add a pronoun? In active voice, ‘allow’ + ‘to’ takes an object, usually a pronoun.
Context: ...tribution. ### GPU Jobs SLURM allows to specify GPU request with the
--gresor
--gpu...(ALLOW_TO)
95-95
: Fix bullet point formatting.There's a loose punctuation mark in the bullet point formatting.
-- `gres`, e.g. `gres=gpu:1` or `gres=gpu:tesla:2`, hence `<string>:<number>` or `<string>:<model>:<number>` +- `gres`: e.g. `gres=gpu:1` or `gres=gpu:tesla:2`, hence `<string>:<number>` or `<string>:<model>:<number>`🧰 Tools
🪛 LanguageTool
[uncategorized] ~95-~95: Loose punctuation mark.
Context: ...esources can be asked for with -gres
, e.g.gres=gpu:1
orgres=gpu:tesla:2
...(UNLIKELY_OPENING_PUNCTUATION)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
docs/further.md
(1 hunks)tests/tests.py
(2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2024-11-25T21:10:22.280Z
Learning: PR #173 adds `gres` as a possible resource specification in the Snakemake executor plugin for SLURM; other features are addressed in separate PRs.
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#173
File: snakemake_executor_plugin_slurm/utils.py:75-100
Timestamp: 2025-02-18T14:49:42.624Z
Learning: In the Snakemake SLURM executor plugin, users must specify either `gres` or `gpus` for GPU resources, but not both simultaneously, as these are mutually exclusive ways to request GPU resources.
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#173
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2025-02-18T14:40:27.064Z
Learning: In the Snakemake executor plugin for SLURM, the GPU resource is specified using the "gpus" (plural) resource name, not "gpu" (singular).
🪛 LanguageTool
docs/further.md
[grammar] ~93-~93: Did you mean “specifying”? Or maybe you should add a pronoun? In active voice, ‘allow’ + ‘to’ takes an object, usually a pronoun.
Context: ...tribution. ### GPU Jobs SLURM allows to specify GPU request with the
--gresor
--gpu...
(ALLOW_TO)
[uncategorized] ~95-~95: Loose punctuation mark.
Context: ...esources can be asked for with - gres
, e.g. gres=gpu:1
or gres=gpu:tesla:2
...
(UNLIKELY_OPENING_PUNCTUATION)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: testing
- GitHub Check: linting
🔇 Additional comments (4)
docs/further.md (1)
98-98
:❓ Verification inconclusive
Clarify behavior when cpus_per_gpu is set to 0 or negative.
The documentation states that if
cpus_per_gpu
is lower or equal to zero, no CPU is requested from SLURM. This behavior should be cross-referenced with the implementation to ensure consistency.
🏁 Script executed:
#!/bin/bash # Verify the implementation of cpus_per_gpu behavior rg -A 10 "cpus_per_gpu" snakemake_executor_plugin_slurm/Length of output: 56
Action Required: Verify Implementation of
cpus_per_gpu
BehaviorOur initial search within the
snakemake_executor_plugin_slurm/
directory did not yield any references tocpus_per_gpu
. This makes it unclear whether the documented behavior—namely, that settingcpus_per_gpu
to 0 or a negative value results in no CPU request from SLURM—is actually implemented in the code. Please perform a manual review or provide additional verification to ensure that:
- The code correctly falls back to using the
threads
settings whencpus_per_gpu
is not explicitly set.- When
cpus_per_gpu
is 0 or negative, no CPU resources are requested from SLURM.Contact the team or check other parts of the codebase if this setting might be defined or applied elsewhere. Adjust the documentation accordingly based on your findings.
If further automated verification is desired, consider running:
#!/bin/bash # Search the entire repository for references to cpus_per_gpu rg -A 10 "cpus_per_gpu" .This may help locate any implementation details that could have been missed initially.
tests/tests.py (3)
30-49
: Well-structured mock fixture for testing.The mock_job fixture is well-designed, providing a flexible way to create mock jobs with configurable resources for testing. The implementation properly handles both dictionary access and attribute access for resources.
93-107
: Comprehensive test cases for GPU model and manufacturer specifications.The test cases thoroughly cover different scenarios for GPU model and manufacturer specifications, including error handling for invalid formats.
117-121
: Good validation of mutual exclusivity between GRES and GPU.This test appropriately validates that the function raises an error when both GRES and GPU resources are specified, which aligns with the retrieved learning that users must specify either
gres
orgpus
for GPU resources, but not both simultaneously.
Co-authored-by: Johannes Köster <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (2)
docs/further.md (2)
91-94
: Clarify and Enhance the GPU Jobs Section IntroductionThe introduction for GPU Jobs clearly explains that SLURM’s GPU flags are supported, but the sentence "SLURM allows to specify GPU request with the
--gres
or--gpus
flags, Snakemake takes a similar approach." could be improved for clarity. For example, consider rephrasing it to:"SLURM allows you to specify GPU requests using the
--gres
or--gpus
flags, and Snakemake adopts a similar approach."This small change improves readability and maintains an active voice.
🧰 Tools
🪛 LanguageTool
[grammar] ~93-~93: Did you mean “specifying”? Or maybe you should add a pronoun? In active voice, ‘allow’ + ‘to’ takes an object, usually a pronoun.
Context: ...tribution. ### GPU Jobs SLURM allows to specify GPU request with the
--gresor
--gpu...(ALLOW_TO)
97-98
: Correct Typographical Error in Resource ExplanationThere is a typographical error in the sentence "Additionally,
cpus_per_gpu
can be set - Snakemakesthreads
settings will otherwise be used to setcpus_per_gpu
." The term "Snakemakes" should be corrected to "Snakemake's". This will improve clarity and professionalism in the documentation.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/further.md
(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.
Learnt from: johanneskoester
PR: snakemake/snakemake-executor-plugin-slurm#173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.829Z
Learning: PR #173 in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications, `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources in a more standardized way.
Learnt from: johanneskoester
PR: snakemake/snakemake-executor-plugin-slurm#173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.829Z
Learning: PR #173 in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications (e.g., 'gpu:1'), `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources directly rather than having to use slurm_extra.
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#173
File: snakemake_executor_plugin_slurm/utils.py:75-100
Timestamp: 2025-02-18T14:49:42.624Z
Learning: In the Snakemake SLURM executor plugin, users must specify either `gres` or `gpus` for GPU resources, but not both simultaneously, as these are mutually exclusive ways to request GPU resources.
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#173
File: tests/tests.py:79-86
Timestamp: 2025-03-10T13:34:59.270Z
Learning: In the Snakemake executor plugin for SLURM, all GPU resources (both simple numeric values and model:number specifications) should be translated to the `--gpus` flag (plural form) to match SLURM's command line interface expectations.
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#173
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2025-02-18T14:40:27.064Z
Learning: In the Snakemake executor plugin for SLURM, the GPU resource is specified using the "gpus" (plural) resource name, not "gpu" (singular).
docs/further.md (3)
Learnt from: johanneskoester
PR: snakemake/snakemake-executor-plugin-slurm#173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.829Z
Learning: PR #173 in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications (e.g., 'gpu:1'), `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources directly rather than having to use slurm_extra.
Learnt from: johanneskoester
PR: snakemake/snakemake-executor-plugin-slurm#173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.829Z
Learning: PR #173 in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications, `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources in a more standardized way.
Learnt from: johanneskoester
PR: snakemake/snakemake-executor-plugin-slurm#173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.828Z
Learning: PR #173 in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications, `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources in a more standardized way.
🪛 LanguageTool
docs/further.md
[grammar] ~93-~93: Did you mean “specifying”? Or maybe you should add a pronoun? In active voice, ‘allow’ + ‘to’ takes an object, usually a pronoun.
Context: ...tribution. ### GPU Jobs SLURM allows to specify GPU request with the
--gresor
--gpu...
(ALLOW_TO)
[uncategorized] ~95-~95: Loose punctuation mark.
Context: ...esources can be asked for with - gres
, e.g. gres=gpu:1
or gres=gpu:tesla:2
...
(UNLIKELY_OPENING_PUNCTUATION)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: testing
```YAML | ||
set-resources: | ||
gres_request_rule: | ||
gres: "gpu:1" | ||
|
||
multi_gpu_rule: | ||
gpu: 2 | ||
gpu_model: "tesla" | ||
# OR | ||
gpu_manufacturer: "nvidia" | ||
cpus_per_gpu: 4 | ||
|
||
no_cpu_gpu_rule: | ||
gpu: 1 | ||
cpus_per_gpu: 0 # Values <= 0 indicate that NO CPU request string | ||
# will be issued. | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Align YAML Example with Consistent Resource Naming
The YAML snippet effectively demonstrates GPU resource requests in a workflow profile. To maintain consistency with the updated guidance above, please change the GPU resource key from "gpu": 2
(line 108) to "gpus": 2
. This change will ensure that the documentation uses the plural form consistently across all examples, as endorsed by the implementation guidelines and retrieved learnings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 5
🧹 Nitpick comments (5)
snakemake_executor_plugin_slurm/utils.py (2)
105-108
: Update error message for consistency with implementation.The error message refers to 'gpu', but if you're standardizing on 'gpus' (plural), this should be updated for consistency.
- "GPU model is set, but no GPU number is given. " "Please set 'gpu' as well." + "GPU model is set, but no GPU number is given. Please set 'gpus' as well."
50-109
: Refactor the function for better maintainability.The current implementation has multiple early returns, duplicated checks, and an unreachable code block. Consider refactoring the entire function to improve maintainability:
def set_gres_string(job: JobExecutorInterface) -> str: """ Function to set the gres string for the SLURM job based on the resources requested in the job. """ # Define regex patterns for validation gres_re = re.compile(r"^[a-zA-Z0-9_]+(:[a-zA-Z0-9_]+)?:\d+$") gpu_model_re = re.compile(r"^[a-zA-Z0-9_]+$") gpus_re = re.compile(r"^\d+$|^[a-zA-Z0-9_]+:\d+$") # Initialize result gres_string = "" # Check for gres resource has_gres = job.resources.get("gres") has_gpu = job.resources.get("gpu") has_gpus = job.resources.get("gpus") has_gpu_model = job.resources.get("gpu_model") # Ensure mutually exclusive resources if has_gres and (has_gpu or has_gpus): raise WorkflowError("GRES and GPU/GPUS are set. Please only set one of them.") # Process gres resource if has_gres: gres = job.resources.gres if not gres_re.match(gres): raise WorkflowError( f"Invalid GRES format: {gres}. Expected format: " "'<name>:<number>' or '<name>:<type>:<number>' " "(e.g., 'gpu:1' or 'gpu:tesla:2')" ) gres_string = f" --gres={job.resources.gres}" # Process gpu with model elif has_gpu_model and (has_gpu or has_gpus): gpu_model = job.resources.gpu_model gpu_value = job.resources.gpu if has_gpu else job.resources.gpus if not gpu_model_re.match(gpu_model): raise WorkflowError( f"Invalid GPU model format: {gpu_model}. " "Expected format: '<name>' " "(e.g., 'tesla')" ) gres_string = f" --gpus={gpu_model}:{gpu_value}" # Process gpu without model elif has_gpu or has_gpus: gpu_value = job.resources.gpu if has_gpu else job.resources.gpus if not gpus_re.match(str(gpu_value)): raise WorkflowError( f"Invalid GPU format: {gpu_value}. " "Expected format: '<number>' or '<name>:<number>' " "(e.g., '1' or 'tesla:2')" ) gres_string = f" --gpus={gpu_value}" # Handle case where model is set but no gpu number elif has_gpu_model: raise WorkflowError( "GPU model is set, but no GPU number is given. Please set 'gpus' as well." ) return gres_stringdocs/further.md (3)
93-93
: Fix grammatical issue in documentation.The phrasing "SLURM allows to specify" is grammatically incorrect.
-SLURM allows to specify GPU request with the `--gres` or `--gpus` flags, Snakemake takes a similar approach. Resources can be asked for with +SLURM allows specifying GPU request with the `--gres` or `--gpus` flags. Snakemake takes a similar approach. Resources can be asked for with🧰 Tools
🪛 LanguageTool
[grammar] ~93-~93: Did you mean “specifying”? Or maybe you should add a pronoun? In active voice, ‘allow’ + ‘to’ takes an object, usually a pronoun.
Context: ...tribution. ### GPU Jobs SLURM allows to specify GPU request with the
--gresor
--gpu...(ALLOW_TO)
107-109
: Update the example to use consistent resource naming.The example should use the same resource naming convention established earlier in the documentation.
multi_gpu_rule: - gpus: 2 + gpus: 2 cpus_per_gpu: 4 no_cpu_gpu_rule: - gpus: 1 + gpus: 1 cpus_per_gpu: -1
98-98
: Clarify CPU allocation behavior withcpus_per_gpu
.The documentation states that if
cpus_per_gpu
is lower or equal to zero, no CPU is requested from SLURM. However, it's not clear how this interacts with thethreads
setting.Consider clarifying with an example:
-Additionally, `cpus_per_gpu` can be set - Snakemakes `threads` settings will otherwise be used to set `cpus_per_gpu`. If `cpus_per_gpu` is lower or equal to zero, no CPU is requested from SLURM (and cluster defaults will kick in, if any). +Additionally, `cpus_per_gpu` can be set - Snakemake's `threads` settings will otherwise be used to set `cpus_per_gpu`. If `cpus_per_gpu` is lower or equal to zero, no CPU is requested from SLURM (and cluster defaults will kick in, if any). For example: + +```python +rule gpu_job: + input: ... + output: ... + threads: 4 # Would normally set cpus_per_gpu=4 + resources: + gpus: 1, + cpus_per_gpu: 0 # Override to request no CPUs with GPU +```
🛑 Comments failed to post (5)
snakemake_executor_plugin_slurm/utils.py (4)
109-109:
⚠️ Potential issueFix inconsistent return condition.
The final return statement only checks for "gres" or "gpu" resources, but not "gpus" which is also handled in the function. This could lead to the function incorrectly returning an empty string.
- return gres_string if job.resources.get("gres") or job.resources.get("gpu") else "" + return gres_string if job.resources.get("gres") or job.resources.get("gpu") or job.resources.get("gpus") else ""📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.return gres_string if job.resources.get("gres") or job.resources.get("gpu") or job.resources.get("gpus") else ""
84-95: 🛠️ Refactor suggestion
Standardize on
gpus
(plural) resource name.Based on the retrieved learnings and SLURM's command line interface, the GPU resource should be specified using "gpus" (plural) not "gpu" (singular). Consider updating this to check for
job.resources.get("gpus")
instead.- if job.resources.get("gpu"): + if job.resources.get("gpus"): # ensure that gres is not set, if gpu and gpu_model are set if job.resources.get("gres"): - raise WorkflowError("GRES and GPU are set. Please only set one" " of them.") + raise WorkflowError("GRES and GPUS are set. Please only set one of them.") # validate GPU format - if not gpus_re.match(str(job.resources.gpu)): + if not gpus_re.match(str(job.resources.gpus)): raise WorkflowError( - f"Invalid GPU format: {job.resources.gpu}. " + f"Invalid GPU format: {job.resources.gpus}. " "Expected format: '<number>' or '<name>:<number>' " "(e.g., '1' or 'tesla:2')" ) - gres_string = f" --gpus={job.resources.gpu}" + gres_string = f" --gpus={job.resources.gpus}"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.if job.resources.get("gpus"): # ensure that gres is not set, if gpus and gpu_model are set if job.resources.get("gres"): raise WorkflowError("GRES and GPUS are set. Please only set one of them.") # validate GPU format if not gpus_re.match(str(job.resources.gpus)): raise WorkflowError( f"Invalid GPU format: {job.resources.gpus}. " "Expected format: '<number>' or '<name>:<number>' " "(e.g., '1' or 'tesla:2')" ) gres_string = f" --gpus={job.resources.gpus}"
73-83:
⚠️ Potential issueDuplicate GPU resource check.
This block is duplicated with the check at line 84-95, which could lead to confusing behavior. The two blocks also have inconsistent validation: one checks if
gpu
is an integer while the other uses a regex pattern.- if job.resources.get("gpu"): - # ensure that gres is not set, if gpu and gpu_model are set - if job.resources.get("gres"): - raise WorkflowError("GRES and GPU are set. Please only set one of them.") - # ensure that 'gpu' is an integer - if not isinstance(job.resources.gpu, int): - raise WorkflowError( - "The 'gpu' resource must be an integer. " - f"Got: {job.resources.gpu} ({type(job.resources.gpu)})." - ) - gres_string = f" --gpus={job.resources.gpu}"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.if job.resources.get("gres") and gpu_string: raise WorkflowError("GRES and GPU are set. Please only set one of them.") elif not job.resources.get("gres") and not gpu_model and not gpu_string: return "" if job.resources.get("gres"): # Validate GRES format (e.g., "gpu:1", "gpu:tesla:2") gres = job.resources.gres if not gres_re.match(gres): raise WorkflowError( f"Invalid GRES format: {gres}. Expected format: '<name>:<number>' or '<name>:<type>:<number>'" ) gres_string = f" --gres={gres}" return gres_string # Duplicate GPU resource check block removed. # The remaining GPU validation (using regex) in the subsequent block will handle GPU resource conversions.
96-105:
⚠️ Potential issueFix unreachable code block.
This
elif
block will never be executed because the condition at line 84 (if job.resources.get("gpu")
) will always trigger first when bothgpu
andgpu_model
are set, making this code unreachable.Reorder the conditions to check for both GPU model and GPU first:
- elif job.resources.get("gpu_model") and job.resources.get("gpu"): + if job.resources.get("gpu_model") and job.resources.get("gpu"): # validate GPU model format if not gpu_model_re.match(job.resources.gpu_model): raise WorkflowError( f"Invalid GPU model format: {job.resources.gpu_model}. " "Expected format: '<name>' " "(e.g., 'tesla')" ) - gres_string = f" --gpus:{job.resources.gpu_model}:{job.resources.gpu}" + gres_string = f" --gpus={job.resources.gpu_model}:{job.resources.gpu}" + elif job.resources.get("gpu"): + # validate GPU format + if not gpus_re.match(str(job.resources.gpu)): + raise WorkflowError( + f"Invalid GPU format: {job.resources.gpu}. " + "Expected format: '<number>' or '<name>:<number>' " + "(e.g., '1' or 'tesla:2')" + ) + gres_string = f" --gpus={job.resources.gpu}"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.if job.resources.get("gpu_model") and job.resources.get("gpu"): # validate GPU model format if not gpu_model_re.match(job.resources.gpu_model): raise WorkflowError( f"Invalid GPU model format: {job.resources.gpu_model}. " "Expected format: '<name>' " "(e.g., 'tesla')" ) gres_string = f" --gpus={job.resources.gpu_model}:{job.resources.gpu}" elif job.resources.get("gpu"): # validate GPU format; we assume here, that the validator ensures that the 'gpu' # is either an integer or follows the format '<number>' or '<name>:<number>' if not gpus_re.match(str(job.resources.gpu)): raise WorkflowError( f"Invalid GPU format: {job.resources.gpu}. " "Expected format: '<number>' or '<name>:<number>' " "(e.g., '1' or 'tesla:2')" ) gres_string = f" --gpus={job.resources.gpu}"
docs/further.md (1)
96-97: 💡 Verification agent
🧩 Analysis chain
Resolve inconsistency between documentation and implementation.
There's a contradiction regarding GPU resource naming. While the documentation suggests using the singular form
gpu
, the code implementation and retrieved learnings indicate that the plural formgpus
should be used to match SLURM's interface.-alternatively, the resource `gpus` can be requested, e.g. `gpu=2`. This can be combined with `gpu_model`. +alternatively, the resource `gpus` can be requested, e.g. `gpus=2`. This can be combined with `gpu_model`.Let's verify the actual implementation in the code:
🏁 Script executed:
#!/bin/bash # Check how GPU resources are referenced in the codebase grep -r "resources\.get(.*gpu" --include="*.py" .Length of output: 779
Action Required: Update Documentation to Use the Plural GPU Resource
The verification confirms that the implementation references both
"gpu"
and"gpus"
, with the plural form ("gpus"
) being used to generate the SLURM flag (--gpus=a100:2
). For clarity and consistency with SLURM’s interface, the documentation should be updated to request the GPU resource using the plural form. Please update the example indocs/further.md
(lines 96–97) as follows:
Before:
alternatively, the Snakemake resource
gpucan be used, e.g. by just requesting the number of GPUs like
gpu=2...
After:
alternatively, the resource
gpuscan be requested, e.g.
gpus=2. This can be combined with
gpu_model...
-alternatively, the resource `gpus` can be requested, e.g. `gpu=2`. This can be combined with `gpu_model`. +alternatively, the resource `gpus` can be requested, e.g. `gpus=2`. This can be combined with `gpu_model`.This change will resolve the inconsistency between the documentation and the implementation.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.-alternatively, the resource `gpus` can be requested, e.g. `gpu=2`. This can be combined with `gpu_model`. +alternatively, the resource `gpus` can be requested, e.g. `gpus=2`. This can be combined with `gpu_model`.
🧰 Tools
🪛 LanguageTool
[typographical] ~97-~97: Two consecutive dots
Context: ...makegpu
resource has to be number. .. note:: Internally, Snakemake knows the ...(DOUBLE_PUNCTUATION)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 4
♻️ Duplicate comments (2)
snakemake_executor_plugin_slurm/utils.py (2)
4-9
:⚠️ Potential issueMissing import for WorkflowError.
The new function uses
WorkflowError
at lines 67, 76, 79, 87, 90, 99, and 106, but it is not imported. This will cause runtime exceptions when these error scenarios are triggered.Add the following import at the top of the file:
import re from pathlib import Path +from snakemake.exceptions import WorkflowError
73-83
:⚠️ Potential issueDuplicate GPU resource checks causing logical conflicts.
There are two
if job.resources.get("gpu")
blocks (lines 73-83 and 84-95) with different validation logic. This creates conflicts where both blocks might execute and overwrite each other'sgres_string
value.Consider removing this first block and consolidating the GPU resource validation in a single section:
- if job.resources.get("gpu"): - # ensure that gres is not set, if gpu and gpu_model are set - if job.resources.get("gres"): - raise WorkflowError("GRES and GPU are set. Please only set one of them.") - # ensure that 'gpu' is an integer - if not isinstance(job.resources.gpu, int): - raise WorkflowError( - "The 'gpu' resource must be an integer. " - f"Got: {job.resources.gpu} ({type(job.resources.gpu)})." - ) - gres_string = f" --gpus={job.resources.gpu}"
🧹 Nitpick comments (1)
snakemake_executor_plugin_slurm/utils.py (1)
105-108
: Update error message for consistency.Based on the project conventions as mentioned in the learnings, it would be more consistent to use "gpus" (plural) instead of "gpu" (singular) in the error message.
- "GPU model is set, but no GPU number is given. " "Please set 'gpu' as well." + "GPU model is set, but no GPU number is given. Please set 'gpus' as well."
🛑 Comments failed to post (4)
snakemake_executor_plugin_slurm/utils.py (4)
50-109: 🛠️ Refactor suggestion
Refactor the entire function for better structure and error handling.
The current implementation has several issues including duplicate code, unreachable conditions, and potential runtime errors. Consider a more structured approach that clearly defines the mutual exclusivity of resource types.
Here's a complete refactoring suggestion:
def set_gres_string(job: JobExecutorInterface) -> str: """ Function to set the gres string for the SLURM job based on the resources requested in the job. """ # generic resources (GRES) arguments can be of type # "string:int" or "string:string:int" gres_re = re.compile(r"^[a-zA-Z0-9_]+(:[a-zA-Z0-9_]+)?:\d+$") # gpu model arguments can be of type "string" gpu_model_re = re.compile(r"^[a-zA-Z0-9_]+$") # gpus can be of type "int" or "string:int" gpus_re = re.compile(r"^\d+$|^[a-zA-Z0-9_]+:\d+$") + gres_string = "" # Check if GRES is specified if job.resources.get("gres"): # Validate GRES format gres = job.resources.gres if not gres_re.match(gres): raise WorkflowError( f"Invalid GRES format: {gres}. Expected format: " "'<name>:<number>' or '<name>:<type>:<number>' " "(e.g., 'gpu:1' or 'gpu:tesla:2')" ) # Ensure mutual exclusivity with GPU resources if job.resources.get("gpu") or job.resources.get("gpus") or job.resources.get("gpu_model"): raise WorkflowError("GRES and GPU resources are set. Please only use one approach.") gres_string = f" --gres={gres}" # Check if GPU resources are specified elif job.resources.get("gpu_model") and (job.resources.get("gpu") or job.resources.get("gpus")): # Get the GPU value (prefer gpus over gpu for consistency) gpu_value = job.resources.get("gpus", job.resources.get("gpu")) gpu_model = job.resources.gpu_model # Validate GPU model format if not gpu_model_re.match(gpu_model): raise WorkflowError( f"Invalid GPU model format: {gpu_model}. " "Expected format: '<name>' (e.g., 'tesla')" ) gres_string = f" --gpus={gpu_model}:{gpu_value}" # Check if only GPU number is specified elif job.resources.get("gpu") or job.resources.get("gpus"): # Get the GPU value (prefer gpus over gpu for consistency) gpu_value = job.resources.get("gpus", job.resources.get("gpu")) # Validate GPU format if not gpus_re.match(str(gpu_value)): raise WorkflowError( f"Invalid GPU format: {gpu_value}. " "Expected format: '<number>' or '<name>:<number>' " "(e.g., '1' or 'tesla:2')" ) gres_string = f" --gpus={gpu_value}" # Check if only GPU model is specified (error case) elif job.resources.get("gpu_model"): raise WorkflowError( "GPU model is set, but no GPU number is given. Please set 'gpus' as well." ) return gres_string
This refactoring:
- Initializes
gres_string
at the beginning- Clearly enforces mutual exclusivity between GRES and GPU resources
- Prioritizes checking more specific conditions first
- Standardizes on "gpus" (plural) while still supporting "gpu" (singular)
- Removes duplicate code and unreachable conditions
- Fixes the format string syntax
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.def set_gres_string(job: JobExecutorInterface) -> str: """ Function to set the gres string for the SLURM job based on the resources requested in the job. """ # generic resources (GRES) arguments can be of type # "string:int" or "string:string:int" gres_re = re.compile(r"^[a-zA-Z0-9_]+(:[a-zA-Z0-9_]+)?:\d+$") # gpu model arguments can be of type "string" gpu_model_re = re.compile(r"^[a-zA-Z0-9_]+$") # gpus can be of type "int" or "string:int" gpus_re = re.compile(r"^\d+$|^[a-zA-Z0-9_]+:\d+$") gres_string = "" # Check if GRES is specified if job.resources.get("gres"): # Validate GRES format gres = job.resources.gres if not gres_re.match(gres): raise WorkflowError( f"Invalid GRES format: {gres}. Expected format: " "'<name>:<number>' or '<name>:<type>:<number>' " "(e.g., 'gpu:1' or 'gpu:tesla:2')" ) # Ensure mutual exclusivity with GPU resources if job.resources.get("gpu") or job.resources.get("gpus") or job.resources.get("gpu_model"): raise WorkflowError("GRES and GPU resources are set. Please only use one approach.") gres_string = f" --gres={gres}" # Check if GPU resources are specified with a model elif job.resources.get("gpu_model") and (job.resources.get("gpu") or job.resources.get("gpus")): # Get the GPU value (prefer gpus over gpu for consistency) gpu_value = job.resources.get("gpus", job.resources.get("gpu")) gpu_model = job.resources.gpu_model # Validate GPU model format if not gpu_model_re.match(gpu_model): raise WorkflowError( f"Invalid GPU model format: {gpu_model}. " "Expected format: '<name>' (e.g., 'tesla')" ) gres_string = f" --gpus={gpu_model}:{gpu_value}" # Check if only GPU number is specified elif job.resources.get("gpu") or job.resources.get("gpus"): # Get the GPU value (prefer gpus over gpu for consistency) gpu_value = job.resources.get("gpus", job.resources.get("gpu")) # Validate GPU format if not gpus_re.match(str(gpu_value)): raise WorkflowError( f"Invalid GPU format: {gpu_value}. " "Expected format: '<number>' or '<name>:<number>' " "(e.g., '1' or 'tesla:2')" ) gres_string = f" --gpus={gpu_value}" # Check if only GPU model is specified (error case) elif job.resources.get("gpu_model"): raise WorkflowError( "GPU model is set, but no GPU number is given. Please set 'gpus' as well." ) return gres_string
109-109: 🛠️ Refactor suggestion
Simplify the return statement.
The current return statement will error if neither
gres
norgpu
is set becausegres_string
won't be initialized. Since you've already setgres_string
conditionally in each relevant block, you can simply return it.- return gres_string if job.resources.get("gres") or job.resources.get("gpu") else "" + return gres_stringAlso, consider initializing
gres_string = ""
at the beginning of the function to ensure it's always defined.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.def set_gres_string(job: JobExecutorInterface) -> str: # Ensure gres_string is always defined gres_string = "" # Examples of conditionally setting gres_string: if job.resources.get("gres"): gres = job.resources.gres # (Assume additional validations for gres format here) gres_string = f" --gres={gres}" elif job.resources.get("gpu"): # (Assume additional validations for gpu format here) # Optionally handle gpu_model if available if job.resources.get("gpu_model"): gpu_model = job.resources.gpu_model gpu_value = job.resources.gpu # (Assume validation of gpu_model) gres_string = f" --gpus={gpu_model}:{gpu_value}" else: gpu_string = job.resources.gpu # (Assume validation of gpu_string) gres_string = f" --gpus={gpu_string}" # Simplified return statement return gres_string
96-104:
⚠️ Potential issueUnreachable code block and incorrect format string.
This block will never be executed because the condition at line 84 (
if job.resources.get("gpu")
) will always trigger first when bothgpu
andgpu_model
are set. Additionally, the format string uses a colon instead of an equals sign after--gpus
.Rearrange the conditions to check for both
gpu_model
andgpu
first and fix the format string:- if job.resources.get("gpu"): + if job.resources.get("gpu_model") and job.resources.get("gpu"): + # validate GPU model format + if not gpu_model_re.match(job.resources.gpu_model): + raise WorkflowError( + f"Invalid GPU model format: {job.resources.gpu_model}. " + "Expected format: '<name>' " + "(e.g., 'tesla')" + ) + gres_string = f" --gpus={job.resources.gpu_model}:{job.resources.gpu}" + elif job.resources.get("gpu"): # validate GPU format # ... existing code ... - elif job.resources.get("gpu_model") and job.resources.get("gpu"): - # validate GPU model format - if not gpu_model_re.match(job.resources.gpu_model): - raise WorkflowError( - f"Invalid GPU model format: {job.resources.gpu_model}. " - "Expected format: '<name>' " - "(e.g., 'tesla')" - ) - gres_string = f" --gpus:{job.resources.gpu_model}:{job.resources.gpu}"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.if job.resources.get("gpu_model") and job.resources.get("gpu"): # validate GPU model format if not gpu_model_re.match(job.resources.gpu_model): raise WorkflowError( f"Invalid GPU model format: {job.resources.gpu_model}. " "Expected format: '<name>' (e.g., 'tesla')" ) gres_string = f" --gpus={job.resources.gpu_model}:{job.resources.gpu}" elif job.resources.get("gpu"): # we assume here that the validator ensures that the 'gpu' # is an integer gres_string = f" --gpus={job.resources.gpu}" elif job.resources.get("gpu_model"): raise WorkflowError("GPU model is set, but no GPU number is given")
84-95:
⚠️ Potential issueEnsure
gres_string
is initialized at the beginning.If neither
gres
norgpu
resources are set,gres_string
will be uninitialized, causing runtime errors when it's accessed in the return statement. Also, consider standardizing on "gpus" (plural) according to the project conventions.Apply this modification:
+ gres_string = "" if job.resources.get("gres"): # Validate GRES format... gres_string = f" --gres={job.resources.gres}" - if job.resources.get("gpu"): + elif job.resources.get("gpus") or job.resources.get("gpu"): # ensure that gres is not set... # validate GPU format... gres_string = f" --gpus={job.resources.gpu or job.resources.gpus}"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements."'<name>:<number>' or '<name>:<type>:<number>' " "(e.g., 'gpu:1' or 'gpu:tesla:2')" ) + gres_string = "" - return f" --gres={job.resources.gres}" + if job.resources.get("gres"): + # Validate GRES format (e.g., "gpu:1" or "gpu:tesla:2") + gres_string = f" --gres={job.resources.gres}" + elif job.resources.get("gpus") or job.resources.get("gpu"): + # Validate GPU format and ensure that gres is not set + gres_value = job.resources.gpu or job.resources.gpus + gres_string = f" --gpus={gres_value}" elif gpu_model and gpu_string: # validate GPU model format if not gpu_model_re.match(gpu_model): raise WorkflowError( f"Invalid GPU model format: {gpu_model}." " Expected format: '<name>' (e.g., 'tesla')" ) + gres_string = f" --gpus={gpu_model}:{gpu_string}" + return gres_string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 2
🔭 Outside diff range comments (1)
snakemake_executor_plugin_slurm/utils.py (1)
98-105
:⚠️ Potential issueFix critical syntax error in overlapping code blocks
There's a critical syntax error in this portion of the code where overlapping code blocks have created invalid Python syntax. The block at lines 99-101 appears to be duplicated from above and creates a conflict with the new error condition at lines 102-105.
Apply this fix to remove the duplicated code:
- elif gpu_model and gpu_string: - # validate GPU model format - if not gpu_model_re.match(gpu_model): - raise WorkflowError( - f"Invalid GPU model format: {gpu_model}. Expected format: '<name>' (e.g., 'tesla')" - ) - return f" --gpus={gpu_model}:{gpu_string}" elif gpu_model and not gpu_string: raise WorkflowError( "A gpu model has been requested without specifying a gpu request." )
🧹 Nitpick comments (1)
snakemake_executor_plugin_slurm/utils.py (1)
102-105
: Improve error message consistency with resource namingThe error message should be consistent with the resource naming convention used in SLURM. According to the retrieved learnings, the SLURM flag is
--gpus
(plural), so the error message should reference "gpus" rather than "gpu" for consistency.elif gpu_model and not gpu_string: raise WorkflowError( - "A gpu model has been requested without specifying a gpu request." + "GPU model is set, but no GPU number is given. Please set 'gpus' as well." )This message matches the style of other error messages in the function and clearly indicates what resource name the user should set.
🛑 Comments failed to post (2)
tests/tests.py (2)
113-119:
⚠️ Potential issueInconsistency between test expectation and function error message
There's a mismatch between the error message string that this test is expecting and the actual error message in the
set_gres_string
function. The test expects "GPU model is set, but no GPU number is given", but the function returns "A gpu model has been requested without specifying a gpu request."Either update the test to match the function's actual error message, or update the function to match the test expectation. Since I've suggested improving the error message in the function, updating it to match this test expectation would be the best solution.
# In utils.py: elif gpu_model and not gpu_string: raise WorkflowError( - "A gpu model has been requested without specifying a gpu request." + "GPU model is set, but no GPU number is given. Please set 'gpus' as well." )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.elif gpu_model and not gpu_string: raise WorkflowError( - "A gpu model has been requested without specifying a gpu request." + "GPU model is set, but no GPU number is given. Please set 'gpus' as well." )
1-9:
⚠️ Potential issueMissing pytest import
The test class uses pytest features like fixtures and
pytest.raises
, but there's no import statement for pytest.from typing import Optional import snakemake.common.tests from snakemake_interface_executor_plugins.settings import ExecutorSettingsBase +import pytest from unittest.mock import MagicMock from snakemake_executor_plugin_slurm import ExecutorSettings from snakemake_executor_plugin_slurm.utils import set_gres_string from snakemake_interface_common.exceptions import WorkflowError
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.from typing import Optional import snakemake.common.tests from snakemake_interface_executor_plugins.settings import ExecutorSettingsBase import pytest from unittest.mock import MagicMock from snakemake_executor_plugin_slurm import ExecutorSettings from snakemake_executor_plugin_slurm.utils import set_gres_string from snakemake_interface_common.exceptions import WorkflowError
@coderabbitai - I do not see the critical code duplication you mention in your comment - is this assessment of yours up-to-date? |
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain@cmeesters I apologize for the confusion. After reviewing the comment history on this PR, I don't recall making any comment about "critical code duplication" in your code. It's possible there's been a misunderstanding or perhaps the comment was made by someone else or in a different context. If you can point me to the specific comment you're referring to, I'd be happy to clarify or provide an updated assessment of the code. From what I can see, your PR is focused on adding 🏁 Script executed:
Length of output: 20847 @cmeesters Upon checking the PR comments and reviewing the codebase, I can see there were several automated review comments regarding code duplication in the Since you've made changes to the code and are now asking if my assessment is up-to-date, I should clarify that the automated review comments may not reflect the current state of your code. Let me take a fresh look at the current implementation to provide an updated assessment. Could you please confirm which specific comment about "critical code duplication" you're referring to? This would help me address your question more accurately. 🏁 Script executed:
Length of output: 2893 @cmeesters You're right to question this. Looking at the current implementation of If there were automated review comments from me about code duplication in earlier versions of this PR, they no longer apply to the current code. You've clearly addressed any previous structural issues, and the current implementation looks clean and efficient. I apologize for any confusion caused by outdated comments that may not have been properly updated as your code evolved. 🏁 Script executed:
Length of output: 4725 @cmeesters After reviewing the current code in My previous automated review comments about code duplication were likely referring to an earlier version of the code. You've done a good job refactoring and improving the implementation, which has addressed those concerns. Thank you for checking on this and for your work on improving the code structure. The current implementation looks clean and efficient. 🧠 Learnings used
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 4
♻️ Duplicate comments (4)
snakemake_executor_plugin_slurm/utils.py (3)
99-99
:⚠️ Potential issueIncorrect variable reference
The function incorrectly references
job.resources.gpu_model
when it should use the localgpu_model
variable.- if not gpu_model_re.match(job.resources.gpu_model): + if not gpu_model_re.match(gpu_model):
73-83
:⚠️ Potential issueDuplicate GPU checking logic
There's a duplicate block of code that checks for
job.resources.get("gpu")
at both lines 73 and 84, creating overlapping logic paths.- if job.resources.get("gpu"): - # ensure that gres is not set, if gpu and gpu_model are set - if job.resources.get("gres"): - raise WorkflowError("GRES and GPU are set. Please only set one of them.") - # ensure that 'gpu' is an integer - if not isinstance(job.resources.gpu, int): - raise WorkflowError( - "The 'gpu' resource must be an integer. " - f"Got: {job.resources.gpu} ({type(job.resources.gpu)})." - ) - gres_string = f" --gpus={job.resources.gpu}"
104-104
:⚠️ Potential issueIncorrect SLURM flag formatting
The SLURM GPU flag should use an equals sign rather than a colon after
--gpus
.- gres_string = f" --gpus:{gpu_model}:{gpu_string}" + gres_string = f" --gpus={gpu_model}:{gpu_string}"docs/further.md (1)
108-108
: 🛠️ Refactor suggestionInconsistent GPU resource naming in example
The YAML example uses
"gpus": 2
but earlier documentation usesgpu=2
, creating confusion.Use the same resource key consistently:
- gpus: 2 + gpus: 2 # match the resource name used in documentation
🧹 Nitpick comments (1)
docs/further.md (1)
98-98
: Clarify cpus_per_gpu behaviorThe documentation about
cpus_per_gpu
is helpful, but it would be beneficial to provide an example of when setting it to a negative value might be useful.Consider adding a brief explanation of why a user might want to set
cpus_per_gpu
to a negative value, such as when using specialized software that manages its own CPU allocation.
🛑 Comments failed to post (4)
snakemake_executor_plugin_slurm/utils.py (3)
84-95:
⚠️ Potential issueRemove or consolidate conflicting GPU validation logic
This block conflicts with the earlier GPU validation at lines 73-83, and uses a different validation approach (regex vs. isinstance).
Consider consolidating these checks into a single, comprehensive validation approach.
109-109:
⚠️ Potential issueMissing initialization for gres_string
The final return statement uses
gres_string
without ensuring it's always initialized.Initialize
gres_string
to an empty string at the beginning of the function:def set_gres_string(job: JobExecutorInterface) -> str: # ...existing code... + gres_string = "" if job.resources.get("gres"): # ...
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.def set_gres_string(job: JobExecutorInterface) -> str: # ...existing code... gres_string = "" if job.resources.get("gres"): # Validate GRES format and construct the gres string. gres = job.resources.gres # Assume there is a regex check here, e.g.: # if not gres_re.match(gres): # raise WorkflowError(f"Invalid GRES format: {gres}") gres_string = f" --gres={gres}" elif job.resources.get("gpu"): # Example branch for gpu resource # Additional validation and formatting might be applied here. gres_string = f" --gpus={job.resources.gpu}" # ...potential further conditions... return gres_string
96-104:
⚠️ Potential issueUnreachable code block
This code block will never execute because when both
gpu_model
andgpu_string
are set, the previous condition at line 84 (if gpu_string
) will always trigger first.- elif gpu_model and gpu_string: - # validate GPU model format - if not gpu_model_re.match(job.resources.gpu_model): - raise WorkflowError( - f"Invalid GPU model format: {job.resources.gpu_model}. " - "Expected format: '<name>' " - "(e.g., 'tesla')" - ) - gres_string = f" --gpus:{gpu_model}:{gpu_string}"To fix this issue, reorder the conditions to check for both GPU model and GPU string first:
+ if gpu_model and gpu_string: + # validate GPU model format + if not gpu_model_re.match(gpu_model): + raise WorkflowError( + f"Invalid GPU model format: {gpu_model}. " + "Expected format: '<name>' " + "(e.g., 'tesla')" + ) + gres_string = f" --gpus={gpu_model}:{gpu_string}" + elif gpu_string: # validate GPU format if not gpus_re.match(gpu_string): raise WorkflowError( f"Invalid GPU format: {gpu_string}. " "Expected format: '<number>' or '<name>:<number>' " "(e.g., '1' or 'tesla:2')" ) gres_string = f" --gpus={gpu_string}" - elif gpu_model and gpu_string: - # validate GPU model format - if not gpu_model_re.match(job.resources.gpu_model): - raise WorkflowError( - f"Invalid GPU model format: {job.resources.gpu_model}. " - "Expected format: '<name>' " - "(e.g., 'tesla')" - ) - gres_string = f" --gpus:{gpu_model}:{gpu_string}"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.if gpu_model and gpu_string: # validate GPU model format if not gpu_model_re.match(gpu_model): raise WorkflowError( f"Invalid GPU model format: {gpu_model}. " "Expected format: '<name>' (e.g., 'tesla')" ) gres_string = f" --gpus={gpu_model}:{gpu_string}" elif gpu_model and not gpu_string: raise WorkflowError("GPU model is set, but no GPU number is given") elif gpu_string: # we assume here, that the validator ensures that the 'gpu_string' # is an integer if not gpus_re.match(gpu_string): raise WorkflowError( f"Invalid GPU format: {gpu_string}. " "Expected format: '<number>' or '<name>:<number>' (e.g., '1' or 'tesla:2')" ) gres_string = f" --gpus={gpu_string}"
docs/further.md (1)
93-97: 🛠️ Refactor suggestion
Inconsistency in GPU resource naming
There's an inconsistency in the documentation regarding the GPU resource naming. Line 96 mentions "alternatively, the resource
gpus
can be requested, e.g.gpu=2
", mixing singular and plural forms.Choose a consistent naming approach throughout the documentation:
- - alternatively, the resource `gpus` can be requested, e.g. `gpu=2`. This can be combined with `gpu_model`. + - alternatively, the resource `gpus` can be requested, e.g. `gpus=2`. This can be combined with `gpu_model`.According to the retrieved learnings, the GPU resource is specified using "gpus" (plural) in the Snakemake executor plugin for SLURM.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.SLURM allows to specify GPU request with the `--gres` or `--gpus` flags and Snakemake takes a similar approach. Resources can be asked for with - The resource `gpu` can be used, e.g. by just requesting the number of GPUs like `gpu=2`. This can be combined with the `gpu_model` resource, i.e. `gpu_model=a100` or independently. The combination will result in a flag to `sbatch` like `--gpus=a100:2`. The Snakemake `gpu` resource has to be number. - alternatively, the resource `gpus` can be requested, e.g. `gpus=2`. This can be combined with `gpu_model`. - Alternatively, the resource `gres`, the syntax is `<string>:<number>` or `<string>:<model>:<number>`, i.e. `gres=gpu:1` or `gres=gpu:a100:2` (assuming GPU model).
🧰 Tools
🪛 LanguageTool
[grammar] ~93-~93: Did you mean “specifying”? Or maybe you should add a pronoun? In active voice, ‘allow’ + ‘to’ takes an object, usually a pronoun.
Context: ...tribution. ### GPU Jobs SLURM allows to specify GPU request with the
--gresor
--gpu...(ALLOW_TO)
[uncategorized] ~93-~93: Possible missing comma found.
Context: ...U request with the--gres
or--gpus
flags and Snakemake takes a similar approach....(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~96-~96: Possible missing preposition found.
Context: ...rce has to be number. - Alternatively, the resourcegres
, the syntax is `<string...(AI_HYDRA_LEO_MISSING_IN)
[typographical] ~97-~97: Two consecutive dots
Context: ...gres=gpu:a100:2` (assuming GPU model). .. note:: Internally, Snakemake knows the ...(DOUBLE_PUNCTUATION)
🤖 I have created a release *beep* *boop* --- ## [1.0.0](v0.15.1...v1.0.0) (2025-03-12) ### ⚠ BREAKING CHANGES * improved GPU job support ([#173](#173)) ### Features * improved GPU job support ([#173](#173)) ([66dcdcf](66dcdcf)) ### Bug Fixes * another dummy commit to test release-please PR CI behaviour ([#230](#230)) ([791ed16](791ed16)) * logdir misconstruct when leading slash in wildcard ([#220](#220)) ([61de847](61de847)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: snakemake-bot <[email protected]>
In the light of more and more accelerator applications (AI, base mapping, ...) the fall-back onto
slurm_extra
becomes a bit tedious to use. Hence, the resource support forgres
.Addresses issue #52 (and to a minor extent: #18 and #104). Supersedes PR #172 .
Summary by CodeRabbit
New Features
Bug Fixes
Dependency Updates
snakemake-executor-plugin-slurm-jobstep
dependency version from^0.2.0
to^0.3.0
.Tests