Skip to content

Conversation

cademirch
Copy link
Contributor

@cademirch cademirch commented Aug 7, 2025

Fixes #337.

This PR makes sure for each job in sacct output, only the jobstep is reported and that each record has rulename and memrequested. Also refactors efficiency report slightly to allow unit testing of parsing sacct output.

Summary by CodeRabbit

  • Refactor

    • Split the efficiency report into separate data-fetching and parsing stages for more reliable SLURM data handling.
    • Improved handling of job steps with inheritance of job metadata (rule names, requested memory) and robust CPU/memory efficiency calculations with low-efficiency warnings.
  • Tests

    • Added a test verifying SLURM output parsing, metadata inheritance for job steps, and correct memory/efficiency reporting.

…ightly to allow unit testing of parsing sacct output.
Copy link
Contributor

coderabbitai bot commented Aug 7, 2025

Walkthrough

The efficiency-report logic was split into data fetching and parsing: get_sacct_data fetches raw sacct lines; parse_sacct_data parses, collapses job steps, inherits metadata, computes CPU/memory efficiencies; create_efficiency_report now orchestrates these steps. A unit test for parsing/collapsing was added.

Changes

Cohort / File(s) Change Summary
Efficiency report refactor
snakemake_executor_plugin_slurm/efficiency_report.py
Added get_sacct_data(run_uuid, logger) to run sacct --parsable2 and return raw lines; added parse_sacct_data(lines, e_threshold, run_uuid, logger) to parse lines into a DataFrame, rename Comment→RuleName when present, drop/ignore batch/extern rows, derive MainJobID, split main jobs vs steps, inherit RuleName and RequestedMem_MB from main jobs into steps, compute CPU Efficiency (%) and Memory Usage (%), handle infinities, and log low-efficiency warnings; create_efficiency_report now orchestrates these functions and preserves CSV output behavior.
Unit test addition
tests/tests.py
Added test_parse_sacct_data() which supplies sample sacct lines, calls parse_sacct_data(...), prints resulting CSV, and asserts that job steps are collapsed into two rows, RuleName and RequestedMem_MB are inherited, and MaxRSS_MB on steps is > 0.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant create_efficiency_report
    participant get_sacct_data
    participant parse_sacct_data
    Note right of get_sacct_data #eef2ff: runs `sacct --parsable2`, returns lines or None
    Note right of parse_sacct_data #f7f7e6: parses lines, collapses steps, computes efficiencies

    Caller->>create_efficiency_report: call(e_threshold, run_uuid, e_report_path, logger)
    create_efficiency_report->>get_sacct_data: get_sacct_data(run_uuid, logger)
    get_sacct_data-->>create_efficiency_report: lines or None
    alt lines present
        create_efficiency_report->>parse_sacct_data: parse_sacct_data(lines, e_threshold, run_uuid, logger)
        parse_sacct_data-->>create_efficiency_report: DataFrame (steps collapsed, efficiencies)
        create_efficiency_report-->>Caller: DataFrame saved as CSV (report)
    else no data
        create_efficiency_report-->>Caller: None
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Assessment against linked issues

Objective Addressed Explanation
Collapse SLURM job records so job steps inherit metadata (rule name, memory request, etc.) (#337)
Filter out batch and extern jobs, keeping main jobs and job steps for reporting (#337)
Compute CPU and memory efficiency from parsed fields and avoid infinite/zero artifacts (#337)
Add tests to verify collapsing and inheritance behavior (#337)

Assessment against linked issues: Out-of-scope changes

No out-of-scope changes found.

Possibly related PRs

Suggested reviewers

  • johanneskoester
  • dlaehnemann

Poem

"I thumped through sacct lines, a rabbit on the run,
I stitched main jobs to steps until the parsing was done.
Rules hop to their steps, memory numbers align,
CSV carrots gathered—efficiency tastes fine.
🐇✨"

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.


📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c2582bf and 37c8a73.

📒 Files selected for processing (1)
  • tests/tests.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/tests.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: testing
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/efficiency_report

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (1)
snakemake_executor_plugin_slurm/efficiency_report.py (1)

7-7: Remove temporary debugging code from production

The comment indicates this os import is temporary. Based on previous learnings, debugging code that logs directory contents should be removed from production implementation.

Apply this diff to remove the debugging code:

-import os  # only temporarily needed for printf debugging
 import numpy as np
     # write out the efficiency report at normal verbosity in any case
     logger.info(f"Efficiency report for workflow {run_uuid} saved to {logfile}.")
-    # state directory contents for debugging purposes
-    logger.debug(f"Current directory contents in '{p.cwd()}': {os.listdir(p.cwd())}")

Also applies to: 216-216

🧹 Nitpick comments (1)
snakemake_executor_plugin_slurm/efficiency_report.py (1)

194-195: Consider handling empty results separately from errors

The condition lines is None or not lines treats both error cases (None) and empty results (empty list) the same way. An empty list might be a valid result from sacct (e.g., if jobs haven't started yet or have been purged). Consider distinguishing between these cases for better error reporting.

-    if lines is None or not lines:
-        return None
+    if lines is None:
+        return None  # Error occurred
+    
+    if not lines:
+        logger.info(f"No job data available for workflow {run_uuid} yet.")
+        return None  # Valid but empty result
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d8d605c and b290aaa.

📒 Files selected for processing (2)
  • snakemake_executor_plugin_slurm/efficiency_report.py (4 hunks)
  • tests/tests.py (2 hunks)
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#221
File: snakemake_executor_plugin_slurm/efficiency_report.py:34-49
Timestamp: 2025-05-26T12:22:07.005Z
Learning: In the parse_reqmem function in snakemake_executor_plugin_slurm/efficiency_report.py, the conditional check for per_unit == "/node" is necessary for proper conversion of memory values in multi-node jobs, not redundant as it might initially appear.
Learnt from: johanneskoester
PR: snakemake/snakemake-executor-plugin-slurm#173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.829Z
Learning: PR #173 in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications (e.g., 'gpu:1'), `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources directly rather than having to use slurm_extra.
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#221
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2025-06-03T17:10:51.292Z
Learning: In the snakemake-executor-plugin-slurm repository, the create_efficiency_report method in snakemake_executor_plugin_slurm/__init__.py contains temporary testing/debugging code that logs directory contents using os.listdir() and directory paths. This code was inserted for testing purposes only and should be removed from the production implementation.
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#178
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2024-12-11T14:17:08.749Z
Learning: In the `snakemake-executor-plugin-slurm` project, when handling exceptions in `snakemake_executor_plugin_slurm/__init__.py`, prefer concise error messages and avoid unnecessary verbosity or exception chaining when it's not necessary.
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#193
File: .github/workflows/post_to_mastodon.yml:26-26
Timestamp: 2025-01-20T09:13:26.443Z
Learning: In the snakemake-executor-plugin-slurm repository, release PRs follow the naming pattern "chore(main): release X.Y.Z" where X.Y.Z is the version number.
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#221
File: tests/tests.py:62-70
Timestamp: 2025-06-24T12:41:31.837Z
Learning: In the snakemake-executor-plugin-slurm efficiency report feature, the executor is responsible for creating the efficiency_report_path directory when generating efficiency reports. Tests should not manually check for directory existence but should rely on the executor to properly create necessary directories as part of its normal operation.
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#187
File: .github/workflows/post_to_mastodon.yml:0-0
Timestamp: 2025-01-17T17:27:32.446Z
Learning: In the mastodon publishing workflow for snakemake-executor-plugin-slurm, the PR_TITLE environment variable is required by the post_to_mastodon.sh script and should be preserved.
📚 Learning: 2025-06-24T12:41:31.837Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#221
File: tests/tests.py:62-70
Timestamp: 2025-06-24T12:41:31.837Z
Learning: In the snakemake-executor-plugin-slurm efficiency report feature, the executor is responsible for creating the efficiency_report_path directory when generating efficiency reports. Tests should not manually check for directory existence but should rely on the executor to properly create necessary directories as part of its normal operation.

Applied to files:

  • tests/tests.py
  • snakemake_executor_plugin_slurm/efficiency_report.py
📚 Learning: 2024-12-11T14:17:08.749Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#178
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2024-12-11T14:17:08.749Z
Learning: In the `snakemake-executor-plugin-slurm` project, when handling exceptions in `snakemake_executor_plugin_slurm/__init__.py`, prefer concise error messages and avoid unnecessary verbosity or exception chaining when it's not necessary.

Applied to files:

  • tests/tests.py
  • snakemake_executor_plugin_slurm/efficiency_report.py
📚 Learning: 2025-02-18T14:53:09.504Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#173
File: snakemake_executor_plugin_slurm/__init__.py:29-31
Timestamp: 2025-02-18T14:53:09.504Z
Learning: The snakemake-executor-plugin-slurm tests require system dependencies (gcc, python3-dev) to build the datrie package, which is a dependency of snakemake.

Applied to files:

  • tests/tests.py
📚 Learning: 2025-06-03T17:10:51.292Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#221
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2025-06-03T17:10:51.292Z
Learning: In the snakemake-executor-plugin-slurm repository, the create_efficiency_report method in snakemake_executor_plugin_slurm/__init__.py contains temporary testing/debugging code that logs directory contents using os.listdir() and directory paths. This code was inserted for testing purposes only and should be removed from the production implementation.

Applied to files:

  • tests/tests.py
  • snakemake_executor_plugin_slurm/efficiency_report.py
📚 Learning: 2025-03-31T11:52:05.373Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#249
File: snakemake_executor_plugin_slurm/submit_string.py:21-29
Timestamp: 2025-03-31T11:52:05.373Z
Learning: In the snakemake-executor-plugin-slurm project, account and partition parameters are validated in the submit function before reaching the get_submit_command function. By design, these parameters are either empty strings or already properly formatted with SLURM parameter tags, making additional validation in get_submit_command unnecessary.

Applied to files:

  • tests/tests.py
  • snakemake_executor_plugin_slurm/efficiency_report.py
📚 Learning: 2025-05-26T12:22:07.005Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#221
File: snakemake_executor_plugin_slurm/efficiency_report.py:34-49
Timestamp: 2025-05-26T12:22:07.005Z
Learning: In the parse_reqmem function in snakemake_executor_plugin_slurm/efficiency_report.py, the conditional check for per_unit == "/node" is necessary for proper conversion of memory values in multi-node jobs, not redundant as it might initially appear.

Applied to files:

  • tests/tests.py
  • snakemake_executor_plugin_slurm/efficiency_report.py
📚 Learning: 2025-01-13T09:54:22.950Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.

Applied to files:

  • tests/tests.py
📚 Learning: 2025-03-10T15:20:51.829Z
Learnt from: johanneskoester
PR: snakemake/snakemake-executor-plugin-slurm#173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.829Z
Learning: PR #173 in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications (e.g., 'gpu:1'), `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources directly rather than having to use slurm_extra.

Applied to files:

  • tests/tests.py
  • snakemake_executor_plugin_slurm/efficiency_report.py
🧬 Code Graph Analysis (1)
tests/tests.py (1)
snakemake_executor_plugin_slurm/efficiency_report.py (1)
  • parse_sacct_data (78-184)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: testing
🔇 Additional comments (2)
snakemake_executor_plugin_slurm/efficiency_report.py (1)

127-149: Well-implemented job step handling logic

The refactored logic correctly addresses the issue by:

  1. Filtering out batch/extern entries that represent SLURM overhead
  2. Separating main jobs from job steps
  3. Inheriting metadata (RuleName, RequestedMem_MB) from main jobs to their corresponding job steps
  4. Using only job steps in the final dataset since they contain actual resource usage

This approach resolves the confusion with multiple entries per job and ensures consistent field population.

tests/tests.py (1)

47-53: Good test coverage for job step handling

The assertions effectively verify the core functionality:

  • Job step collapsing (2 rows from 6 input lines)
  • RuleName inheritance from main jobs to job steps
  • RequestedMem_MB inheritance
  • Actual memory usage captured from job steps

This provides good validation of the refactored parsing logic.

@cmeesters
Copy link
Member

@cademirch thank for this PR!

Considering the other discussion about whether we continue relying exclusively on SLURM with accounting databases, I feel we need a logical switch for people without a DB, in which case the efficiency option should not be present.

Let me add this and update the docs accordingly. With a little luck, I can test and amend some time next week (and also have a closer look on your other PR).

@cmeesters cmeesters removed the request for review from slaube-jgu August 8, 2025 08:04
@cademirch
Copy link
Contributor Author

Minor note here, I think that the efficiency report should be written to the .snakemake directory. Its been cluttering up my workdirs when testing stuff

@cmeesters
Copy link
Member

I think that the efficiency report should be written to the .snakemake directory. Its been cluttering up my workdirs when testing stuff

This is why we have --slurm-efficiency-report-path - we have discussed this in the past. My argument is that users will want to have this in their working path rather than in a hidden one. For testing, I admit, the needs are different.

@cmeesters cmeesters merged commit a4cbe36 into main Aug 21, 2025
6 checks passed
@cmeesters cmeesters deleted the fix/efficiency_report branch August 21, 2025 13:03
cmeesters pushed a commit that referenced this pull request Aug 21, 2025
🤖 I have created a release *beep* *boop*
---


##
[1.6.1](v1.6.0...v1.6.1)
(2025-08-21)


### Bug Fixes

* efficiency report jobsteps
([#338](#338))
([a4cbe36](a4cbe36))
* gpu tasks are unset if &lt;= 0
([#347](#347))
([564e0f7](564e0f7))
* updated poetry
([#343](#343))
([58d471d](58d471d))


### Documentation

* added notes about ntasks per gpu
([#346](#346))
([adcd86e](adcd86e))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- Bug Fixes
  - Corrected efficiency report job step handling.
  - Prevented invalid GPU task settings by unsetting values ≤ 0.
- Documentation
  - Added guidance on configuring tasks per GPU.
- Chores
  - Updated packaging tooling and version to 1.6.1.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
@cademirch
Copy link
Contributor Author

I think that the efficiency report should be written to the .snakemake directory. Its been cluttering up my workdirs when testing stuff

This is why we have --slurm-efficiency-report-path - we have discussed this in the past. My argument is that users will want to have this in their working path rather than in a hidden one. For testing, I admit, the needs are different.

Ah I was not aware of this. Great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Efficiency report confusion

2 participants