fix: efficiency report jobsteps #338

cademirch · 2025-08-07T16:53:13Z

Fixes #337.

This PR makes sure for each job in sacct output, only the jobstep is reported and that each record has rulename and memrequested. Also refactors efficiency report slightly to allow unit testing of parsing sacct output.

Summary by CodeRabbit

Refactor
- Split the efficiency report into separate data-fetching and parsing stages for more reliable SLURM data handling.
- Improved handling of job steps with inheritance of job metadata (rule names, requested memory) and robust CPU/memory efficiency calculations with low-efficiency warnings.
Tests
- Added a test verifying SLURM output parsing, metadata inheritance for job steps, and correct memory/efficiency reporting.

…ightly to allow unit testing of parsing sacct output.

coderabbitai · 2025-08-07T16:53:22Z

Walkthrough

The efficiency-report logic was split into data fetching and parsing: get_sacct_data fetches raw sacct lines; parse_sacct_data parses, collapses job steps, inherits metadata, computes CPU/memory efficiencies; create_efficiency_report now orchestrates these steps. A unit test for parsing/collapsing was added.

Changes

Cohort / File(s)	Change Summary
Efficiency report refactor `snakemake_executor_plugin_slurm/efficiency_report.py`	Added `get_sacct_data(run_uuid, logger)` to run `sacct --parsable2` and return raw lines; added `parse_sacct_data(lines, e_threshold, run_uuid, logger)` to parse lines into a DataFrame, rename Comment→RuleName when present, drop/ignore batch/extern rows, derive MainJobID, split main jobs vs steps, inherit RuleName and RequestedMem_MB from main jobs into steps, compute CPU Efficiency (%) and Memory Usage (%), handle infinities, and log low-efficiency warnings; `create_efficiency_report` now orchestrates these functions and preserves CSV output behavior.
Unit test addition `tests/tests.py`	Added `test_parse_sacct_data()` which supplies sample sacct lines, calls `parse_sacct_data(...)`, prints resulting CSV, and asserts that job steps are collapsed into two rows, `RuleName` and `RequestedMem_MB` are inherited, and `MaxRSS_MB` on steps is > 0.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant create_efficiency_report
    participant get_sacct_data
    participant parse_sacct_data
    Note right of get_sacct_data #eef2ff: runs `sacct --parsable2`, returns lines or None
    Note right of parse_sacct_data #f7f7e6: parses lines, collapses steps, computes efficiencies

    Caller->>create_efficiency_report: call(e_threshold, run_uuid, e_report_path, logger)
    create_efficiency_report->>get_sacct_data: get_sacct_data(run_uuid, logger)
    get_sacct_data-->>create_efficiency_report: lines or None
    alt lines present
        create_efficiency_report->>parse_sacct_data: parse_sacct_data(lines, e_threshold, run_uuid, logger)
        parse_sacct_data-->>create_efficiency_report: DataFrame (steps collapsed, efficiencies)
        create_efficiency_report-->>Caller: DataFrame saved as CSV (report)
    else no data
        create_efficiency_report-->>Caller: None
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Assessment against linked issues

Objective	Addressed	Explanation
Collapse SLURM job records so job steps inherit metadata (rule name, memory request, etc.) (#337)	✅
Filter out batch and extern jobs, keeping main jobs and job steps for reporting (#337)	✅
Compute CPU and memory efficiency from parsed fields and avoid infinite/zero artifacts (#337)	✅
Add tests to verify collapsing and inheritance behavior (#337)	✅

Assessment against linked issues: Out-of-scope changes

No out-of-scope changes found.

Possibly related PRs

feat: measuring compute efficiency per job #221 — earlier refactor of the same efficiency_report implementation; strongly related to the get/parse split in this PR.

Suggested reviewers

johanneskoester
dlaehnemann

Poem

"I thumped through sacct lines, a rabbit on the run,
I stitched main jobs to steps until the parsing was done.
Rules hop to their steps, memory numbers align,
CSV carrots gathered—efficiency tastes fine.
🐇✨"

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c2582bf and 37c8a73.

📒 Files selected for processing (1)

tests/tests.py (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

tests/tests.py

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: testing

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch fix/efficiency_report

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

Status, Documentation and Community

Visit our Status Page to check the current availability of CodeRabbit.
Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 1

🔭 Outside diff range comments (1)

snakemake_executor_plugin_slurm/efficiency_report.py (1)
7-7: Remove temporary debugging code from production

The comment indicates this os import is temporary. Based on previous learnings, debugging code that logs directory contents should be removed from production implementation.

Apply this diff to remove the debugging code:
-import os  # only temporarily needed for printf debugging
 import numpy as np
     # write out the efficiency report at normal verbosity in any case
     logger.info(f"Efficiency report for workflow {run_uuid} saved to {logfile}.")
-    # state directory contents for debugging purposes
-    logger.debug(f"Current directory contents in '{p.cwd()}': {os.listdir(p.cwd())}")
Also applies to: 216-216

🧹 Nitpick comments (1)

snakemake_executor_plugin_slurm/efficiency_report.py (1)
194-195: Consider handling empty results separately from errors

The condition lines is None or not lines treats both error cases (None) and empty results (empty list) the same way. An empty list might be a valid result from sacct (e.g., if jobs haven't started yet or have been purged). Consider distinguishing between these cases for better error reporting.
-    if lines is None or not lines:
-        return None
+    if lines is None:
+        return None  # Error occurred
+    
+    if not lines:
+        logger.info(f"No job data available for workflow {run_uuid} yet.")
+        return None  # Valid but empty result

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d8d605c and b290aaa.

📒 Files selected for processing (2)

snakemake_executor_plugin_slurm/efficiency_report.py (4 hunks)
tests/tests.py (2 hunks)

🧰 Additional context used

🧠 Learnings (9)

📓 Common learnings

Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.

Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#221
File: snakemake_executor_plugin_slurm/efficiency_report.py:34-49
Timestamp: 2025-05-26T12:22:07.005Z
Learning: In the parse_reqmem function in snakemake_executor_plugin_slurm/efficiency_report.py, the conditional check for per_unit == "/node" is necessary for proper conversion of memory values in multi-node jobs, not redundant as it might initially appear.

Learnt from: johanneskoester
PR: snakemake/snakemake-executor-plugin-slurm#173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.829Z
Learning: PR #173 in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications (e.g., 'gpu:1'), `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources directly rather than having to use slurm_extra.

Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#221
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2025-06-03T17:10:51.292Z
Learning: In the snakemake-executor-plugin-slurm repository, the create_efficiency_report method in snakemake_executor_plugin_slurm/__init__.py contains temporary testing/debugging code that logs directory contents using os.listdir() and directory paths. This code was inserted for testing purposes only and should be removed from the production implementation.

Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#178
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2024-12-11T14:17:08.749Z
Learning: In the `snakemake-executor-plugin-slurm` project, when handling exceptions in `snakemake_executor_plugin_slurm/__init__.py`, prefer concise error messages and avoid unnecessary verbosity or exception chaining when it's not necessary.

Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#193
File: .github/workflows/post_to_mastodon.yml:26-26
Timestamp: 2025-01-20T09:13:26.443Z
Learning: In the snakemake-executor-plugin-slurm repository, release PRs follow the naming pattern "chore(main): release X.Y.Z" where X.Y.Z is the version number.

Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#221
File: tests/tests.py:62-70
Timestamp: 2025-06-24T12:41:31.837Z
Learning: In the snakemake-executor-plugin-slurm efficiency report feature, the executor is responsible for creating the efficiency_report_path directory when generating efficiency reports. Tests should not manually check for directory existence but should rely on the executor to properly create necessary directories as part of its normal operation.

Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#187
File: .github/workflows/post_to_mastodon.yml:0-0
Timestamp: 2025-01-17T17:27:32.446Z
Learning: In the mastodon publishing workflow for snakemake-executor-plugin-slurm, the PR_TITLE environment variable is required by the post_to_mastodon.sh script and should be preserved.

📚 Learning: 2025-06-24T12:41:31.837Z

Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#221
File: tests/tests.py:62-70
Timestamp: 2025-06-24T12:41:31.837Z
Learning: In the snakemake-executor-plugin-slurm efficiency report feature, the executor is responsible for creating the efficiency_report_path directory when generating efficiency reports. Tests should not manually check for directory existence but should rely on the executor to properly create necessary directories as part of its normal operation.

Applied to files:

tests/tests.py
snakemake_executor_plugin_slurm/efficiency_report.py

📚 Learning: 2024-12-11T14:17:08.749Z

Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#178
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2024-12-11T14:17:08.749Z
Learning: In the `snakemake-executor-plugin-slurm` project, when handling exceptions in `snakemake_executor_plugin_slurm/__init__.py`, prefer concise error messages and avoid unnecessary verbosity or exception chaining when it's not necessary.

Applied to files:

tests/tests.py
snakemake_executor_plugin_slurm/efficiency_report.py

📚 Learning: 2025-02-18T14:53:09.504Z

Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#173
File: snakemake_executor_plugin_slurm/__init__.py:29-31
Timestamp: 2025-02-18T14:53:09.504Z
Learning: The snakemake-executor-plugin-slurm tests require system dependencies (gcc, python3-dev) to build the datrie package, which is a dependency of snakemake.

Applied to files:

tests/tests.py

📚 Learning: 2025-06-03T17:10:51.292Z

Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#221
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2025-06-03T17:10:51.292Z
Learning: In the snakemake-executor-plugin-slurm repository, the create_efficiency_report method in snakemake_executor_plugin_slurm/__init__.py contains temporary testing/debugging code that logs directory contents using os.listdir() and directory paths. This code was inserted for testing purposes only and should be removed from the production implementation.

Applied to files:

tests/tests.py
snakemake_executor_plugin_slurm/efficiency_report.py

📚 Learning: 2025-03-31T11:52:05.373Z

Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#249
File: snakemake_executor_plugin_slurm/submit_string.py:21-29
Timestamp: 2025-03-31T11:52:05.373Z
Learning: In the snakemake-executor-plugin-slurm project, account and partition parameters are validated in the submit function before reaching the get_submit_command function. By design, these parameters are either empty strings or already properly formatted with SLURM parameter tags, making additional validation in get_submit_command unnecessary.

Applied to files:

tests/tests.py
snakemake_executor_plugin_slurm/efficiency_report.py

📚 Learning: 2025-05-26T12:22:07.005Z

Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#221
File: snakemake_executor_plugin_slurm/efficiency_report.py:34-49
Timestamp: 2025-05-26T12:22:07.005Z
Learning: In the parse_reqmem function in snakemake_executor_plugin_slurm/efficiency_report.py, the conditional check for per_unit == "/node" is necessary for proper conversion of memory values in multi-node jobs, not redundant as it might initially appear.

Applied to files:

tests/tests.py
snakemake_executor_plugin_slurm/efficiency_report.py

📚 Learning: 2025-01-13T09:54:22.950Z

Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.

Applied to files:

tests/tests.py

📚 Learning: 2025-03-10T15:20:51.829Z

Learnt from: johanneskoester
PR: snakemake/snakemake-executor-plugin-slurm#173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.829Z
Learning: PR #173 in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications (e.g., 'gpu:1'), `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources directly rather than having to use slurm_extra.

Applied to files:

tests/tests.py
snakemake_executor_plugin_slurm/efficiency_report.py

🧬 Code Graph Analysis (1)

tests/tests.py (1)

snakemake_executor_plugin_slurm/efficiency_report.py (1)

parse_sacct_data (78-184)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: testing

🔇 Additional comments (2)

snakemake_executor_plugin_slurm/efficiency_report.py (1)

127-149: Well-implemented job step handling logic

The refactored logic correctly addresses the issue by:

Filtering out batch/extern entries that represent SLURM overhead

Separating main jobs from job steps

Inheriting metadata (RuleName, RequestedMem_MB) from main jobs to their corresponding job steps

Using only job steps in the final dataset since they contain actual resource usage

This approach resolves the confusion with multiple entries per job and ensures consistent field population.

tests/tests.py (1)

47-53: Good test coverage for job step handling

The assertions effectively verify the core functionality:

Job step collapsing (2 rows from 6 input lines)

RuleName inheritance from main jobs to job steps

RequestedMem_MB inheritance

Actual memory usage captured from job steps

This provides good validation of the refactored parsing logic.

tests/tests.py

cmeesters · 2025-08-08T07:25:56Z

@cademirch thank for this PR!

Considering the other discussion about whether we continue relying exclusively on SLURM with accounting databases, I feel we need a logical switch for people without a DB, in which case the efficiency option should not be present.

Let me add this and update the docs accordingly. With a little luck, I can test and amend some time next week (and also have a closer look on your other PR).

cademirch · 2025-08-21T04:54:18Z

Minor note here, I think that the efficiency report should be written to the .snakemake directory. Its been cluttering up my workdirs when testing stuff

cmeesters · 2025-08-21T12:26:15Z

I think that the efficiency report should be written to the .snakemake directory. Its been cluttering up my workdirs when testing stuff

This is why we have --slurm-efficiency-report-path - we have discussed this in the past. My argument is that users will want to have this in their working path rather than in a hidden one. For testing, I admit, the needs are different.

🤖 I have created a release *beep* *boop* --- ## [1.6.1](v1.6.0...v1.6.1) (2025-08-21) ### Bug Fixes * efficiency report jobsteps ([#338](#338)) ([a4cbe36](a4cbe36)) * gpu tasks are unset if <= 0 ([#347](#347)) ([564e0f7](564e0f7)) * updated poetry ([#343](#343)) ([58d471d](58d471d)) ### Documentation * added notes about ntasks per gpu ([#346](#346)) ([adcd86e](adcd86e)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).  ## Summary by CodeRabbit - Bug Fixes - Corrected efficiency report job step handling. - Prevented invalid GPU task settings by unsetting values ≤ 0. - Documentation - Added guidance on configuring tasks per GPU. - Chores - Updated packaging tooling and version to 1.6.1.

cademirch · 2025-08-21T19:56:28Z

I think that the efficiency report should be written to the .snakemake directory. Its been cluttering up my workdirs when testing stuff

This is why we have --slurm-efficiency-report-path - we have discussed this in the past. My argument is that users will want to have this in their working path rather than in a hidden one. For testing, I admit, the needs are different.

Ah I was not aware of this. Great.

fix: only use jobsteps, but make sure they have rulename. refactor sl…

b290aaa

…ightly to allow unit testing of parsing sacct output.

cademirch requested review from cmeesters and slaube-jgu August 7, 2025 16:53

coderabbitai bot reviewed Aug 7, 2025

View reviewed changes

tests/tests.py Show resolved Hide resolved

cmeesters removed the request for review from slaube-jgu August 8, 2025 08:04

fix: formatting

c2582bf

fix: linting

37c8a73

cmeesters approved these changes Aug 21, 2025

View reviewed changes

cmeesters merged commit a4cbe36 into main Aug 21, 2025
6 checks passed

cmeesters deleted the fix/efficiency_report branch August 21, 2025 13:03

snakemake-bot mentioned this pull request Aug 21, 2025

chore(main): release 1.6.1 #344

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: efficiency report jobsteps #338

fix: efficiency report jobsteps #338

Uh oh!

cademirch commented Aug 7, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Aug 7, 2025 •

edited

Loading

Chat

Support

CodeRabbit Commands (Invoked using PR/Issue comments)

Other keywords and placeholders

Status, Documentation and Community

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

cmeesters commented Aug 8, 2025

Uh oh!

cademirch commented Aug 21, 2025

Uh oh!

cmeesters commented Aug 21, 2025

Uh oh!

Uh oh!

cademirch commented Aug 21, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

fix: efficiency report jobsteps #338

fix: efficiency report jobsteps #338

Uh oh!

Conversation

cademirch commented Aug 7, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Aug 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Assessment against linked issues

Assessment against linked issues: Out-of-scope changes

Possibly related PRs

Suggested reviewers

Poem

Chat

Support

CodeRabbit Commands (Invoked using PR/Issue comments)

Other keywords and placeholders

Status, Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

cmeesters commented Aug 8, 2025

Uh oh!

cademirch commented Aug 21, 2025

Uh oh!

cmeesters commented Aug 21, 2025

Uh oh!

Uh oh!

cademirch commented Aug 21, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cademirch commented Aug 7, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Aug 7, 2025 •

edited

Loading