Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: custom log file behaviour #159

Open
wants to merge 26 commits into
base: main
Choose a base branch
from
Open

feat: custom log file behaviour #159

wants to merge 26 commits into from

Conversation

cmeesters
Copy link
Member

@cmeesters cmeesters commented Oct 20, 2024

This PR should allow to

  • set a custom log directory for SLURM jobs
  • enable auto-deletion of older SLURM log files after 10 days
  • allow changing the that behaviour or disabling auto deletion all together
  • auto delete log files of successful SLURM jobs (and ignore this default behaviour)

The idea behind this PR is to

  • limit the number of SLURM log files to keep, as a workflow can easily produce thousands of log files. In case of a successful job, the information is redundant anyway, hence the proposed auto-deletion of successful jobs.
  • users may select a custom directory for SLURM log files - this way different workflows can point to one prefix. Together with the auto-deletion of older log files, this futher limits the number of present log files.

It addresses issues #94 and #123.

Summary by CodeRabbit

  • New Features

    • Added a method to clean old log files based on age, improving log maintenance.
    • Introduced new command line flags for enhanced log management, including options for log directory and retention settings.
  • Improvements

    • Enhanced documentation for clearer guidance on configuring SLURM resources and managing logs.
    • Updated job status handling to respect new log retention settings, improving configurability.

Copy link
Contributor

coderabbitai bot commented Oct 31, 2024

Walkthrough

The changes enhance the Executor class in the SLURM executor plugin for Snakemake by introducing new log management features and configuration options. A method named clean_old_logs is added to delete outdated log files, and the ExecutorSettings dataclass is updated with new fields for configuring the log directory and retention settings. The run_job method is modified to utilize the new log configuration, and the check_active_jobs method now respects the setting for retaining successful logs.

Changes

File Change Summary
snakemake_executor_plugin_slurm/__init__.py - Added method clean_old_logs(self) in Executor.
- Added fields logdir, keep_successful_logs, and delete_logfiles_older_than in ExecutorSettings.
- Updated run_job to use logdir for SLURM log file path.
- Modified check_active_jobs to consider keep_successful_logs.
snakemake_executor_plugin_slurm/utils.py - Imported logging, created a logger instance.
- No changes to delete_slurm_environment.
docs/further.md - Added command line flags: --slurm-logdir, --slurm-keep-successful-logs, --slurm-delete-logfiles-older-than.
- Clarified existing flags: --slurm-init-seconds-before-status-checks, --slurm-requeue.
- Expanded section on "Advanced Resource Specifications" and updated "Retries" section.
- Introduced section on "Dynamic Parameterization".

Possibly related PRs

  • feat: added requeue option to client #136: This PR adds a new field requeue to the ExecutorSettings class, which is related to the management of job logs and settings in the SLURM executor, similar to the new fields added in the main PR.
  • feat: in job stability #137: This PR modifies the warn_on_jobcontext method in the ExecutorSettings class, which is relevant as it also involves changes to the SLURM executor's functionality, particularly in managing job contexts.
  • docs: requeue #153: This PR documents new command line flags that directly correspond to the new fields added in the main PR, enhancing user control over SLURM log management.
  • fix: sbatch stderr parsing #161: This PR modifies the run_job method in the Executor class, which is directly related to the changes made in the main PR regarding job submission and log handling.
  • fix: sbatch stderr parsing #2 #165: Similar to fix: sbatch stderr parsing #161, this PR also enhances the run_job method, focusing on error handling during SLURM job submissions, which aligns with the improvements made in the main PR.

Suggested reviewers

  • johanneskoester
  • fgvieira
  • dlaehnemann

🐰 In the log dir, where bunnies hop,
New methods sprout, they never stop!
With jobs to run and logs to clear,
SLURM's now tidy, bring on the cheer!
Hopping through tasks, all is bright,
In the world of jobs, everything's right! 🌟


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@cmeesters cmeesters marked this pull request as ready for review November 18, 2024 15:30
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Outside diff range and nitpick comments (8)
snakemake_executor_plugin_slurm/utils.py (2)

19-23: Enhance function documentation and add type hints.

The function documentation should include:

  • Parameter descriptions with types
  • Return value (if any)
  • Examples of usage
  • Explanation of the age cutoff unit (days)
-def clean_old_logs(logdir, age_cutoff):
+def clean_old_logs(logdir: str, age_cutoff: float) -> None:
     """
-    Function to delete files older than 'age_cutoff'
-    in the SLURM 'logdir'
+    Delete files in the SLURM log directory that are older than the specified age cutoff.
+
+    Args:
+        logdir: Path to the SLURM log directory
+        age_cutoff: Maximum age of files in days before deletion (0 or negative to keep all files)
+
+    Example:
+        >>> clean_old_logs("/path/to/logs", 10.0)  # Delete files older than 10 days
     """

24-24: Replace magic number with a named constant.

The conversion factor from days to seconds should be defined as a constant at module level for better maintainability.

+# Constants
+SECONDS_PER_DAY = 24 * 60 * 60  # 86400 seconds in a day
+
 def clean_old_logs(logdir: str, age_cutoff: float) -> None:
     """
     ...
     """
-    cutoff_secs = age_cutoff * 86400
+    cutoff_secs = age_cutoff * SECONDS_PER_DAY
snakemake_executor_plugin_slurm/__init__.py (6)

51-54: Correct typographical error in help text

There's a misspelling in the help text for keep_successful_logs. The word "sucessful" should be "successful". Additionally, consider replacing "Per default" with "By default" for improved clarity.

Apply this diff to fix the typo and improve phrasing:

-        Per default SLURM log files will be deleted upon sucessful completion
+        By default, SLURM log files will be deleted upon successful completion

150-152: Remove commented-out code

The method delete_old_logs is commented out and appears to be obsolete due to the introduction of clean_old_logs. Removing unused code helps maintain clarity.

Apply this diff to remove the commented code:

- # def delete_old_logs(self):
- #    self.workflow.executor_settings.delete_logfiles_older_than

432-436: Handle exceptions when removing log files

When deleting log files using os.remove, it's advisable to handle potential exceptions such as FileNotFoundError to prevent the application from crashing if the file does not exist.

Modify the code to include exception handling:

try:
    os.remove(j.aux["slurm_logfile"])
except FileNotFoundError:
    self.logger.warning(f"Log file {j.aux['slurm_logfile']} not found when attempting to remove.")
except Exception as e:
    self.logger.error(f"Error removing log file {j.aux['slurm_logfile']}: {e}")

51-54: Standardize help text formatting

The help texts for the fields in ExecutorSettings have inconsistent indentation and spacing. Standardizing the formatting improves readability and maintains consistency across the codebase.

Consider adjusting the help texts to align properly and use consistent indentation.

Also applies to: 64-65, 71-73


Line range hint 139-147: Ensure done flag in warn_on_jobcontext method is correctly managed

In the warn_on_jobcontext method, the done parameter is checked but might not be managed correctly. Since done is not stored or updated outside the method scope, the recursive call to warn_on_jobcontext(done=True) has no practical effect.

Consider refactoring the method to properly manage the done state or remove the parameter if it's unnecessary.

🧰 Tools
🪛 Ruff

139-140: Use a single if statement instead of nested if statements

Combine if statements using and

(SIM102)


Line range hint 281-283: Handle potential exceptions when parsing slurm_jobid

When parsing slurm_jobid from the sbatch output, ensure that the code handles cases where the output might not be as expected, which could lead to an IndexError.

Add a check before accessing out.strip().split(";")[0]:

slurm_jobid_parts = out.strip().split(";")
if slurm_jobid_parts:
    slurm_jobid = slurm_jobid_parts[0]
else:
    raise WorkflowError("Failed to retrieve SLURM job ID from sbatch output.")
🧰 Tools
🪛 Ruff

15-15: pathlib imported but unused

(F401)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 531ebc6 and db4172a.

📒 Files selected for processing (2)
  • snakemake_executor_plugin_slurm/__init__.py (6 hunks)
  • snakemake_executor_plugin_slurm/utils.py (1 hunks)
🧰 Additional context used
🪛 Ruff
snakemake_executor_plugin_slurm/__init__.py

15-15: pathlib imported but unused

(F401)

🔇 Additional comments (4)
snakemake_executor_plugin_slurm/__init__.py (4)

61-66: Clarify behavior when delete_logfiles_older_than is set to 0

The help text mentions that setting delete_logfiles_older_than to '0' disables deletion of old log files. Since the field is defined as an Optional[int], it's more appropriate to use 0 (without quotes) to represent an integer value. Ensure that this behavior is correctly implemented and consider clarifying the help text.


181-184: Update assertion to reflect new slurm_logdir construction

With the changes to slurm_logfile, ensure that the assertion checking for %j in the directory path is still valid and correctly reflects the new path construction.


186-186: Ensure slurm_logdir is correctly created

After reconstructing slurm_logfile using os.path.join, verify that slurm_logdir is correctly obtained using os.path.dirname(slurm_logfile) and that the directory is created as intended.


61-66: Confirm type handling for delete_logfiles_older_than

Ensure that the delete_logfiles_older_than field, which is an Optional[int], correctly handles the value 0 to disable log deletion. Verify that the logic in clean_old_logs interprets 0 appropriately.

snakemake_executor_plugin_slurm/utils.py Outdated Show resolved Hide resolved
snakemake_executor_plugin_slurm/utils.py Outdated Show resolved Hide resolved
snakemake_executor_plugin_slurm/utils.py Outdated Show resolved Hide resolved
snakemake_executor_plugin_slurm/__init__.py Outdated Show resolved Hide resolved
snakemake_executor_plugin_slurm/__init__.py Outdated Show resolved Hide resolved
snakemake_executor_plugin_slurm/__init__.py Outdated Show resolved Hide resolved
snakemake_executor_plugin_slurm/__init__.py Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
snakemake_executor_plugin_slurm/__init__.py (2)

169-171: Use consistent f-string formatting

The comment string construction can be simplified using f-strings consistently.

Apply this diff to improve readability:

-           comment_str = f"{group_or_rule}"
-       else:
-           comment_str = f"{group_or_rule}_wildcards_{wildcard_str}"
+           comment_str = group_or_rule
+       else:
+           comment_str = f"{group_or_rule}_wildcards_{wildcard_str}"

504-505: LGTM! Consider using a single f-string

The error logging improvement is good, but the string concatenation can be simplified.

Apply this diff for better readability:

-               "The job status query failed with"
-               f" command: {command}\n"
+               f"The job status query failed with command: {command}\n"
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between db4172a and 34adb4e.

📒 Files selected for processing (1)
  • snakemake_executor_plugin_slurm/__init__.py (6 hunks)

snakemake_executor_plugin_slurm/__init__.py Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (3)
snakemake_executor_plugin_slurm/__init__.py (3)

51-54: Fix typos in help text

There are typos in the help text:

  • "sucessful" should be "successful"
  • The sentence structure could be improved
                   Per default SLURM log files will be deleted upon
-                  sucessful completion
+                  successful completion
                   of a job. Whenever a SLURM job fails, its log file will be preserved.

455-459: Add error handling for log file deletion

The log file deletion should handle potential errors gracefully, as the file might have been deleted by another process or the user.

                     if not self.workflow.executor_settings.keep_successful_logs:
                         self.logger.debug(
                             f"removing log for successful job with SLURM ID '{j.external_jobid}'"
                         )
-                        os.remove(j.aux["slurm_logfile"])
+                        try:
+                            if os.path.exists(j.aux["slurm_logfile"]):
+                                os.remove(j.aux["slurm_logfile"])
+                        except OSError as e:
+                            self.logger.warning(f"Could not remove log file {j.aux['slurm_logfile']}: {e}")

173-175: Remove commented-out code

The commented-out method delete_old_logs should be removed as it's no longer needed and can cause confusion.

-    # def delete_old_logs(self):
-    #    self.workflow.executor_settings.delete_logfiles_older_than
-
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 34adb4e and 730433f.

📒 Files selected for processing (2)
  • snakemake_executor_plugin_slurm/__init__.py (6 hunks)
  • snakemake_executor_plugin_slurm/utils.py (1 hunks)
🧰 Additional context used
🪛 Ruff
snakemake_executor_plugin_slurm/__init__.py

30-30: .utils.clean_old_logs imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


131-131: Redefinition of unused clean_old_logs from line 30

(F811)

snakemake_executor_plugin_slurm/utils.py

4-4: time imported but unused

Remove unused import: time

(F401)

🔇 Additional comments (1)
snakemake_executor_plugin_slurm/__init__.py (1)

35-46: ⚠️ Potential issue

Use os.path.expanduser('~') for more robust home directory handling

Using os.environ['USER'] directly can fail if the USER environment variable is not set. Using os.path.expanduser('~') is more robust and cross-platform compatible.

     logdir: Optional[str] = field(
-        default=f"/home/{os.environ['USER']}/.snakemake/slurm_logs",
+        default=os.path.join(os.path.expanduser('~'), ".snakemake", "slurm_logs"),
         metadata={

Likely invalid or redundant comment.

snakemake_executor_plugin_slurm/__init__.py Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (1)
snakemake_executor_plugin_slurm/__init__.py (1)

177-179: Remove commented-out code

The commented-out delete_old_logs method is no longer needed as the functionality has been implemented elsewhere.

Remove these commented-out lines to improve code cleanliness.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 730433f and e567349.

📒 Files selected for processing (2)
  • snakemake_executor_plugin_slurm/__init__.py (7 hunks)
  • snakemake_executor_plugin_slurm/utils.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • snakemake_executor_plugin_slurm/utils.py

snakemake_executor_plugin_slurm/__init__.py Outdated Show resolved Hide resolved
snakemake_executor_plugin_slurm/__init__.py Outdated Show resolved Hide resolved
snakemake_executor_plugin_slurm/__init__.py Outdated Show resolved Hide resolved
snakemake_executor_plugin_slurm/__init__.py Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
snakemake_executor_plugin_slurm/__init__.py (2)

47-60: Fix typo in documentation string

There's a typo in the help text: "sucessful" should be "successful".

-                   Per default SLURM log files will be deleted upon sucessful
+                   Per default SLURM log files will be deleted upon successful

61-72: Validate delete_logfiles_older_than value

While the code checks for values ≤ 0, there should be validation for extremely large values that could cause performance issues. Consider adding an upper limit.

     delete_logfiles_older_than: Optional[int] = field(
         default=10,
+        validators=[field(lambda x: 0 <= x <= 365, "Value must be between 0 and 365 days")],
         metadata={
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between e567349 and 4770c8b.

📒 Files selected for processing (1)
  • snakemake_executor_plugin_slurm/__init__.py (7 hunks)
🔇 Additional comments (3)
snakemake_executor_plugin_slurm/__init__.py (3)

199-204: 🛠️ Refactor suggestion

Use os.path.join for log file path construction

The current implementation uses string concatenation which is error-prone and not cross-platform friendly.

-        slurm_logfile = (
-            self.workflow.executor_settings.logdir
-            + os.path.sep
-            + f"{group_or_rule}/{wildcard_str}/%j.log"
-        )
+        slurm_logfile = os.path.join(
+            self.workflow.executor_settings.logdir,
+            group_or_rule,
+            wildcard_str,
+            "%j.log"
+        )

Likely invalid or redundant comment.


135-163: 🛠️ Refactor suggestion

Move clean_old_logs outside post_init

The current implementation has several issues:

  1. Function defined inside post_init reduces maintainability and testability
  2. Potential race condition in directory removal
  3. Missing type hints

Move the function to class level and improve its implementation:

class Executor(RemoteExecutor):
+    def clean_old_logs(self, logdir: str, age_cutoff: int) -> None:
+        """Delete files older than specified age from the SLURM log directory.
+        
+        Args:
+            logdir: Path to the log directory
+            age_cutoff: Number of days after which files should be deleted
+        """
+        if age_cutoff <= 0:
+            return
+        cutoff_secs = age_cutoff * 86400
+        current_time = time.time()
+        self.logger.info(f"Cleaning up log files older than {age_cutoff} day(s)")
+
+        for root, _, files in os.walk(logdir, topdown=False):
+            for fname in files:
+                file_path = os.path.join(root, fname)
+                try:
+                    file_age = current_time - os.stat(file_path).st_mtime
+                    if file_age > cutoff_secs:
+                        os.remove(file_path)
+                except (OSError, FileNotFoundError) as e:
+                    self.logger.warning(f"Could not delete file {file_path}: {e}")
+            # Remove empty directories
+            try:
+                if os.path.exists(root) and not os.listdir(root):
+                    os.rmdir(root)
+            except (OSError, FileNotFoundError) as e:
+                self.logger.warning(f"Could not remove empty directory {root}: {e}")

    def __post_init__(self):
        # ... other initialization code ...
        atexit.register(
            self.clean_old_logs,
            self.workflow.executor_settings.logdir,
            self.workflow.executor_settings.delete_logfiles_older_than,
        )

Likely invalid or redundant comment.


35-46: ⚠️ Potential issue

Make logdir path construction more robust

The current implementation directly accesses os.environ['USER'] which can raise KeyError if USER is not set. Use os.path.expanduser('~') for a more robust and cross-platform solution.

-    logdir: Optional[str] = field(
-        default=f"/home/{os.environ['USER']}/.snakemake/slurm_logs",
+    logdir: Optional[str] = field(
+        default=os.path.join(os.path.expanduser('~'), ".snakemake", "slurm_logs"),

Likely invalid or redundant comment.

snakemake_executor_plugin_slurm/__init__.py Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
snakemake_executor_plugin_slurm/__init__.py (1)

51-51: Fix typo in documentation

The word "sucessful" is misspelled in the help text.

Apply this diff to fix the typo:

-                   Per default SLURM log files will be deleted upon sucessful
+                   Per default SLURM log files will be deleted upon successful
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 4770c8b and 39cf201.

📒 Files selected for processing (1)
  • snakemake_executor_plugin_slurm/__init__.py (7 hunks)
🔇 Additional comments (4)
snakemake_executor_plugin_slurm/__init__.py (4)

459-470: LGTM: Log cleanup implementation is robust

The implementation includes proper error handling, path existence checks, and appropriate logging.


199-204: 🛠️ Refactor suggestion

Use os.path.join for path construction

The current implementation uses string concatenation which is error-prone and not cross-platform friendly.

Apply this diff to improve path construction:

-        slurm_logfile = (
-            self.workflow.executor_settings.logdir
-            + os.path.sep
-            + f"{group_or_rule}/{wildcard_str}/%j.log"
-        )
+        slurm_logfile = os.path.join(
+            self.workflow.executor_settings.logdir,
+            group_or_rule,
+            wildcard_str,
+            "%j.log"
+        )

Likely invalid or redundant comment.


35-46: ⚠️ Potential issue

Use os.path.expanduser('~') for home directory path

The current implementation has potential issues:

  1. Using os.environ['USER'] can raise KeyError if USER is not set
  2. The help text mentions '~' but the code uses an absolute path

Apply this diff to fix the path construction:

    logdir: Optional[str] = field(
-        default=f"/home/{os.environ['USER']}/.snakemake/slurm_logs",
+        default=os.path.join(os.path.expanduser('~'), ".snakemake", "slurm_logs"),

Likely invalid or redundant comment.


135-163: 🛠️ Refactor suggestion

Improve clean_old_logs implementation

Several issues with the current implementation:

  1. Function defined inside post_init reduces maintainability and testability
  2. Missing error handling for directory removal
  3. Missing type hints
  4. Potential race condition in directory removal logic

Apply this diff to improve the implementation:

+ def clean_old_logs(self, logdir: str, age_cutoff: int) -> None:
+     """Delete files older than specified age from the SLURM log directory.
+     
+     Args:
+         logdir: Path to the log directory
+         age_cutoff: Number of days after which files should be deleted
+     """
+     if age_cutoff <= 0:
+         return
+     cutoff_secs = age_cutoff * 86400
+     current_time = time.time()
+     self.logger.info(f"Cleaning up log files older than {age_cutoff} day(s)")
+
+     for root, _, files in os.walk(logdir, topdown=False):
+         for fname in files:
+             file_path = os.path.join(root, fname)
+             try:
+                 file_age = current_time - os.stat(file_path).st_mtime
+                 if file_age > cutoff_secs:
+                     os.remove(file_path)
+             except (OSError, FileNotFoundError) as e:
+                 self.logger.warning(f"Could not delete file {file_path}: {e}")
+         # Remove empty directories
+         try:
+             if os.path.exists(root) and not os.listdir(root):
+                 os.rmdir(root)
+         except (OSError, FileNotFoundError) as e:
+             self.logger.warning(f"Could not remove empty directory {root}: {e}")

def __post_init__(self):
    # ... other initialization code ...
    atexit.register(
        self.clean_old_logs,
        self.workflow.executor_settings.logdir,
        self.workflow.executor_settings.delete_logfiles_older_than,
    )

Likely invalid or redundant comment.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
snakemake_executor_plugin_slurm/__init__.py (2)

35-72: LGTM! Clear and well-documented settings.

The new settings for log management are well-structured with appropriate defaults and helpful documentation. The path construction for logdir is safe and cross-platform compatible.

Consider fixing the typo in the help text:

-                   Per default SLURM log files will be deleted upon sucessful
+                   Per default SLURM log files will be deleted upon successful

177-179: Remove commented-out code.

The commented-out delete_old_logs method appears to be obsolete now that the new log cleanup functionality has been implemented.

Apply this diff to remove the commented-out code:

-    # def delete_old_logs(self):
-    #    self.workflow.executor_settings.delete_logfiles_older_than
-
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 39cf201 and 568080a.

📒 Files selected for processing (1)
  • snakemake_executor_plugin_slurm/__init__.py (7 hunks)
🔇 Additional comments (2)
snakemake_executor_plugin_slurm/__init__.py (2)

460-471: LGTM! Well-implemented log cleanup for successful jobs.

The implementation includes:

  • Proper conditional check for keep_successful_logs setting
  • Comprehensive error handling for file operations
  • Appropriate debug and warning logging

135-163: 🛠️ Refactor suggestion

Refactor log cleanup implementation for better maintainability.

While the log cleanup functionality works correctly, there are some improvements that could make it more maintainable:

  1. Move the function definition outside post_init to make it a proper class method
  2. Add error handling for directory removal operations
  3. Add type hints for better code clarity

Here's the suggested implementation:

-    def __post_init__(self):
-        # ... other init code ...
-        def clean_old_logs(logdir, age_cutoff):
-            """
-            Function to delete files older than 'age_cutoff'
-            in the SLURM 'logdir'
-            """
+    def clean_old_logs(self, logdir: str, age_cutoff: int) -> None:
+        """Delete files older than specified age from the SLURM log directory.
+        
+        Args:
+            logdir: Path to the log directory
+            age_cutoff: Number of days after which files should be deleted
+        """
+        if age_cutoff <= 0:
+            return
+        cutoff_secs = age_cutoff * 86400
+        current_time = time.time()
+        self.logger.info(f"Cleaning up log files older than {age_cutoff} day(s)")
+
+        for root, _, files in os.walk(logdir, topdown=False):
+            for fname in files:
+                file_path = os.path.join(root, fname)
+                try:
+                    file_age = current_time - os.stat(file_path).st_mtime
+                    if file_age > cutoff_secs:
+                        os.remove(file_path)
+                except (OSError, FileNotFoundError) as e:
+                    self.logger.warning(f"Could not delete file {file_path}: {e}")
+            # remove empty rule top dir, if empty
+            try:
+                if len(os.listdir(root)) == 0:
+                    os.rmdir(root)
+            except (OSError, FileNotFoundError) as e:
+                self.logger.warning(f"Could not remove empty directory {root}: {e}")

     def __post_init__(self):
+        # ... other init code ...
         atexit.register(
             self.clean_old_logs,
             self.workflow.executor_settings.logdir,
             self.workflow.executor_settings.delete_logfiles_older_than,
         )

Likely invalid or redundant comment.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
snakemake_executor_plugin_slurm/__init__.py (2)

126-156: Fix docstring inconsistencies.

The docstring has the following issues:

  • Missing return type annotation
  • Lists parameters that don't exist in the method signature (logdir, age_cutoff)

Apply this diff to fix the docstring:

-    def clean_old_logs(self) -> None:
-        """Delete files older than specified age from the SLURM log directory.
-
-        Args:
-            logdir: Path to the log directory
-            age_cutoff: Number of days after which files should be deleted
-        """
+    def clean_old_logs(self) -> None:
+        """Delete files older than the configured age from the SLURM log directory.
+
+        Uses workflow.executor_settings.logdir as the log directory and
+        workflow.executor_settings.delete_logfiles_older_than as the age cutoff.
+        """

463-475: Improve multiline string formatting.

While the error handling is good, the multiline f-strings could be formatted better for readability.

Apply this diff to improve string formatting:

-                            f"""removing log for successful job
-                                with SLURM ID '{j.external_jobid}'"""
+                            f"removing log for successful job with SLURM ID '{j.external_jobid}'"
-                                f"""Could not remove log file
-                                {j.aux['slurm_logfile']}: {e}"""
+                                f"Could not remove log file {j.aux['slurm_logfile']}: {e}"
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 568080a and e43f108.

📒 Files selected for processing (1)
  • snakemake_executor_plugin_slurm/__init__.py (8 hunks)
🔇 Additional comments (3)
snakemake_executor_plugin_slurm/__init__.py (3)

35-72: LGTM! Well-structured configuration fields with clear documentation.

The new configuration fields are well-designed with:

  • Cross-platform compatible default log directory
  • Sensible defaults for log retention (10 days) and cleanup
  • Clear and informative help text for users

166-167: LGTM! Appropriate use of atexit for cleanup.

The registration of clean_old_logs with atexit ensures proper cleanup when the program exits.


202-217: LGTM! Robust log file path handling with clear error messages.

The changes improve the log file path handling:

  • Uses cross-platform path construction with os.path.join
  • Includes a helpful assertion to prevent SLURM log file creation issues

@cmeesters
Copy link
Member Author

@johanneskoester , @dlaehnemann , @fgvieira - I would be happy if at least two of you agreed on the changes in this PR. There are a number of design decisions and coding decisions to consider.

  • I want all SLURM logs of successful jobs to be auto-deleted. These are just too many files. Users can turn this behaviour off.
  • Ultimately, all SLURM logs shall be deleted. After 10 days is the default. No reason to keep old SLURM logs: The job logs are, after all, partial Snakemake logs as Snakemake is the job script. And the Snakemake logs are unaffected by this.
  • Per default, all SLURM logs will be written to ~/.snakemake/slurm_logs. This is a constant which allows the previous feature to work. Users may overwrite this behaviour.
  • The clean-up method is triggered at exit. For this, the atexit module is just. Triggering by __del__ has its own issues. For atexit.register() to work, the clean-up methods needs to proceed the __post_init__ method. Not nice, but stable.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (3)
docs/further.md (3)

168-170: Enhance documentation of new logging flags with examples

While the new logging flags are well documented, consider adding:

  1. Example usage for each flag
  2. More prominent display of default values
  3. Explanation of how these flags interact with each other
 | `--slurm-logdir` | can change the default directory for SLURM logs, which is `~/.snakemake/slurm_logs/`|
 | `--slurm-keep-successful-logs` | per default all SLURM logs of successful jobs will be deleted upon job completion, these are just too many files too keep. The flag will overwrite this behaviour.|
 | `--slurm-delete-logfiles-older-than` | all log files in `--slurm-logdir` will be deleted after some time (default is after 10 days). If set to <= 0 this deletion will be skipped. The flag allows to change the default.|
+
+Example usage:
+```console
+# Change log directory and retain logs for 30 days
+$ snakemake --executor slurm --slurm-logdir /path/to/logs --slurm-delete-logfiles-older-than 30
+
+# Keep all successful logs and disable auto-deletion
+$ snakemake --executor slurm --slurm-keep-successful-logs --slurm-delete-logfiles-older-than 0
+```
+
+Note: The `--slurm-keep-successful-logs` flag takes precedence over `--slurm-delete-logfiles-older-than` for successful job logs.
🧰 Tools
🪛 LanguageTool

[grammar] ~170-~170: Did you mean “changing”? Or maybe you should add a pronoun? In active voice, ‘allow’ + ‘to’ takes an object, usually a pronoun.
Context: ...letion will be skipped. The flag allows to change the default.| | `--slurm-init-seconds-b...

(ALLOW_TO)


170-170: Fix grammar in flag description

The phrase "allows to change" is grammatically incorrect.

-| `--slurm-delete-logfiles-older-than` | all log files in `--slurm-logdir` will be deleted after some time (default is after 10 days). If set to <= 0 this deletion will be skipped. The flag allows to change the default.|
+| `--slurm-delete-logfiles-older-than` | all log files in `--slurm-logdir` will be deleted after some time (default is after 10 days). If set to <= 0 this deletion will be skipped. This flag allows you to change the default.|
🧰 Tools
🪛 LanguageTool

[grammar] ~170-~170: Did you mean “changing”? Or maybe you should add a pronoun? In active voice, ‘allow’ + ‘to’ takes an object, usually a pronoun.
Context: ...letion will be skipped. The flag allows to change the default.| | `--slurm-init-seconds-b...

(ALLOW_TO)


167-173: Consider adding a dedicated "Log Management" section

The new logging features would be better highlighted in their own section, rather than being mixed with other command-line flags. This would make it easier for users to find and understand the log management capabilities.

Consider reorganizing the documentation to include a new section:

### Log Management

The SLURM executor plugin provides several options for managing job log files. By default, logs are stored in `~/.snakemake/slurm_logs/` and successful job logs are automatically deleted upon completion. Logs older than 10 days are also automatically cleaned up.

#### Configuration Options

| Flag        | Description  |
|-------------|----------|
| `--slurm-logdir` | Changes the default directory for SLURM logs |
| `--slurm-keep-successful-logs` | Prevents automatic deletion of successful job logs |
| `--slurm-delete-logfiles-older-than` | Controls the auto-deletion period for log files (default: 10 days) |

[Rest of the content...]
🧰 Tools
🪛 LanguageTool

[grammar] ~170-~170: Did you mean “changing”? Or maybe you should add a pronoun? In active voice, ‘allow’ + ‘to’ takes an object, usually a pronoun.
Context: ...letion will be skipped. The flag allows to change the default.| | `--slurm-init-seconds-b...

(ALLOW_TO)

🪛 Markdownlint

172-172: null
Link fragments should be valid

(MD051, link-fragments)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 88b6705 and 009e216.

📒 Files selected for processing (1)
  • docs/further.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/further.md

[grammar] ~170-~170: Did you mean “changing”? Or maybe you should add a pronoun? In active voice, ‘allow’ + ‘to’ takes an object, usually a pronoun.
Context: ...letion will be skipped. The flag allows to change the default.| | `--slurm-init-seconds-b...

(ALLOW_TO)

🪛 Markdownlint
docs/further.md

172-172: null
Link fragments should be valid

(MD051, link-fragments)

docs/further.md Show resolved Hide resolved
Comment on lines 166 to +172
| Flag | Meaning |
|-------------|----------|
| `--slurm_init_seconds_before_status_checks`| modify time before initial job status check; the default of 40 seconds avoids load on querying slurm databases, but shorter wait times are for example useful during workflow development |
| `--slurm_requeue` | allows jobs to be resubmitted automatically if they fail or are preempted. See the [section "retries" for details](#retries)|
| `--slurm-logdir` | can change the default directory for SLURM logs, which is `~/.snakemake/slurm_logs/`|
| `--slurm-keep-successful-logs` | per default all SLURM logs of successful jobs will be deleted upon job completion, these are just too many files too keep. The flag will overwrite this behaviour.|
| `--slurm-delete-logfiles-older-than` | all log files in `--slurm-logdir` will be deleted after some time (default is after 10 days). If set to <= 0 this deletion will be skipped. The flag allows to change the default.|
| `--slurm-init-seconds-before-status-checks`| modify time before initial job status check; the default of 40 seconds avoids load on querying slurm databases, but shorter wait times are for example useful during workflow development |
| `--slurm-requeue` | allows jobs to be resubmitted automatically if they fail or are preempted. See the [section "retries" for details](#retries)|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This table should not be put here at all. The Snakemake plugin catalog generates a table with the CLI args automatically: https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/slurm.html#usage
Hence it is sufficient to put the documentation into the settings dataclass.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A, good one. Will be corrected in an update of the PR. Certainly helps to deconvolute the docs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the, the table is still here in the diff.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird: 8260f6b

snakemake_executor_plugin_slurm/__init__.py Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
docs/further.md (2)

153-155: Improve clarity and grammar in log-related flag descriptions

The descriptions could be clearer and more grammatically correct:

-| `--slurm-logdir` | can change the default directory for SLURM logs, which is `~/.snakemake/slurm_logs/`|
-| `--slurm-keep-successful-logs` | per default all SLURM logs of successful jobs will be deleted upon job completion, these are just too many files too keep. The flag will overwrite this behaviour.|
-| `--slurm-delete-logfiles-older-than` | all log files in `--slurm-logdir` will be deleted after some time (default is after 10 days). If set to <= 0 this deletion will be skipped. The flag allows to change the default.|
+| `--slurm-logdir` | specifies a custom directory for SLURM logs (default: `~/.snakemake/slurm_logs/`)|
+| `--slurm-keep-successful-logs` | prevents automatic deletion of successful job logs upon completion (by default, these logs are deleted to save space)|
+| `--slurm-delete-logfiles-older-than` | specifies the number of days after which log files are deleted (default: 10 days, set to 0 or negative to disable auto-deletion)|
🧰 Tools
🪛 LanguageTool

[grammar] ~155-~155: Did you mean “changing”? Or maybe you should add a pronoun? In active voice, ‘allow’ + ‘to’ takes an object, usually a pronoun.
Context: ...letion will be skipped. The flag allows to change the default.| | `--slurm-init-seconds-b...

(ALLOW_TO)


153-158: Remove redundant CLI flags table

As mentioned in previous comments, this table should be removed as it's automatically generated by the Snakemake plugin catalog. The documentation for these settings should be moved to the settings dataclass instead.

Please refer to:

  1. The auto-generated documentation at: https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/slurm.html#usage
  2. Move the documentation to the corresponding fields in the settings dataclass
🧰 Tools
🪛 Markdownlint (0.35.0)

157-157: null
Link fragments should be valid

(MD051, link-fragments)

🪛 LanguageTool

[grammar] ~155-~155: Did you mean “changing”? Or maybe you should add a pronoun? In active voice, ‘allow’ + ‘to’ takes an object, usually a pronoun.
Context: ...letion will be skipped. The flag allows to change the default.| | `--slurm-init-seconds-b...

(ALLOW_TO)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 009e216 and f750600.

📒 Files selected for processing (2)
  • docs/further.md (2 hunks)
  • snakemake_executor_plugin_slurm/__init__.py (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • snakemake_executor_plugin_slurm/init.py
🧰 Additional context used
🪛 Markdownlint (0.35.0)
docs/further.md

157-157: null
Link fragments should be valid

(MD051, link-fragments)

🪛 LanguageTool
docs/further.md

[grammar] ~155-~155: Did you mean “changing”? Or maybe you should add a pronoun? In active voice, ‘allow’ + ‘to’ takes an object, usually a pronoun.
Context: ...letion will be skipped. The flag allows to change the default.| | `--slurm-init-seconds-b...

(ALLOW_TO)

🔇 Additional comments (1)
docs/further.md (1)

157-157: Fix invalid link fragment in documentation

The link to the "retries" section is broken. The section exists with a different heading.

-| `--slurm-requeue` | allows jobs to be resubmitted automatically if they fail or are preempted. See the [section "retries" for details](#retries)|
+| `--slurm-requeue` | allows jobs to be resubmitted automatically if they fail or are preempted. See the ["Retries" section](#retries---or-trying-again-when-a-job-failed) for details|
🧰 Tools
🪛 Markdownlint (0.35.0)

157-157: null
Link fragments should be valid

(MD051, link-fragments)

@cmeesters
Copy link
Member Author

@johanneskoester care to take a look - seems as if the slurmdbd in the CI is somehow broken, though?

Comment on lines +4 to +6
import logging

logger = logging.getLogger(__name__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import logging
logger = logging.getLogger(__name__)

Not needed I think. Snakemake has its own logger instance that we access with self.logger.

Comment on lines +457 to +458
f"""removing log for successful job
with SLURM ID '{j.external_jobid}'"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use a multiline string here.

Comment on lines +172 to +174
# def delete_old_logs(self):
# self.workflow.executor_settings.delete_logfiles_older_than

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleanup?

Comment on lines +127 to +148
def clean_old_logs(self) -> None:
"""Delete files older than specified age from the SLURM log directory."""
# shorthands:
age_cutoff = self.workflow.executor_settings.delete_logfiles_older_than
logdir_path = Path(self.workflow.executor_settings.logdir)
keep_all = self.workflow.executor_settings.keep_successful_logs
if age_cutoff <= 0 or keep_all:
return
cutoff_secs = age_cutoff * 86400
current_time = time.time()
self.logger.info(f"Cleaning up log files older than {age_cutoff} day(s)")
for path in logdir_path.rglob("*"):
try:
if path.is_file():
file_age = current_time - path.stat().st_mtime
if file_age > cutoff_secs:
path.unlink()
elif path.is_dir() and not any(path.iterdir()):
path.rmdir()
except (OSError, FileNotFoundError) as e:
self.logger.warning(f"Could not delete file {path}: {e}")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move below post_init for readability?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, this was a left-over from the initial tinkering, where I violated the name space, if not before post-init. Will check and change according to the suggestion, when I find the time - ought to be working now.

Comment on lines 89 to 93
Allow requeuing preempted of failed jobs,
if no cluster default. Results in `sbatch ... --requeue ...`
if no cluster default. Results in
`sbatch ... --requeue ...`
This flag has no effect, if not set.
""",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let us not use multiline strings for these, but rather simple strings that are auto-concatenated by python if just separated by whitespace.

Comment on lines +37 to +47
default=os.path.join(os.path.expanduser("~"), ".snakemake", "slurm_logs"),
metadata={
"help": """
Per default the SLURM log directory (writing output is
required by SLURM) is '~/.snakemake/slurm_logs'.
This flag allows to set an alternative directory.
""",
"env_var": False,
"required": False,
},
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the default should be in the workdir and not in the users home, like it was before.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was what I posted to be discussed: By changing the default, the clean-up will be enacted every time. Alternatively, a user has to clean up hidden files, eventually. In my experience, this is what will not happen.

We are talking about SLURM logs, here. Log files, which do not contain substantial information. As Snakemake submits itself, it will mostly have the same information as printed on the terminal — particularly for failed jobs in the case of workflows using the log directive, which we ought to expect. The number of log files can be enormous. They may have a direct impact on a file quota in a work space (if enabled). We do not need to keep any in the long run. And we should think about core facilities or users who run different workflows and potentially accumulate this redundant information at different places.

Granted: Deleting logs of successful runs makes a huge difference, already. Granted: This is a deviation from the current default.

If you still want to keep the current default, I will alter this PR accordingly.

@johanneskoester
Copy link
Contributor

@johanneskoester care to take a look - seems as if the slurmdbd in the CI is somehow broken, though?

The action is fixed again. It was using an old slurm plugin that is deprecated in the new slurm release that is on the current ubuntu.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants