feat: custom log file behaviour #159
Merged
Changes from 35 commits
39 commits
f50794c (cmeesters) feat: added flags for intended logfile behaviour - no code, yet
b64b3a9 (cmeesters) fix: syntax errors and formatting
a7d9df7 (cmeesters) fix: working interface
5133877 (cmeesters) Merge branch 'main' into feat/logfeatures
9c68936 (cmeesters) fix: log prefix instead of logdir. - not working
099292e (cmeesters) Merge branch 'main' into feat/logfeatures
db4172a (cmeesters) feat: implementing all features as described
0d6a200 (cmeesters) fix: removed unnecessary pathlib import
34adb4e (cmeesters) fix: lininting issues
730433f (cmeesters) fix: using atexit to decouple the function from __del__, moved all co…
490a7c3 (cmeesters) fix: deleted unused import
a9ff35c (cmeesters) fix: deleted unused import
e567349 (cmeesters) fix: linting issues
4770c8b (cmeesters) fix: linting issues II
9cc019b (cmeesters) feat: not rellying on '/home/$USER' any more, this is dangerous. Inst…
bfd9cd6 (cmeesters) fix: removed trailing whitespace
d7e0e93 (cmeesters) fix: using os.path.join for path concatenation, like it should be
39cf201 (cmeesters) Update snakemake_executor_plugin_slurm/__init__.py
568080a (cmeesters) Merge branch 'feat/logfeatures' of github.com:snakemake/snakemake-exe…
e727989 (cmeesters) fix: formatting and linting
51ae157 (cmeesters) fix: moved cleanup code before __post_init__
e43f108 (cmeesters) fix: removed one more trailing whitespace
88b6705 (cmeesters) fix: those who want to keep all logs should be pleased
009e216 (cmeesters) docs: documenting the new feature
8260f6b (cmeesters) fix: removed table of command line flags special to the executor - it…
f750600 (cmeesters) feat: same code - based on on the pathlib library
b661dd8 (cmeesters) Update snakemake_executor_plugin_slurm/utils.py
aaad25d (cmeesters) fix: no multiline warnings
6f74d18 (cmeesters) fix: removed outcommented code
63b4f59 (cmeesters) fix: reordered such that functions follow 'post_init'
829a889 (cmeesters) fix: converted help strings to single line strings
c9c0eed (cmeesters) fix: reverted to previous default of logging in workdir
8a18089 (cmeesters) fix: back to default SLURM logdir NOT being in HOME, all code now bas…
1ef9e98 (cmeesters) fix: removed (once more) the additional flags section
9764842 (cmeesters) feat: documentation on the new features
59cf40e (cmeesters) Update snakemake_executor_plugin_slurm/utils.py
c9c7b8e (cmeesters) fix: recursively deleting log subdirs
739f878 (johanneskoester) Update snakemake_executor_plugin_slurm/__init__.py
d6f5567 (johanneskoester) Update snakemake_executor_plugin_slurm/__init__.py
@@ -3,9 +3,11 @@
 __email__ = "[email protected]"
 __license__ = "MIT"
 
+import atexit
 import csv
 from io import StringIO
 import os
+from pathlib import Path
 import re
 import shlex
 import subprocess
@@ -26,30 +28,59 @@
 from snakemake_interface_common.exceptions import WorkflowError
 from snakemake_executor_plugin_slurm_jobstep import get_cpus_per_task
 
-from .utils import delete_slurm_environment
+from .utils import delete_slurm_environment, delete_empty_dirs
 
 
 @dataclass
 class ExecutorSettings(ExecutorSettingsBase):
+    logdir: Optional[str] = field(
+        default=None,
+        metadata={
+            "help": "Per default the SLURM log directory is relative to "
+            "the working directory. "
+            "This flag allows to set an alternative directory.",
+            "env_var": False,
+            "required": False,
+        },
+    )
+    keep_successful_logs: bool = field(
+        default=False,
+        metadata={
+            "help": "Per default SLURM log files will be deleted upon successful "
+            "completion of a job. Whenever a SLURM job fails, its log "
+            "file will be preserved. "
+            "This flag allows to keep all SLURM log files, even those "
+            "of successful jobs.",
+            "env_var": False,
+            "required": False,
+        },
+    )
+    delete_logfiles_older_than: Optional[int] = field(
+        default=10,
+        metadata={
+            "help": "Per default SLURM log files in the SLURM log directory "
+            "of a workflow will be deleted after 10 days. For this, "
+            "best leave the default log directory unaltered. "
+            "Setting this flag allows to change this behaviour. "
+            "If set to <=0, no old files will be deleted. ",
+        },
+    )
     init_seconds_before_status_checks: Optional[int] = field(
         default=40,
         metadata={
-            "help": """
-            Defines the time in seconds before the first status
-            check is performed after job submission.
-            """,
+            "help": "Defines the time in seconds before the first status "
+            "check is performed after job submission.",
             "env_var": False,
             "required": False,
         },
     )
     requeue: bool = field(
         default=False,
         metadata={
-            "help": """
-            Allow requeuing preempted of failed jobs,
-            if no cluster default. Results in `sbatch ... --requeue ...`
-            This flag has no effect, if not set.
-            """,
+            "help": "Allow requeuing preempted or failed jobs, "
+            "if no cluster default. Results in "
+            "`sbatch ... --requeue ...` "
+            "This flag has no effect, if not set.",
             "env_var": False,
             "required": False,
         },
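The three new settings follow the usual dataclass pattern of attaching a help text to each field via `metadata`. The following is a minimal, self-contained sketch of that pattern only; `LogSettings` and `describe` are hypothetical names for illustration and are not part of the plugin, and the help texts are shortened paraphrases. It also shows why each fragment of an implicitly concatenated help string needs a trailing space.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class LogSettings:  # hypothetical stand-in, not the plugin's ExecutorSettings
    logdir: Optional[str] = field(
        default=None,
        metadata={
            # Adjacent string literals are concatenated as-is, so every
            # fragment except the last must end with a space.
            "help": "Alternative directory for SLURM log files. "
            "Defaults to a directory relative to the working directory.",
        },
    )
    keep_successful_logs: bool = field(
        default=False,
        metadata={"help": "Keep the log files of successful jobs."},
    )
    delete_logfiles_older_than: Optional[int] = field(
        default=10,
        metadata={"help": "Age in days after which old log files are deleted."},
    )


def describe(settings_cls) -> dict:
    """Map each field name to its (default, help text) pair."""
    return {
        f.name: (f.default, f.metadata.get("help", "")) for f in fields(settings_cls)
    }


if __name__ == "__main__":
    for name, (default, help_text) in describe(LogSettings).items():
        print(f"{name} (default: {default!r}): {help_text}")
```

Keeping the help text in single-line fragments (rather than triple-quoted strings) avoids embedding newlines and indentation in the generated command line help.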
@@ -91,6 +122,32 @@ def __post_init__(self):
         self._fallback_account_arg = None
         self._fallback_partition = None
         self._preemption_warning = False  # no preemption warning has been issued
+        self.slurm_logdir = None
+        atexit.register(self.clean_old_logs)
+
+    def clean_old_logs(self) -> None:
+        """Delete files older than specified age from the SLURM log directory."""
+        # shorthands:
+        age_cutoff = self.workflow.executor_settings.delete_logfiles_older_than
+        keep_all = self.workflow.executor_settings.keep_successful_logs
+        if age_cutoff <= 0 or keep_all:
+            return
+        cutoff_secs = age_cutoff * 86400
+        current_time = time.time()
+        self.logger.info(f"Cleaning up log files older than {age_cutoff} day(s)")
+        for path in self.slurm_logdir.rglob("*.log"):
+            if path.is_file():
+                try:
+                    file_age = current_time - path.stat().st_mtime
+                    if file_age > cutoff_secs:
+                        path.unlink()
+                except (OSError, FileNotFoundError) as e:
+                    self.logger.warning(f"Could not delete logfile {path}: {e}")
+        # we need a 2nd iteration to remove putatively empty directories
+        try:
+            delete_empty_dirs(self.slurm_logdir)
+        except (OSError, FileNotFoundError) as e:
+            self.logger.warning(f"Could not delete empty directory {path}: {e}")
 
     def warn_on_jobcontext(self, done=None):
         if not done:
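The age-based cleanup registered with `atexit` can be sketched as a standalone function. This is an illustration under stated assumptions, not the plugin's code: the function signatures are made up for the example, and `delete_empty_dirs` here is a hypothetical reimplementation, since the real helper in `utils.py` is not shown in this diff. The sketch additionally returns early when the log directory is `None` or missing, which can happen if cleanup runs before any job was submitted.

```python
import os
import time
from pathlib import Path
from typing import List, Optional


def clean_old_logs(
    logdir: Optional[Path], max_age_days: int, now: Optional[float] = None
) -> List[Path]:
    """Delete *.log files older than max_age_days and return the deleted paths."""
    # Nothing to do if cleanup is disabled or the directory was never created.
    if logdir is None or max_age_days <= 0 or not logdir.is_dir():
        return []
    now = time.time() if now is None else now
    cutoff_secs = max_age_days * 86400
    deleted = []
    # Materialise the listing first so files are not removed while iterating.
    for path in list(logdir.rglob("*.log")):
        try:
            if path.is_file() and now - path.stat().st_mtime > cutoff_secs:
                path.unlink()
                deleted.append(path)
        except OSError:
            # FileNotFoundError is a subclass of OSError, so one clause suffices.
            continue
    return deleted


def delete_empty_dirs(root: Path) -> None:
    """Remove empty subdirectories below root, deepest first (hypothetical helper)."""
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        current = Path(dirpath)
        if current != root and not any(current.iterdir()):
            current.rmdir()
```

Walking bottom-up means a parent directory is checked only after its children were removed, so nested empty rule and wildcard directories disappear in a single pass while the log root itself is kept.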
@@ -123,18 +180,22 @@ def run_job(self, job: JobExecutorInterface):
         except AttributeError:
             wildcard_str = ""
 
-        slurm_logfile = os.path.abspath(
-            f".snakemake/slurm_logs/{group_or_rule}/{wildcard_str}/%j.log"
+        self.slurm_logdir = (
+            Path(self.workflow.executor_settings.logdir)
+            if self.workflow.executor_settings.logdir
+            else Path(".snakemake/slurm_logs").resolve()
         )
-        logdir = os.path.dirname(slurm_logfile)
+
+        self.slurm_logdir.mkdir(parents=True, exist_ok=True)
+        slurm_logfile = self.slurm_logdir / group_or_rule / wildcard_str / "%j.log"
+        slurm_logfile.parent.mkdir(parents=True, exist_ok=True)
         # this behavior has been fixed in slurm 23.02, but there might be plenty of
         # older versions around, hence we should rather be conservative here.
-        assert "%j" not in logdir, (
+        assert "%j" not in str(self.slurm_logdir), (
             "bug: jobid placeholder in parent dir of logfile. This does not work as "
             "we have to create that dir before submission in order to make sbatch "
             "happy. Otherwise we get silent fails without logfiles being created."
         )
-        os.makedirs(logdir, exist_ok=True)
 
         # generic part of a submission string:
         # we use a run_uuid as the job-name, to allow `--name`-based
@@ -247,7 +308,9 @@ def run_job(self, job: JobExecutorInterface):
         slurm_jobid = out.strip().split(";")[0]
         if not slurm_jobid:
             raise WorkflowError("Failed to retrieve SLURM job ID from sbatch output.")
-        slurm_logfile = slurm_logfile.replace("%j", slurm_jobid)
+        slurm_logfile = slurm_logfile.with_name(
+            slurm_logfile.name.replace("%j", slurm_jobid)
+        )
         self.logger.info(
             f"Job {job.jobid} has been submitted with SLURM jobid {slurm_jobid} "
             f"(log: {slurm_logfile})."
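The two `run_job` hunks above move the log path handling from string operations to `pathlib`: a pattern containing the `%j` placeholder is built before submission, and only the file name is rewritten once the job ID is known. A rough sketch of both steps follows; `build_logfile_pattern` and `resolve_logfile` are hypothetical names, and unlike the diff, this sketch checks the whole parent directory (including rule and wildcard parts) for the placeholder.

```python
from pathlib import Path
from typing import Optional


def build_logfile_pattern(
    logdir: Optional[str], group_or_rule: str, wildcard_str: str
) -> Path:
    """Return the log file pattern handed to sbatch, with %j as the job ID."""
    base = Path(logdir) if logdir else Path(".snakemake/slurm_logs").resolve()
    parent = base / group_or_rule / wildcard_str
    # The parent directory has to exist before submission, so it must not
    # contain the job ID placeholder, which is only known afterwards.
    if "%j" in str(parent):
        raise ValueError("job ID placeholder in parent directory of log file")
    return parent / "%j.log"


def resolve_logfile(pattern: Path, slurm_jobid: str) -> Path:
    """Substitute the job ID in the file name only, leaving directories as-is."""
    return pattern.with_name(pattern.name.replace("%j", slurm_jobid))


if __name__ == "__main__":
    pattern = build_logfile_pattern("/tmp/slurm_logs", "rule_align", "sample=A")
    print(resolve_logfile(pattern, "4711"))
```

`Path` has no `replace(old, new)` string method (its `replace` renames a file), which is why the substitution goes through `name` and `with_name`.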
@@ -380,6 +443,19 @@ async def check_active_jobs(
                 self.report_job_success(j)
                 any_finished = True
                 active_jobs_seen_by_sacct.remove(j.external_jobid)
+                if not self.workflow.executor_settings.keep_successful_logs:
+                    self.logger.debug(
+                        "removing log for successful job "
+                        f"with SLURM ID '{j.external_jobid}'"
+                    )
+                    try:
+                        if j.aux["slurm_logfile"].exists():
+                            j.aux["slurm_logfile"].unlink()
+                    except (OSError, FileNotFoundError) as e:
+                        self.logger.warning(
+                            "Could not remove log file"
+                            f" {j.aux['slurm_logfile']._str}: {e}"
+                        )
             elif status == "PREEMPTED" and not self._preemption_warning:
                 self._preemption_warning = True
                 self.logger.warning(
@@ -404,7 +480,9 @@ async def check_active_jobs(
                     # with a new sentence
                     f"'{status}'. "
                 )
-                self.report_job_error(j, msg=msg, aux_logs=[j.aux["slurm_logfile"]])
+                self.report_job_error(
+                    j, msg=msg, aux_logs=[j.aux["slurm_logfile"]._str]
+                )
                 active_jobs_seen_by_sacct.remove(j.external_jobid)
             else:  # still running?
                 yield j
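Both `check_active_jobs` hunks read the path through `._str`, which is a private attribute of `pathlib` objects; the public way to get the same string is `str(path)`. A minimal sketch of the removal step using only public API is given below. `remove_logfile` is a hypothetical helper name, and `unlink(missing_ok=True)` assumes Python 3.8 or newer.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def remove_logfile(logfile: Path) -> bool:
    """Remove a job's log file; return False if removal failed."""
    try:
        # missing_ok avoids a separate exists() check and the race between
        # checking and unlinking.
        logfile.unlink(missing_ok=True)
    except OSError as e:
        # Formatting a Path with %s (or an f-string) calls str() on it.
        logger.warning("Could not remove log file %s: %s", logfile, e)
        return False
    return True


def as_aux_logs(logfile: Path) -> list:
    """Return the log path as a list of plain strings, e.g. for error reports."""
    return [str(logfile)]
```

Whether the caller expects strings or `Path` objects in `aux_logs` is not visible in this diff, so the conversion in `as_aux_logs` is an assumption.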