Snakemake fails to parse sbatch output to get slurm_jobid #146

Open
kennethabarr opened this issue Sep 10, 2024 · 3 comments
Comments

@kennethabarr

On my system, Slurm reports submission info even at the lowest level of verbosity:

> sbatch script.sh
sbatch: Verify job submission ...
sbatch: Using a shared partition ...
sbatch: Partition: X
sbatch: QOS-Flag: X
sbatch: Account: X
sbatch: Verification: ***PASSED***
Submitted batch job 23401073

This leads to problems parsing the job id. It cannot be resolved with --quiet, because then the job id is not returned either. The lines beginning with "sbatch: " are printed to stderr while the job id is printed to stdout, but the code merges the two streams at __init__.py:207:

out = subprocess.check_output(
    call, shell=True, text=True, stderr=subprocess.STDOUT
)

It is not an elegant solution, but I have modified my __init__.py by adding a line that keeps only the last line of out. This restores the proper behavior and my pipelines run normally.

out = out.splitlines()[-1].strip()
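A slightly more defensive variant of this hack (hypothetical, not the plugin's actual code) would scan every line for the "Submitted batch job" marker instead of assuming the id sits on the last line, so chatter printed after the real output would not break it either:

```python
import re

def parse_sbatch_jobid(out: str) -> str:
    """Extract the job id from noisy sbatch output.

    Scans every line for 'Submitted batch job <id>' rather than
    relying on a fixed line position, so extra 'sbatch: ...' chatter
    before or after the real output does not matter.
    """
    for line in out.splitlines():
        m = re.search(r"Submitted batch job (\d+)", line)
        if m:
            return m.group(1)
    raise ValueError(f"could not parse job id from sbatch output: {out!r}")

noisy = "sbatch: Verify job submission ...\nSubmitted batch job 23401073\n"
print(parse_sbatch_jobid(noisy))  # → 23401073
```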

The versions I am using are as follows:
snakemake version 8.18.2
slurm executor version 0.10.0
slurm version 20.11.8

@cmeesters
Member

This is the problem with providing software that is meant to ease execution on HPC clusters: the never-ending creativity of admins who tinker with the defaults.

Usually, nothing more than Submitted batch job 23401073 (and perhaps the cluster name in a multi-cluster setup) is reported. I confess: anticipating your cluster's level of verbosity was beyond my imagination.

Your patch is not production-ready. Do you want to submit a pull request, or should I go forward? Please note: I am terribly busy these days, so it is unlikely that I will get to it before October.

Also: In-job submission of snakemake jobs is currently not well-supported. I am working on improvements.

@nigiord
Contributor

nigiord commented Sep 13, 2024

@kennethabarr Could you share the output of sbatch --parsable […] on your system? Since v0.10.0 the plugin uses this option to parse the sbatch output robustly. Since you see an error, I guess you still get the extra verbosity even with this option? Or are you using an older version of the executor plugin?

I think admins should not change the behavior of this kind of thing (--parsable is designed precisely for use in scripts). I'm not sure it is a good idea to try to accommodate custom tampering with the sbatch command. For instance, the proposed solution would break if some admin added the extra verbosity after (rather than before) the normal output.

I think you can work around this without changing the executor code by defining a wrapper function for the sbatch command in your bashrc that keeps only stdout. Would this work for you?

function sbatch() {
    command sbatch "$@" 2> /dev/null
}

On the other hand, do we really need to merge stdout and stderr in the executor?
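A minimal sketch of the separate-streams alternative (illustrative only; an echo command stands in for sbatch so the snippet runs anywhere, and the executor's real call and error handling are omitted):

```python
import subprocess

# Stand-in for "sbatch --parsable script.sh": admin chatter goes to
# stderr, while the bare job id (what --parsable prints) goes to stdout.
call = "echo 'sbatch: Verification: PASSED' >&2; echo '23401073'"

# capture_output=True keeps stdout and stderr separate, so chatter on
# stderr can never contaminate the job id on stdout.
proc = subprocess.run(call, shell=True, text=True, capture_output=True)

jobid = proc.stdout.strip().split(";")[0]  # --parsable: "jobid[;cluster]"
print(jobid)        # → 23401073
print(proc.stderr)  # stderr remains available for error reporting
```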

@kennethabarr
Author

I still receive the same "sbatch: ..." lines, but they are part of stderr, not stdout.

My understanding is that the reason to merge stdout and stderr is to propagate errors encountered in the sbatch script.

I think some version of your solution is a far better hack than what I have done, and I will implement it right away. I don't think I'm experienced enough with Python to implement a production-ready fix that can handle our creative cluster admins.
