Confusion of local and remote-local storage prefixes in SLURM job contexts #28

Open
cmeesters opened this issue Dec 10, 2024 · 0 comments

Hi,

given this mini workflow:

SAMPLES = ["a", "b", "c"]

rule all:
    input: "data.tar.gz"

rule foo:
    output: temp("{sample}.txt")
    shell:
      "touch {output}"

rule bundle:
    input: expand("{sample}.txt", sample=SAMPLES)
    output: "data.tar.gz"
    shell:
       "tir czf {output} {input}"

which should bundle the outputs with tar (the command is intentionally broken, tir instead of tar, to force a failure), and this config:

cat ~/.config/snakemake/config.yaml 
executor: slurm
latency-wait: 5
#default-resources:
#    slurm_partition: 'smallcpu' if 'threads' < 200 else 'parallel'
default-storage-provider: fs
local-storage-prefix: /dev/shm/\$USER
remote-job-local-storage-prefix: /localscratch/\$SLURM_JOB_ID
shared-fs-usage:
  - persistence
  - sources
  - source-cache

the bundle rule fails with:

Error in rule bundle:
    message: SLURM-job '665356' failed, SLURM status is: 'FAILED'. For further error details see the cluster/cloud log and the log files of the involved rule(s).
    jobid: 1
    input: a.txt (retrieve from storage), b.txt (retrieve from storage), c.txt (retrieve from storage)
    output: data.tar.gz (send to storage)
    log: /gpfs/fs1/home/meesters/snakemake-irods/.snakemake/slurm_logs/rule_bundle/665356.log (check log file(s) for error details)
    shell:
        tir czf /dev/shm/\meesters/fs/data.tar.gz /dev/shm/\meesters/fs/a.txt /dev/shm/\meesters/fs/b.txt /dev/shm/\meesters/fs/c.txt
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    external_jobid: 665356

Apparently, the SLURM job uses local-storage-prefix where it should be using remote-job-local-storage-prefix.
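
To make the expected behaviour concrete, here is a minimal Python sketch. It is not Snakemake code: expected_prefix and staged_path are hypothetical helpers, and the selection rule (prefer the remote-job prefix inside a remote job) is only my assumption about what the two settings are meant to do. The fs path component is taken from the error output above.

```python
import posixpath
from string import Template


def expected_prefix(local_prefix, remote_job_local_prefix, in_remote_job, env):
    """Hypothetical helper: choose the prefix I would expect, then expand $VARS.

    Assumption: inside a remote (SLURM) job the remote-job prefix wins,
    everywhere else the plain local prefix is used.
    """
    if in_remote_job and remote_job_local_prefix:
        raw = remote_job_local_prefix
    else:
        raw = local_prefix
    # safe_substitute leaves unknown variables untouched instead of raising
    return Template(raw).safe_substitute(env)


def staged_path(prefix, provider, relpath):
    """Hypothetical helper: the provider name ('fs') becomes a path component."""
    return posixpath.join(prefix, provider, relpath)


env = {"USER": "meesters", "SLURM_JOB_ID": "665356"}
local = "/dev/shm/$USER"
remote = "/localscratch/$SLURM_JOB_ID"

# On the submitting host I would expect the local prefix ...
print(staged_path(expected_prefix(local, remote, False, env), "fs", "a.txt"))
# ... and inside the SLURM job the remote-job prefix.
print(staged_path(expected_prefix(local, remote, True, env), "fs", "a.txt"))
```

With the values from this report, the second call gives a path under /localscratch/665356/fs/, whereas the failing job above was handed paths under /dev/shm.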

Also, the environment variables need to be escaped (\$USER, \$SLURM_JOB_ID) to get even that far. Without escaping, the foo jobs already abort at touch /dev/shm/meesters/fs/c.txt: an fs component is introduced into the path, that directory is simply never created, and so touch fails.

I wonder, does the executor need an additional flag to consider the remote path?
