Setting memory twice when submitting with slurm executor #75

Open
kwells4 opened this issue Apr 19, 2024 · 14 comments

Comments

@kwells4

kwells4 commented Apr 19, 2024

Versions

snakemake version 8.10.7
snakemake-executor-plugin-slurm version 0.4.4
snakemake-executor-plugin-slurm-jobstep version 0.2.1

The problem

I am working on getting snakemake version 8 to work on my slurm server and keep getting the following error:

srun: fatal: SLURM_MEM_PER_CPU, SLURM_MEM_PER_GPU, and SLURM_MEM_PER_NODE are mutually exclusive.

I can see that two resource arguments are being passed when looking at the rule description:

[Fri Apr 19 13:42:50 2024]
rule fastqc:
    input: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz, /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/co
ntrol_1_R2.fastq.gz
    output: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1_summary_untrimmed.txt
    jobid: 0
    reason: Forced execution
    wildcards: results=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results, sample=control_1
    resources: mem_mb=15259, mem_mib=7630, disk_mb=43311, disk_mib=41305, tmpdir=<TBD>, slurm_partition=acompile, slurm_account=amc-general, slurm_extra=--output=/pl/active/Anschutz_BDC/analysis/wells/analysis/we
lls/testing_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_control_1_untrimmed.out --error=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_
control_1_untrimmed.err --qos=compile, runtime=60, mem=16GB

However, I don't know where the mem_mb is being passed.

Profile

executor: slurm

default-resources:
    slurm_partition: "acompile"
    slurm_account:   "amc-general"

set-resources:
    fastqc:
        runtime: 60 # 1 hour
        mem: "16GB"
    fastqc_summary:
        runtime: 10
        mem: "4GB"

My rule

rule fastqc:
    input:
        input_list = _get_input
    output:
        file = "{results}/fastqc_pre_trim/fastqc_{sample}_summary_untrimmed.txt"
    params:
        output_dir  = os.path.join(RESULTS2, "fastqc_pre_trim"),
        directories = _get_directories
    resources:
        slurm_extra=lambda wildcards: (
            f"--output={wildcards.results}/logs/fastqc_pre_trim/fastqc_{wildcards.sample}_untrimmed.out "
            f"--error={wildcards.results}/logs/fastqc_pre_trim/fastqc_{wildcards.sample}_untrimmed.err "
            f"--qos=compile"
        )
    singularity:
       GENERAL_CONTAINER
    shell:
        """
        mkdir -p {params.output_dir}
        fastqc {input} --outdir {params.output_dir}
        for dir in {params.directories};
        do
            name=$(basename -s .zip $dir)

            unzip -p $dir $name/summary.txt \
                >> {output}
        done
        """    

My command

snakemake \
    --snakefile Snakefile \
    --configfile config.yaml \
    --jobs 12 \
    --latency-wait 60 \
    --rerun-incomplete \
    --use-singularity \
    --workflow-profile profiles/default

Attempted fix 1: use mem_mb

I have also tried this using the mem_mb argument instead:

executor: slurm

default-resources:
    slurm_partition: "acompile"
    slurm_account:   "amc-general"

set-resources:
    fastqc:
        runtime: 60 # 1 hour
        mem_mb: 1600
    fastqc_summary:
        runtime: 10
        mem_mb: 4000

I get the same error, but the double memory request is less obvious:

[Fri Apr 19 13:40:42 2024]
rule fastqc:
    input: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz, /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/co
ntrol_1_R2.fastq.gz
    output: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1_summary_untrimmed.txt
    jobid: 0
    reason: Forced execution
    wildcards: results=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results, sample=control_1
    resources: mem_mb=1600, mem_mib=1526, disk_mb=43311, disk_mib=41305, tmpdir=<TBD>, slurm_partition=acompile, slurm_account=amc-general, slurm_extra=--output=/pl/active/Anschutz_BDC/analysis/wells/analysis/wel
ls/testing_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_control_1_untrimmed.out --error=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_c
ontrol_1_untrimmed.err --qos=compile, runtime=60

srun: fatal: SLURM_MEM_PER_CPU, SLURM_MEM_PER_GPU, and SLURM_MEM_PER_NODE are mutually exclusive.

Attempted fix 2: remove profile and specify within rule

I have also tried this where I deleted my profile and just assigned the resources within the rule:

rule fastqc:
    input:
        input_list = _get_input
    output:
        file = "{results}/fastqc_pre_trim/fastqc_{sample}_summary_untrimmed.txt"
    resources:
        job_name="fastqc",
        mem_mb=1600,
        runtime=60,
        slurm_extra=lambda wildcards: (
            f"--output={wildcards.results}/logs/fastqc_pre_trim/fastqc_{wildcards.sample}_untrimmed.out "
            f"--error={wildcards.results}/logs/fastqc_pre_trim/fastqc_{wildcards.sample}_untrimmed.err "
            f"--qos=compile"
        )
    params:
        output_dir  = os.path.join(RESULTS2, "fastqc_pre_trim"),
        directories = _get_directories
    singularity:
       GENERAL_CONTAINER
    shell:
        """
        mkdir -p {params.output_dir}
        fastqc {input} --outdir {params.output_dir}
        for dir in {params.directories};
        do
            name=$(basename -s .zip $dir)

            unzip -p $dir $name/summary.txt \
                >> {output}
        done
        """   

Submit with:

snakemake \
    --snakefile Snakefile \
    --configfile config.yaml \
    --jobs 12 \
    --latency-wait 60 \
    --rerun-incomplete \
    --use-singularity \
    --executor slurm \
    --default-resources slurm_account=amc-general slurm_partition=acompile

But this also fails with the same srun error:

srun: fatal: SLURM_MEM_PER_CPU, SLURM_MEM_PER_GPU, and SLURM_MEM_PER_NODE are mutually exclusive.

Attempted fix 3: submit with sbatch

I've also submitted the job per this issue but that gave the same error as above.

Conclusion

mem_mb is obviously specified somewhere but I am not sure where to look beyond the profile, rules, and snakemake command. Do you have any ideas what I may be missing? Thanks so much for your help!

@cmeesters
Member

cmeesters commented Apr 20, 2024

However, I don't know where the mem_mb is being passed.

Your requirement should be translated to mem_mb and is sufficient. Snakemake merely lists both resources, but that should be fine as it is only translated into sbatch --mem .... And indeed, within SLURM, --mem and --mem-per-cpu are mutually exclusive. I will try to track this down. For this, it would be extremely helpful if you ran Snakemake with --verbose and attached the output as a file. Also, please state your SLURM version (output of sinfo --version). Thank you.
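To relate the two memory figures in the log lines: they look like one quantity in two units. The sketch below assumes that mem_mib is mem_mb converted from megabytes (10^6 bytes) to mebibytes (2^20 bytes) and rounded up. The helper name `mb_to_mib` is made up for illustration and is not part of Snakemake's API. Under that assumption it reproduces the mem_mb=1600 / mem_mib=1526 pair from attempted fix 1.

```python
def mb_to_mib(mb: int) -> int:
    """Convert megabytes (10**6 bytes) to mebibytes (2**20 bytes), rounding up.

    Assumption: this mirrors how the mem_mib value in the log relates to
    mem_mb; the thread itself does not state the conversion rule.
    """
    # Ceiling division on integers: -(-a // b) == ceil(a / b)
    return -(-mb * 10**6 // 2**20)


print(mb_to_mib(1600))   # the mem_mb value from attempted fix 1
print(mb_to_mib(16000))  # 16 GB written as 16000 MB
```

The second call gives 15259, which matches the mem_mb=15259 logged when the profile sets mem: "16GB". Either way, only a single --mem value ends up in the sbatch call.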

@cmeesters
Member

PS: would you be interested in contributing your workflows to the snakemake-workflows catalogue? See https://snakemake.github.io/snakemake-workflow-catalog/ - some of yours look pretty interesting!

@kwells4
Author

kwells4 commented Apr 22, 2024

Thanks for helping with this!

However, I don't know where the mem_mb is being passed.

Your requirement should be translated to mem_mb and is sufficient. Snakemake merely lists both resources, but that should be fine as it is only translated into sbatch --mem .... And indeed, within SLURM, --mem and --mem-per-cpu are mutually exclusive. I will try to track this down. For this, it would be extremely helpful if you ran Snakemake with --verbose and attached the output as a file. Also, please state your SLURM version (output of sinfo --version). Thank you.

  • The slurm version is 23.02.2

  • Here's the output using --verbose from the master:

snakemake --snakefile Snakefile --configfile config.yaml --jobs 12 --latency-wait 60 --rerun-incomplete --use-singularity --workflow-profile profiles/default --verbose
Using workflow specific profile profiles/default for setting default command line arguments.
Building DAG of jobs...
shared_storage_local_copies: True
remote_exec: False
SLURM run ID: c846e871-127b-46a2-a3c5-559bfafd7f06
Using shell: /bin/bash
Provided remote nodes: 12
Job stats:
job               count
--------------  -------
all                   1
fastqc                1
fastqc_summary        1
total                 3

Resources before job selection: {'_cores': 9223372036854775807, '_nodes': 12}
Ready jobs (1)
Select jobs to execute...
Using greedy selector because only single job has to be scheduled.
Selected jobs (1)
Resources after job selection: {'_cores': 9223372036854775806, '_nodes': 11}
Execute 1 jobs...

[Mon Apr 22 08:49:54 2024]
rule fastqc:
    input: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz, /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testin
g_snakemake8_slurm/raw_data/control_1_R2.fastq.gz
    output: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1_summary_untrimmed.txt
    jobid: 2
    reason: Missing output files: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1_summary_untrimmed.txt
    wildcards: results=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results, sample=control_1
    resources: mem_mb=15259, mem_mib=7630, disk_mb=43311, disk_mib=41305, tmpdir=<TBD>, slurm_partition=acompile, slurm_account=amc-general, slurm_extra=--output=/pl/active/Anschutz_
BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_control_1_untrimmed.out --error=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/t
esting_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_control_1_untrimmed.err --qos=compile, runtime=60, mem=16GB

General args: ['--force', '--target-files-omit-workdir-adjustment', '--keep-storage-local-copies', '--max-inventory-time 0', '--nocolor', '--notemp', '--no-hooks', '--nolock', '--ign
ore-incomplete', '', '--verbose ', '--rerun-triggers code software-env mtime params input', '', '', '--deployment-method apptainer', '--conda-frontend mamba', '', '', '--apptainer-pr
efix /scratch/alpine/[email protected]/apptainer_cache', '', '', '--shared-fs-usage source-cache sources storage-local-copies persistence software-deployment input-output', '',
 '--wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/', '', '', '--configfiles /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/config
.yaml', '', '', '--latency-wait 60', '--scheduler ilp', '--local-storage-prefix .snakemake/storage', '--scheduler-solver-path /projects/[email protected]/software/anaconda/envs
/snakemake8/bin', '', '', '--set-resources base64//ZmFzdHFjOnJ1bnRpbWU9NjA= base64//ZmFzdHFjOm1lbT0xNkdC base64//ZmFzdHFjX3N1bW1hcnk6cnVudGltZT0xMA== base64//ZmFzdHFjX3N1bW1hcnk6bWVt
PTRHQg==', '', '', '--default-resources base64//bWVtX21iPW1pbihtYXgoMippbnB1dC5zaXplX21iLCAxMDAwKSwgODAwMCk= base64//ZGlza19tYj1tYXgoMippbnB1dC5zaXplX21iLCAxMDAwKQ== base64//dG1wZGly
PXN5c3RlbV90bXBkaXI= base64//c2x1cm1fcGFydGl0aW9uPWFjb21waWxl base64//c2x1cm1fYWNjb3VudD1hbWMtZ2VuZXJhbA==', '']
sbatch call: sbatch --job-name c846e871-127b-46a2-a3c5-559bfafd7f06 --output /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/.snakemake/slurm_logs/rule
_fastqc/_/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results_control_1/%j.log --export=ALL --comment fastqc -A amc-general -p acompile -t 60 --mem 
15259 --ntasks=1 --cpus-per-task=1 --output=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_control_1_untrimmed.out
 --error=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_control_1_untrimmed.err --qos=compile -D /pl/active/Anschu
tz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm --wrap="/projects/[email protected]/software/anaconda/envs/snakemake8/bin/python3.12 -m snakemake --snakefile /pl/
active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/Snakefile --target-jobs 'fastqc:results=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snake
make8_slurm/results,sample=control_1' --allowed-rules 'fastqc' --cores all --attempt 1 --force-use-threads  --resources 'mem_mb=15259' 'mem_mib=7630' 'disk_mb=43311' 'disk_mib=41305'
 --wait-for-files '/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/.snakemake/tmp.ch3f265w' '/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/test
ing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz' '/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R2.fastq.gz' --force --target-
files-omit-workdir-adjustment --keep-storage-local-copies --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete --verbose  --rerun-triggers code software-
env mtime params input --deployment-method apptainer --conda-frontend mamba --apptainer-prefix /scratch/alpine/[email protected]/apptainer_cache --shared-fs-usage source-cache 
sources storage-local-copies persistence software-deployment input-output --wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/ --configfiles /pl/active/Anschutz_BDC/
analysis/wells/analysis/wells/testing_snakemake8_slurm/config.yaml --latency-wait 60 --scheduler ilp --local-storage-prefix .snakemake/storage --scheduler-solver-path /projects/kwell
[email protected]/software/anaconda/envs/snakemake8/bin --set-resources base64//ZmFzdHFjOnJ1bnRpbWU9NjA= base64//ZmFzdHFjOm1lbT0xNkdC base64//ZmFzdHFjX3N1bW1hcnk6cnVudGltZT0xMA== ba
se64//ZmFzdHFjX3N1bW1hcnk6bWVtPTRHQg== --default-resources base64//bWVtX21iPW1pbihtYXgoMippbnB1dC5zaXplX21iLCAxMDAwKSwgODAwMCk= base64//ZGlza19tYj1tYXgoMippbnB1dC5zaXplX21iLCAxMDAwKQ
== base64//dG1wZGlyPXN5c3RlbV90bXBkaXI= base64//c2x1cm1fcGFydGl0aW9uPWFjb21waWxl base64//c2x1cm1fYWNjb3VudD1hbWMtZ2VuZXJhbA== --executor slurm-jobstep --jobs 1 --mode remote"
Job 2 has been submitted with SLURM jobid 5783166 (log: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/.snakemake/slurm_logs/rule_fastqc/_/pl/active/A
nschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results_control_1/5783166.log).
The job status was queried with command: sacct -X --parsable2 --noheader --format=JobIdRaw,State --starttime 2024-04-20T08:00 --endtime now --name c846e871-127b-46a2-a3c5-559bfafd7f0
6
It took: 0.058480262756347656 seconds
The output is:
'5783166|FAILED
'

status_of_jobs after sacct is: {'5783166': 'FAILED'}
active_jobs_ids_with_current_sacct_status are: {'5783166'}
active_jobs_seen_by_sacct are: {'5783166'}
missing_sacct_status are: set()
[Mon Apr 22 08:50:34 2024]
Error in rule fastqc:
    message: SLURM-job '5783166' failed, SLURM status is: 'FAILED'For further error details see the cluster/cloud log and the log files of the involved rule(s).
    jobid: 2
    input: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz, /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testin
g_snakemake8_slurm/raw_data/control_1_R2.fastq.gz
    output: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1_summary_untrimmed.txt
    log: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/.snakemake/slurm_logs/rule_fastqc/_/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testi
ng_snakemake8_slurm/results_control_1/5783166.log (check log file(s) for error details)
    shell:
        
        mkdir -p /scratch/alpine/[email protected]/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim
        fastqc /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/tes
ting_snakemake8_slurm/raw_data/control_1_R2.fastq.gz --outdir /scratch/alpine/[email protected]/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim
        for dir in /scratch/alpine/[email protected]/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/control_1_R1_fastqc.zip /scratch/alpine/kwel
[email protected]/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/control_1_R2_fastqc.zip;
        do
            name=$(basename -s .zip $dir)

            unzip -p $dir $name/summary.txt                 >> /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1
_summary_untrimmed.txt
        done
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    external_jobid: 5783166

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-04-22T084953.851472.snakemake.log
unlocking
removing lock
removing lock
removed all locks
Full Traceback (most recent call last):
  File "/projects/[email protected]/software/anaconda/envs/snakemake8/lib/python3.12/site-packages/snakemake/cli.py", line 2068, in args_to_api
    dag_api.execute_workflow(
  File "/projects/[email protected]/software/anaconda/envs/snakemake8/lib/python3.12/site-packages/snakemake/api.py", line 589, in execute_workflow
    workflow.execute(
  File "/projects/[email protected]/software/anaconda/envs/snakemake8/lib/python3.12/site-packages/snakemake/workflow.py", line 1285, in execute
    raise WorkflowError("At least one job did not complete successfully.")
snakemake_interface_common.exceptions.WorkflowError: At least one job did not complete successfully.

WorkflowError:
At least one job did not complete successfully.
raw_data/control_1_R1.fastq.gz

And the output from the job

Building DAG of jobs...
shared_storage_local_copies: True
remote_exec: True
Using shell: /bin/bash
Provided remote nodes: 1
Provided resources: mem_mb=15259, mem_mib=7630, disk_mb=43311, disk_mib=41305
Resources before job selection: {'mem_mb': 15259, 'mem_mib': 7630, 'disk_mb': 43311, 'disk_mib': 41305, '_cores': 9223372036854775807, '_nodes': 1}
Ready jobs (1)
Select jobs to execute...
Using greedy selector because only single job has to be scheduled.
Selected jobs (1)
Resources after job selection: {'mem_mb': 15259, 'mem_mib': 0, 'disk_mb': 43311, 'disk_mib': 0, '_cores': 9223372036854775806, '_nodes': 0}
Execute 1 jobs...

[Mon Apr 22 08:50:04 2024]
rule fastqc:
    input: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz, /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testin
g_snakemake8_slurm/raw_data/control_1_R2.fastq.gz
    output: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1_summary_untrimmed.txt
    jobid: 0
    reason: Forced execution
    wildcards: results=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results, sample=control_1
    resources: mem_mb=15259, mem_mib=7630, disk_mb=43311, disk_mib=41305, tmpdir=<TBD>, slurm_partition=acompile, slurm_account=amc-general, slurm_extra=--output=/pl/active/Anschutz_
BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_control_1_untrimmed.out --error=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/t
esting_snakemake8_slurm/results/logs/fastqc_pre_trim/fastqc_control_1_untrimmed.err --qos=compile, runtime=60, mem=16GB

General args: ['--force', '--target-files-omit-workdir-adjustment', '--keep-storage-local-copies', '--max-inventory-time 0', '--nocolor', '--notemp', '--no-hooks', '--nolock', '--ign
ore-incomplete', '', '--verbose ', '--rerun-triggers code mtime input params software-env', '', '', '--deployment-method apptainer', '--conda-frontend mamba', '', '', '--apptainer-pr
efix /scratch/alpine/[email protected]/apptainer_cache', '', '', '--shared-fs-usage sources source-cache software-deployment input-output persistence storage-local-copies', '',
 '--wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/', '', '', '--configfiles /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/config
.yaml', '', '', '--latency-wait 60', '--scheduler ilp', '--local-storage-prefix .snakemake/storage', '--scheduler-solver-path /projects/[email protected]/software/anaconda/envs
/snakemake8/bin', '', '', '--set-resources base64//ZmFzdHFjOnJ1bnRpbWU9NjA= base64//ZmFzdHFjOm1lbT0xNkdC base64//ZmFzdHFjX3N1bW1hcnk6cnVudGltZT0xMA== base64//ZmFzdHFjX3N1bW1hcnk6bWVt
PTRHQg==', '', '', '--default-resources base64//bWVtX21iPW1pbihtYXgoMippbnB1dC5zaXplX21iLCAxMDAwKSwgODAwMCk= base64//ZGlza19tYj1tYXgoMippbnB1dC5zaXplX21iLCAxMDAwKQ== base64//dG1wZGly
PXN5c3RlbV90bXBkaXI= base64//c2x1cm1fcGFydGl0aW9uPWFjb21waWxl base64//c2x1cm1fYWNjb3VudD1hbWMtZ2VuZXJhbA==']
This job is a group job: False
The call for this job is: srun -n1 --cpu-bind=q --cpus-per-task 1 /projects/[email protected]/software/anaconda/envs/snakemake8/bin/python3.12 -m snakemake --snakefile /pl/acti
ve/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/Snakefile --target-jobs 'fastqc:results=/pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake
8_slurm/results,sample=control_1' --allowed-rules 'fastqc' --cores all --attempt 1 --force-use-threads  --resources 'mem_mb=15259' 'mem_mib=7630' 'disk_mb=43311' 'disk_mib=41305' --f
orce --target-files-omit-workdir-adjustment --keep-storage-local-copies --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete --verbose  --rerun-triggers 
code mtime input params software-env --deployment-method apptainer --conda-frontend mamba --apptainer-prefix /scratch/alpine/[email protected]/apptainer_cache --shared-fs-usage
 sources source-cache software-deployment input-output persistence storage-local-copies --wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/ --configfiles /pl/active
/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/config.yaml --latency-wait 60 --scheduler ilp --local-storage-prefix .snakemake/storage --scheduler-solver-path /
projects/[email protected]/software/anaconda/envs/snakemake8/bin --set-resources base64//ZmFzdHFjOnJ1bnRpbWU9NjA= base64//ZmFzdHFjOm1lbT0xNkdC base64//ZmFzdHFjX3N1bW1hcnk6cnVud
GltZT0xMA== base64//ZmFzdHFjX3N1bW1hcnk6bWVtPTRHQg== --default-resources base64//bWVtX21iPW1pbihtYXgoMippbnB1dC5zaXplX21iLCAxMDAwKSwgODAwMCk= base64//ZGlza19tYj1tYXgoMippbnB1dC5zaXpl
X21iLCAxMDAwKQ== base64//dG1wZGlyPXN5c3RlbV90bXBkaXI= base64//c2x1cm1fcGFydGl0aW9uPWFjb21waWxl base64//c2x1cm1fYWNjb3VudD1hbWMtZ2VuZXJhbA== --mode remote
Job is running on host: c3cpu-a2-u32-1.rc.int.colorado.edu
srun: fatal: SLURM_MEM_PER_CPU, SLURM_MEM_PER_GPU, and SLURM_MEM_PER_NODE are mutually exclusive.
[Mon Apr 22 08:50:04 2024]
Error in rule fastqc:
    jobid: 0
    input: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz, /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testin
g_snakemake8_slurm/raw_data/control_1_R2.fastq.gz
    output: /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1_summary_untrimmed.txt
    shell:
        
        mkdir -p /scratch/alpine/[email protected]/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim
        fastqc /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/raw_data/control_1_R1.fastq.gz /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/tes
ting_snakemake8_slurm/raw_data/control_1_R2.fastq.gz --outdir /scratch/alpine/[email protected]/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim
        for dir in /scratch/alpine/[email protected]/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/control_1_R1_fastqc.zip /scratch/alpine/kwel
[email protected]/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/control_1_R2_fastqc.zip;
        do
            name=$(basename -s .zip $dir)

            unzip -p $dir $name/summary.txt                 >> /pl/active/Anschutz_BDC/analysis/wells/analysis/wells/testing_snakemake8_slurm/results/fastqc_pre_trim/fastqc_control_1
_summary_untrimmed.txt
        done
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Storing output in storage.
Full Traceback (most recent call last):
  File "/projects/[email protected]/software/anaconda/envs/snakemake8/lib/python3.12/site-packages/snakemake/cli.py", line 2068, in args_to_api
    dag_api.execute_workflow(
  File "/projects/[email protected]/software/anaconda/envs/snakemake8/lib/python3.12/site-packages/snakemake/api.py", line 589, in execute_workflow
    workflow.execute(
  File "/projects/[email protected]/software/anaconda/envs/snakemake8/lib/python3.12/site-packages/snakemake/workflow.py", line 1285, in execute
    raise WorkflowError("At least one job did not complete successfully.")
snakemake_interface_common.exceptions.WorkflowError: At least one job did not complete successfully.

WorkflowError:
At least one job did not complete successfully.
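
The --set-resources and --default-resources values in the verbose output above are passed on as base64 tokens with a `base64//` prefix, which makes it hard to see which resource definitions actually reach the job. A minimal sketch for reading them, assuming the part after the prefix is plain base64-encoded UTF-8 text (the helper name `decode_arg` is hypothetical, not a Snakemake function):

```python
import base64

PREFIX = "base64//"


def decode_arg(token: str) -> str:
    """Decode one 'base64//...' token from the verbose log; pass others through."""
    if not token.startswith(PREFIX):
        return token
    return base64.b64decode(token[len(PREFIX):]).decode("utf-8")


# Tokens copied from the log above (the third one is wrapped across
# two lines in the log and is joined here).
tokens = [
    "base64//ZmFzdHFjOnJ1bnRpbWU9NjA=",
    "base64//ZmFzdHFjOm1lbT0xNkdC",
    "base64//bWVtX21iPW1pbihtYXgoMippbnB1dC5zaXplX21iLCAxMDAwKSwgODAwMCk=",
]
for token in tokens:
    print(decode_arg(token))
```

The first two decode to the fastqc entries of the profile's set-resources section. The third is a default mem_mb expression that appears in --default-resources even though the profile does not set one.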

@kwells4
Author

kwells4 commented Apr 22, 2024

PS: would you be interested in contributing your workflows to the snakemake-workflows catalogue? See https://snakemake.github.io/snakemake-workflow-catalog/ - some of yours look pretty interesting!

I could definitely do that! I'll add it to my todo list!

@cmeesters
Member

cmeesters commented Apr 25, 2024

Bad news: I cannot reproduce this behaviour. edit: My SLURM version is 23.02.7.

I noticed that you are overwriting --output via slurm_extra. That does not produce the error. Note, though, that the plugin already writes its log file to os.path.abspath(f".snakemake/slurm_logs/{group_or_rule}/{wildcard_str}/%j.log"), as reported by the plugin.

My Snakefile is

rule all:
    input: "results/2.out"

rule test1:
    output: "results/2.out"
    #threads: 2
    resources:
        cpus_per_task=2,
        slurm_extra="--output='somewhere_%j.log'"
    shell: "touch results/$SLURM_CPUS_PER_TASK.out"

My profile:

default-resources:
    slurm_partition: "smallcpu"
    slurm_account: "nhr-zdvhpc" #"m2_zdvhpc"

set-resources:
    test1:
        runtime: 5
        mem_mb: 1800

Does this produce the observed error, too?

@kwells4
Author

kwells4 commented Apr 25, 2024

That's unfortunate that you can't reproduce it.

You are completely correct: using your profile (with the partition and account changed) and your Snakefile, I get the same error. But the error still occurs when I remove the slurm_extra argument, so it doesn't seem to come from overwriting --output.

@cmeesters
Member

... I get the same error.

That is not what I wanted to read ;-)

Assuming you have this script:

#!/bin/bash

#SBATCH --mem 100
#SBATCH -A amc-general 
#SBATCH -p acompile
#SBATCH -t 5

srun echo "Hello world"

and you run sbatch <this script>: does your SLURM output contain the error, too? I ask because the call to srun in the jobstep executor does NOT include any memory setting, and weirdly you still see this error.

@kwells4
Author

kwells4 commented Apr 25, 2024

You are good, that produced the exact same error. Seems to be an issue with my system and not snakemake (probably what you did want to hear!)

I'll reach out to our system administrators. Thank you so much for all of your help!

@cmeesters
Member

probably what you did want to hear

Not really. It is some sort of relief, though. I know that it takes effort to update SLURM, if my colleagues are bitten by a bug — but then again, I would be surprised if you are the first to report.

Thanks for the feedback. I will keep this issue open, if you don't mind, and await further feedback. Perhaps it turns out to be a corner case we can mitigate.

@kwells4
Author

kwells4 commented Apr 26, 2024

Sounds great. We are working on it and have so far figured out that this works:

#!/bin/bash

#SBATCH --mem 100
#SBATCH -A amc-general 
#SBATCH -p acompile
#SBATCH -t 5

srun --mem 100 echo "Hello world"

I will let you know if we make any progress.

@cmeesters
Member

Urgh, is redundancy a new hobby of SchedMD, or is there a technical reason behind it (just a rhetorical question!)? I need to check a couple of SLURM versions (read: two, for I do not have more, and I will ask colleagues to do the same) before I add the duplication to the code. I am not sure whether or where there might be side effects.

Also, as “my” most current version of SLURM is slightly more up to date than yours, I have to presume that this is a quirk of your cluster.

@kwells4
Author

kwells4 commented Apr 26, 2024

This is likely a quirk of my cluster. We will definitely keep working on our side to see if there are good fixes.

Again, thanks so much for your help!

@kwells4
Author

kwells4 commented Apr 26, 2024

I might have found the problem... Our cluster is currently going through some growing pains, so the best way to get an interactive job is by starting an interactive vscode session. When I submit the snakemake jobs from within the interactive vscode session I get the error, but I don't when submitting from a normal interactive node.

So the slurm integration seems to work well as long as I'm not running through vscode.
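
One possible explanation, which this thread does not confirm: the interactive session is itself a SLURM job and exports one of the memory variables named in the srun error. The sbatch call in the verbose log uses --export=ALL, so such a variable could be carried into the submitted job and collide there with the job's own --mem request. The sketch below only inspects and filters an environment mapping. The helper names are hypothetical, and dropping the variables before submitting is an untested idea, not something the plugin is known to do.

```python
import os

# The three variables named in the srun error message.
MEM_VARS = ("SLURM_MEM_PER_CPU", "SLURM_MEM_PER_GPU", "SLURM_MEM_PER_NODE")


def inherited_mem_vars(env):
    """Return the SLURM memory variables present in the given environment."""
    return {name: env[name] for name in MEM_VARS if name in env}


def without_mem_vars(env):
    """Return a copy of the environment with the SLURM memory variables dropped."""
    return {name: value for name, value in env.items() if name not in MEM_VARS}


# Report what the current shell would hand to a child process.
found = inherited_mem_vars(os.environ)
if found:
    print("Inherited SLURM memory settings:", found)
else:
    print("No SLURM memory variables set in this shell.")
```

Running this once in the vscode session and once on a normal interactive node would show whether the two environments differ in these variables.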

@cmeesters
Member

cmeesters commented Apr 29, 2024

Ah, the issue is that you submit whilst working within a job context. I'm afraid that's not what we designed the plugin for. It should not be a problem either; at least, that issue of yours should not arise.

Now, we can certainly detect this and program a fat warning. I wonder, however, whether falling back on the actual SLURM executor instead of the jobstep executor is possible as a reaction. Either way, I will keep this issue open until I have an answer to this question.

cmeesters added a commit that referenced this issue Jun 25, 2024
Already, two issues (#75 and #22) seem to result from running Snakemake
in a SLURM job context and using this executor plugin. This PR
introduces detecting if triggered within a SLURM job and issuing a
warning accordingly.

In principle, the plugin may work in job context. Submitting jobs from
jobs has always been a highlight of SLURM. However, settings may lead to
unintended behaviour (and would do so without Snakemake, presumably).
Hence, we can only warn from the executor.
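
The detection described in this commit message could look roughly like the sketch below. It assumes that the presence of the SLURM_JOB_ID environment variable, which SLURM sets inside a job, is a sufficient signal. The function names and the warning text are illustrative, and the check the plugin actually performs may differ.

```python
import os
import warnings


def in_slurm_job_context(env=None) -> bool:
    """Return True if the process appears to run inside a SLURM job."""
    env = os.environ if env is None else env
    # Assumption: SLURM_JOB_ID being set means we are inside a job.
    return "SLURM_JOB_ID" in env


def warn_if_in_job(env=None) -> bool:
    """Emit a warning when run inside a SLURM job; return whether it fired."""
    if in_slurm_job_context(env):
        warnings.warn(
            "Running inside a SLURM job: settings inherited from this job "
            "may lead to unintended behaviour of submitted jobs."
        )
        return True
    return False


warn_if_in_job()
```

Warning rather than failing fits the reasoning in the commit message: submitting jobs from within jobs is legitimate in SLURM, so the executor can only point out the risk.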
cmeesters added a commit that referenced this issue Jul 5, 2024
cmeesters added a commit that referenced this issue Jul 12, 2024