Description
Bug report
As I was editing my pipeline and testing with `-resume`, I ended up in a situation where an entire sub-workflow was being skipped. I traced it back to a `join` emitting nothing because the meta maps in the two channels had different keys.
I had indeed changed the key name at some point, but what bothers me is that with `-resume`, Nextflow returns cached results for a process whose inputs (here, the meta map) have changed.
Expected behavior and actual behavior
A cached task should not be reused if the content of its meta map input differs, including when only a key name has changed.
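To make the expectation concrete, here is a minimal Python sketch of a content-addressed cache key, assuming the key is a hash over the process name and every input value. The function `task_cache_key` and the JSON/SHA-256 scheme are hypothetical illustrations, not Nextflow's actual task-hashing code; the point is only that a key derived from the whole meta map changes when one of its key names changes.

```python
import hashlib
import json


def task_cache_key(process_name: str, meta: dict, value: str) -> str:
    """Hypothetical cache key: a hash of the process name and all inputs.

    The meta map is serialised with sorted keys, so both its key names
    and its values contribute to the digest, while insertion order does not.
    """
    payload = json.dumps(
        {"process": process_name, "meta": meta, "value": value},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


first_run = task_cache_key("FAIDX", {"id": 1, "genome_size": 100}, "seq_1.fa")
second_run = task_cache_key("FAIDX", {"id": 1, "total_length": 100}, "seq_1.fa")

# Renaming a key changes the digest, so this sketch would not reuse the cached entry.
print(first_run != second_run)  # True
```

Sorting keys before hashing mirrors map equality semantics: two maps with the same entries in a different order hash identically, while any renamed key or changed value produces a different cache key.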
Steps to reproduce the problem
In this minimal example, I create a channel named `ch_genome` with tuples made of:
- a meta map with an `id` key and a second key whose name is set by the user via `--key`,
- a string standing in for a FASTA file.
The channel goes through the `FAIDX` process, which outputs similar tuples:
- the input meta map, unchanged,
- a string standing in for the faidx index.
Then the input and output channels are joined with `join`, on the assumption that the meta map remains the same.
```nextflow
workflow {
    Channel.of(1, 2, 3)
        | map { i -> [ [ 'id': i, "${params.key}": 100 * i * i], "seq_${i}.fa" ] }
        | set { ch_genome }
    ch_genome.view()

    FAIDX ( ch_genome )
    FAIDX.out.fai.view()

    // join fasta with corresponding fai file
    ch_genome
        | join ( FAIDX.out.fai )
        | set { fasta_fai }
    fasta_fai.view()
}

process FAIDX {
    input:
    tuple val(meta), val(reads)

    output:
    tuple val(meta), val(index), emit: fai

    script:
    index = "seq_${meta.id}.fai"
    """
    """
}
```
Program output
In the first run, I use `genome_size` as the key name. All three `.view()` calls print three entries each.
```
$ nextflow run bug2/ --key genome_size
Nextflow 25.10.0 is available - Please consider updating your version to it
N E X T F L O W ~ version 25.04.6
Launching `bug2/main.nf` [friendly_kimura] DSL2 - revision: 43f5fd76a3
executor > local (3)
[60/9c9378] process > FAIDX (3) [100%] 3 of 3 ✔
[[id:1, genome_size:100], seq_1.fa]
[[id:2, genome_size:400], seq_2.fa]
[[id:3, genome_size:900], seq_3.fa]
[[id:1, genome_size:100], seq_1.fai]
[[id:1, genome_size:100], seq_1.fa, seq_1.fai]
[[id:2, genome_size:400], seq_2.fai]
[[id:2, genome_size:400], seq_2.fa, seq_2.fai]
[[id:3, genome_size:900], seq_3.fai]
[[id:3, genome_size:900], seq_3.fa, seq_3.fai]
```
In the second run, I change the key name to `total_length` and run with `-resume`. All three `FAIDX` tasks are reported as cached, and `FAIDX.out.fai.view()` still prints the meta maps from the first run, i.e. with `genome_size` as the key!
As a result, `join` finds no matching meta maps and `fasta_fai.view()` prints nothing.
```
$ nextflow run bug2/ --key total_length -resume
Nextflow 25.10.0 is available - Please consider updating your version to it
N E X T F L O W ~ version 25.04.6
Launching `bug2/main.nf` [special_shockley] DSL2 - revision: 43f5fd76a3
[60/9c9378] process > FAIDX (3) [100%] 3 of 3, cached: 3 ✔
[[id:1, total_length:100], seq_1.fa]
[[id:2, total_length:400], seq_2.fa]
[[id:3, total_length:900], seq_3.fa]
[[id:2, genome_size:400], seq_2.fai]
[[id:1, genome_size:100], seq_1.fai]
[[id:3, genome_size:900], seq_3.fai]
```
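The empty result follows from how the join key is compared. Below is a minimal Python sketch, assuming a join that pairs `(meta, value)` items only when the meta maps are equal (same key names and same values) and silently drops everything else. The function `join_by_key` is a hypothetical stand-in, not Nextflow's `join` implementation, and it ignores emission order and duplicate-key handling.

```python
def join_by_key(left, right):
    """Hypothetical key-based join on the first tuple element.

    Each item is (meta, value). Items are paired only when their meta
    dicts compare equal; unmatched items are dropped without any error.
    """
    joined = []
    for meta_l, value_l in left:
        for meta_r, value_r in right:
            if meta_l == meta_r:
                joined.append((meta_l, value_l, value_r))
    return joined


# Inputs rebuilt in the second run, with the new key name.
genome = [({"id": i, "total_length": 100 * i * i}, f"seq_{i}.fa") for i in (1, 2, 3)]
# Stale cached outputs still carrying the old key name.
cached_fai = [({"id": i, "genome_size": 100 * i * i}, f"seq_{i}.fai") for i in (2, 1, 3)]
# Outputs as they would look if the tasks had been re-run.
fresh_fai = [({"id": i, "total_length": 100 * i * i}, f"seq_{i}.fai") for i in (1, 2, 3)]

print(join_by_key(genome, cached_fai))  # []
print(len(join_by_key(genome, fresh_fai)))  # 3
```

With the stale cached meta maps nothing pairs up, while meta maps built under the current `--key` would join all three entries, matching the first run's output.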
Environment
- Nextflow version: 25.04.6 build 5954
- Java version: Groovy 4.0.26 on OpenJDK 64-Bit Server VM 17.0.15+6-LTS
- Operating system: Ubuntu 22.04.5 LTS Linux 5.15.0-141-generic
- Bash version: 5.1.16(1)-release (x86_64-pc-linux-gnu)