Cache entries are reused even when the inputs differ #6513

@muffato

Bug report

As I was editing my pipeline and testing with -resume, I ended up in a situation where an entire sub-workflow was being skipped. I traced it back to a join not outputting anything because the meta maps of the two channels had different keys.
I had indeed changed the key name at some point, but what bothers me is that Nextflow -resume returns cached entries from processes whose inputs differ (here, the meta map).

Expected behavior and actual behavior

A cache entry should not be reused if the content of the input meta map differs. In practice, -resume reuses the entry and the process emits the meta map from the earlier run.

Steps to reproduce the problem

In this minimal example, I create a channel named ch_genome with tuples made of:

  • a meta map that uses a key name chosen by the user,
  • a string symbolising a Fasta file.

The channel goes through the FAIDX process, which outputs similar tuples:

  • the input meta map as is
  • a string symbolising the faidx index

Then the input and output channels are joined on the assumption that the meta map remains the same.

workflow {

    Channel.of(1, 2, 3)
    | map { i -> [ [ 'id': i, "${params.key}": 100 * i * i], "seq_${i}.fa" ] } 
    | set { ch_genome }
    ch_genome.view()

    FAIDX ( ch_genome )
    FAIDX.out.fai.view()

    // join fasta with corresponding fai file
    ch_genome
    | join ( FAIDX.out.fai )
    | set { fasta_fai }
    fasta_fai.view()
}

process FAIDX {
    input:
    tuple val(meta), val(reads)
 
    output:
    tuple val(meta), val(index), emit: fai 
 
    script:
    index = "seq_${meta.id}.fai"
    """ 
    """
}

Program output

In the first run, I choose the key genome_size. All three .view() calls print 3 entries each.

$ nextflow run bug2/ --key genome_size
Nextflow 25.10.0 is available - Please consider updating your version to it

 N E X T F L O W   ~  version 25.04.6

Launching `bug2/main.nf` [friendly_kimura] DSL2 - revision: 43f5fd76a3

executor >  local (3)
[60/9c9378] process > FAIDX (3) [100%] 3 of 3 ✔
[[id:1, genome_size:100], seq_1.fa]
[[id:2, genome_size:400], seq_2.fa]
[[id:3, genome_size:900], seq_3.fa]
[[id:1, genome_size:100], seq_1.fai]
[[id:1, genome_size:100], seq_1.fa, seq_1.fai]
[[id:2, genome_size:400], seq_2.fai]
[[id:2, genome_size:400], seq_2.fa, seq_2.fai]
[[id:3, genome_size:900], seq_3.fai]
[[id:3, genome_size:900], seq_3.fa, seq_3.fai]

In the second run, I change the key name to total_length and run in -resume mode. FAIDX.out.fai.view() still prints the meta maps from the first run, i.e. with genome_size as the key!
As a result, join finds no matching keys between the two channels and emits nothing.

$ nextflow run bug2/ --key total_length -resume
Nextflow 25.10.0 is available - Please consider updating your version to it

 N E X T F L O W   ~  version 25.04.6

Launching `bug2/main.nf` [special_shockley] DSL2 - revision: 43f5fd76a3

[60/9c9378] process > FAIDX (3) [100%] 3 of 3, cached: 3 ✔
[[id:1, total_length:100], seq_1.fa]
[[id:2, total_length:400], seq_2.fa]
[[id:3, total_length:900], seq_3.fa]
[[id:2, genome_size:400], seq_2.fai]
[[id:1, genome_size:100], seq_1.fai]
[[id:3, genome_size:900], seq_3.fai]
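
For illustration, the mismatch behind the empty join can be reproduced in plain Groovy. This is only a sketch: it assumes join matches on the first tuple element (its default), which here is the meta map. The variable names fromInput and fromCache are hypothetical; the values are copied from the output above.

```groovy
// Meta maps for id 1 as they appear in the second run
def fromInput = [id: 1, total_length: 100]   // built by the current run (ch_genome)
def fromCache = [id: 1, genome_size: 100]    // restored from the first run's cache (FAIDX.out.fai)

// Groovy compares maps by content, so a renamed key makes the two maps unequal
assert fromInput != fromCache

// The shared 'id' entry on its own would still match
assert fromInput.id == fromCache.id
```

Since the two maps are unequal, no pair of tuples shares a key, which is consistent with fasta_fai.view() printing nothing in the second run.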

Environment

  • Nextflow version: 25.04.6 build 5954
  • Java version: Groovy 4.0.26 on OpenJDK 64-Bit Server VM 17.0.15+6-LTS
  • Operating system: Ubuntu 22.04.5 LTS Linux 5.15.0-141-generic
  • Bash version: 5.1.16(1)-release (x86_64-pc-linux-gnu)
