Error running spades 4.2.0 and extracting UCE data [Fixed] #321

@bjzelvelder

Hello,

I wanted to use aTRAM to iteratively assemble ultraconserved elements (UCEs) but encountered an issue when using SPAdes with this command:
atram.py -i 3 -Q refs/node1.alluce.fasta -b db/JHAR05885_03_raw -o results/ -a spades --evalue 1e-3 --word-size 11 --spades-careful --cpus 24 --spades-threads 24 --spades-memory 168

( 3332027) 2025-06-26 17:59:16.043894 INFO : ################################################################################
( 3332027) 2025-06-26 17:59:16.044007 INFO : aTRAM version: v2.4.4
( 3332027) 2025-06-26 17:59:16.044072 INFO : Python version: 3.13.5 | packaged by Anaconda, Inc. | (main, Jun 12 2025, 16:09:02) [GCC 11.2.0]
( 3332027) 2025-06-26 17:59:16.044112 INFO : /home/student/miniconda3/envs/aTRAM/bin/atram.py -i 3 -Q refs/node1.alluce.fasta -b db/JHAR05885_03_raw -o results/ -a spades --evalue 1e-3 --word-size 11 --spades-careful --cpus 24 --spades-threads 24 --spades-memory 168
( 3332027) 2025-06-26 17:59:16.044186 INFO : aTRAM blast DB = "db/JHAR05885_03_raw", query = "node1.alluce_uce_11_Node1_1.fasta", iteration 1
( 3332027) 2025-06-26 17:59:16.044451 INFO : Blasting query against shards: iteration 1
( 3332027) 2025-06-26 17:59:20.479449 INFO : All 77 blast results completed
( 3332027) 2025-06-26 17:59:20.480523 INFO : 10 blast hits in iteration 1
( 3332027) 2025-06-26 17:59:20.480615 INFO : Writing assembler input files: iteration 1
( 3332027) 2025-06-26 17:59:20.485525 INFO : Assembling shards with spades: iteration 1
( 3332027) 2025-06-26 17:59:21.765160 ERROR: Exception: [Errno 2] No such file or directory: '/tmp/atram_onyy0kwy/JHAR05885_03_raw_node1.alluce_uce_11_Node1_1.fasta_01_7m1d5is7/spades/scaffolds.fasta'
( 3332027) 2025-06-26 17:59:21.765537 INFO : Writing 0 filtered contigs after iteration 1
( 3332027) 2025-06-26 17:59:21.765636 INFO : 0 total contigs after iteration 1

I could run SPAdes manually after running aTRAM with the -a none option (BLAST only), so the problem came from the SPAdes command that aTRAM builds. After digging into the code, I found the issue and fixed it by slightly modifying the "spades" and "post_assembly" functions in lib/assembler/spades.py:

    def spades(self):
        """Build the command for assembly."""
        cmd = ['spades.py',
               '--sc',                                                      ## added
               '--only-assembler',
               '--threads {}'.format(self.args['spades_threads']),
               '--memory {}'.format(self.args['spades_memory']),
               '--cov-cutoff {}'.format(self.args['spades_cov_cutoff']),
               '-o {}'.format(self.work_path())]

        if self.args['spades_careful']:
            cmd.append('--careful')

        if self.file['paired_count']:
            cmd.append("-1 '{}'".format(self.file['paired_1']))             ## modified
            cmd.append("-2 '{}'".format(self.file['paired_2']))             ## modified

        if self.file['single_1_count']:
            cmd.append("--s 1 '{}'".format(self.file['single_1']))          ## modified
        if self.file['single_2_count']:
            cmd.append("--s 2 '{}'".format(self.file['single_2']))          ## modified
        if self.file['single_any_count']:
            cmd.append("--s '{}'".format(self.file['single_any']))          ## modified

        return ' '.join(cmd)

    def post_assembly(self):
        """Copy the assembler output."""
        src = join(self.work_path(), 'scaffolds.fasta')                    ## modified
        shutil.move(src, self.file['output'])
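
For anyone who wants to inspect the resulting command outside aTRAM, here is a minimal standalone sketch of the same command construction. The function name `build_spades_command` and its plain-dict arguments are hypothetical and not part of aTRAM; the flags are copied from the patch above and are not checked against SPAdes here. It uses `shlex.quote` instead of hand-written single quotes, which also copes with paths that contain spaces or quotes.

```python
import shlex


def build_spades_command(args, files, work_path):
    """Build a SPAdes command string (hypothetical helper, not aTRAM code)."""
    # Flags mirror the patched spades() method above.
    cmd = ['spades.py', '--sc', '--only-assembler',
           '--threads', str(args['spades_threads']),
           '--memory', str(args['spades_memory']),
           '--cov-cutoff', str(args['spades_cov_cutoff']),
           '-o', work_path]

    if args.get('spades_careful'):
        cmd.append('--careful')

    if files.get('paired_count'):
        cmd += ['-1', files['paired_1'], '-2', files['paired_2']]

    if files.get('single_1_count'):
        cmd += ['--s', '1', files['single_1']]
    if files.get('single_2_count'):
        cmd += ['--s', '2', files['single_2']]
    if files.get('single_any_count'):
        cmd += ['--s', files['single_any']]

    # Quote each token so the joined string is safe to hand to a shell.
    return ' '.join(shlex.quote(part) for part in cmd)


example = build_spades_command(
    {'spades_threads': 24, 'spades_memory': 168,
     'spades_cov_cutoff': 'off', 'spades_careful': True},
    {'paired_count': 10,
     'paired_1': '/tmp/my reads_1.fasta',
     'paired_2': '/tmp/r2.fasta'},
    '/tmp/work')
print(example)
```

Printing the string makes it easy to paste into a terminal and compare against a manual SPAdes run.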

The SPAdes wrapper simply needed its deprecated options and file names updated, so it works normally now, except that I did not get exactly the same contigs as when I ran SPAdes manually.
UCE data can be relatively distant from the reference, so the contig filtering after assembly was too stringent and needed to be adjusted. I added a less stringent -word_size option to the blastn command in the "against_contigs" function in lib/blast.py:

def against_contigs(log, blast_db, query_file, hits_file, **kwargs):
    """Blast the query sequence against the contigs.

    The blast output will have the scores for later processing.
    """
    cmd = []

    if kwargs['protein']:
        cmd.append('tblastn')
        cmd.append('-db_gencode {}'.format(kwargs['blast_db_gencode']))
    else:
        cmd.append('blastn')

    cmd.append('-db {}'.format(blast_db))
    cmd.append('-query {}'.format(query_file))
    cmd.append('-out {}'.format(hits_file))
    cmd.append('-outfmt 15 -word_size 11')                                 ## modified

    command = ' '.join(cmd)
    log.subcommand(command, kwargs['temp_dir'], timeout=kwargs['timeout'])

A better way would probably be to pass the --word-size value given on the command line through to that function as well.
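
A rough sketch of that idea follows, written as a standalone function so it can be tried without aTRAM. The name `build_contig_blast_command` is hypothetical, and it assumes the parsed --word-size value reaches the function as `kwargs['word_size']`, which I have not verified in the aTRAM code. When no value is given, the flag is left out and BLAST keeps its default.

```python
def build_contig_blast_command(blast_db, query_file, hits_file, **kwargs):
    """Build the contig-filtering BLAST command (hypothetical helper)."""
    cmd = []

    if kwargs.get('protein'):
        cmd.append('tblastn')
        cmd.append('-db_gencode {}'.format(kwargs['blast_db_gencode']))
    else:
        cmd.append('blastn')

    cmd.append('-db {}'.format(blast_db))
    cmd.append('-query {}'.format(query_file))
    cmd.append('-out {}'.format(hits_file))
    cmd.append('-outfmt 15')

    # Assumption: the user's --word-size is available as kwargs['word_size'].
    word_size = kwargs.get('word_size')
    if word_size:
        cmd.append('-word_size {}'.format(word_size))

    return ' '.join(cmd)


command = build_contig_blast_command(
    'contigs_db', 'query.fasta', 'hits.json', protein=False, word_size=11)
print(command)
```

This way the same word size applies to both the read search and the contig filter, instead of hard-coding 11 in one place.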

I didn't feel like opening a pull request for these minor changes, but I felt it was important to share them.
