Hello,
I wanted to use aTRAM to iteratively assemble ultraconserved elements (UCEs) but encountered an issue when using spades with this command:
atram.py -i 3 -Q refs/node1.alluce.fasta -b db/JHAR05885_03_raw -o results/ -a spades --evalue 1e-3 --word-size 11 --spades-careful --cpus 24 --spades-threads 24 --spades-memory 168
( 3332027) 2025-06-26 17:59:16.043894 INFO : ################################################################################
( 3332027) 2025-06-26 17:59:16.044007 INFO : aTRAM version: v2.4.4
( 3332027) 2025-06-26 17:59:16.044072 INFO : Python version: 3.13.5 | packaged by Anaconda, Inc. | (main, Jun 12 2025, 16:09:02) [GCC 11.2.0]
( 3332027) 2025-06-26 17:59:16.044112 INFO : /home/student/miniconda3/envs/aTRAM/bin/atram.py -i 3 -Q refs/node1.alluce.fasta -b db/JHAR05885_03_raw -o results/ -a spades --evalue 1e-3 --word-size 11 --spades-careful --cpus 24 --spades-threads 24 --spades-memory 168
( 3332027) 2025-06-26 17:59:16.044186 INFO : aTRAM blast DB = "db/JHAR05885_03_raw", query = "node1.alluce_uce_11_Node1_1.fasta", iteration 1
( 3332027) 2025-06-26 17:59:16.044451 INFO : Blasting query against shards: iteration 1
( 3332027) 2025-06-26 17:59:20.479449 INFO : All 77 blast results completed
( 3332027) 2025-06-26 17:59:20.480523 INFO : 10 blast hits in iteration 1
( 3332027) 2025-06-26 17:59:20.480615 INFO : Writing assembler input files: iteration 1
( 3332027) 2025-06-26 17:59:20.485525 INFO : Assembling shards with spades: iteration 1
( 3332027) 2025-06-26 17:59:21.765160 ERROR: Exception: [Errno 2] No such file or directory: '/tmp/atram_onyy0kwy/JHAR05885_03_raw_node1.alluce_uce_11_Node1_1.fasta_01_7m1d5is7/spades/scaffolds.fasta'
( 3332027) 2025-06-26 17:59:21.765537 INFO : Writing 0 filtered contigs after iteration 1
( 3332027) 2025-06-26 17:59:21.765636 INFO : 0 total contigs after iteration 1
I could launch spades manually after running aTRAM with the -a none option (blast only), so the problem came from the spades command that aTRAM builds. After digging into the code, I found the issue and fixed it by slightly modifying the "spades" and "post_assembly" functions in lib/assembler/spades.py:
def spades(self):
    """Build the command for assembly."""
    cmd = ['spades.py',
           '--sc',  ## added
           '--only-assembler',
           '--threads {}'.format(self.args['spades_threads']),
           '--memory {}'.format(self.args['spades_memory']),
           '--cov-cutoff {}'.format(self.args['spades_cov_cutoff']),
           '-o {}'.format(self.work_path())]
    if self.args['spades_careful']:
        cmd.append('--careful')
    if self.file['paired_count']:
        cmd.append("-1 '{}'".format(self.file['paired_1']))  ## modified
        cmd.append("-2 '{}'".format(self.file['paired_2']))  ## modified
    if self.file['single_1_count']:
        cmd.append("--s 1 '{}'".format(self.file['single_1']))  ## modified
    if self.file['single_2_count']:
        cmd.append("--s 2 '{}'".format(self.file['single_2']))  ## modified
    if self.file['single_any_count']:
        cmd.append("--s '{}'".format(self.file['single_any']))  ## modified
    return ' '.join(cmd)

def post_assembly(self):
    """Copy the assembler output."""
    src = join(self.work_path(), 'scaffolds.fasta')  ## modified
    shutil.move(src, self.file['output'])
It simply needed an update to deprecated options and file names, so it now runs normally, except that I did not get exactly the same contigs as when I ran spades manually.
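For what it's worth, the "No such file or directory" error above could be made easier to diagnose if post_assembly checked for the output before moving it. Below is a minimal standalone sketch of that idea, not aTRAM's actual code: the helper name collect_spades_output and the fallback from scaffolds.fasta to contigs.fasta are my own assumptions.

```python
import shutil
from pathlib import Path


def collect_spades_output(work_dir, output_file,
                          candidates=("scaffolds.fasta", "contigs.fasta")):
    """Move the first SPAdes result file found in work_dir to output_file.

    Hypothetical helper: returns the name of the file that was moved,
    or None if none of the candidate files exist (for example when the
    assembly produced nothing or the output path changed).
    """
    for name in candidates:
        src = Path(work_dir) / name
        if src.is_file():
            shutil.move(str(src), str(output_file))
            return name
    return None
```

A caller could log a clear message when None comes back instead of letting shutil.move raise on a missing path.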
UCE data can be relatively distant from the reference, so the contig filtering after assembly was too stringent and needed to be adjusted. I added a less stringent -word_size option to the blastn command in the "against_contigs" function in lib/blast.py:
def against_contigs(log, blast_db, query_file, hits_file, **kwargs):
    """Blast the query sequence against the contigs.

    The blast output will have the scores for later processing.
    """
    cmd = []
    if kwargs['protein']:
        cmd.append('tblastn')
        cmd.append('-db_gencode {}'.format(kwargs['blast_db_gencode']))
    else:
        cmd.append('blastn')
    cmd.append('-db {}'.format(blast_db))
    cmd.append('-query {}'.format(query_file))
    cmd.append('-out {}'.format(hits_file))
    cmd.append('-outfmt 15 -word_size 11')  ## modified
    command = ' '.join(cmd)
    log.subcommand(command, kwargs['temp_dir'], timeout=kwargs['timeout'])
A better way would probably be to pass the --word-size value given on the command line to that function as well.
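Something along these lines is what I have in mind. It is only a sketch of the command-building part, written as a standalone function so it can be tried without aTRAM. The function name contig_blast_command is made up, and I am assuming the --word-size argument reaches the function as kwargs['word_size'], which I have not verified in the code.

```python
def contig_blast_command(blast_db, query_file, hits_file, **kwargs):
    """Build the blast command string used to score contigs.

    Hypothetical variant of the command-building part of against_contigs:
    -word_size is only added when a value was supplied, so blast keeps
    its own default otherwise.
    """
    cmd = []
    if kwargs.get('protein'):
        cmd.append('tblastn')
        cmd.append('-db_gencode {}'.format(kwargs['blast_db_gencode']))
    else:
        cmd.append('blastn')
    cmd.append('-db {}'.format(blast_db))
    cmd.append('-query {}'.format(query_file))
    cmd.append('-out {}'.format(hits_file))
    cmd.append('-outfmt 15')
    # Assumption: the command-line value is available under 'word_size'.
    word_size = kwargs.get('word_size')
    if word_size:
        cmd.append('-word_size {}'.format(word_size))
    return ' '.join(cmd)


print(contig_blast_command('contigs_db', 'query.fasta', 'hits.json',
                           protein=False, word_size=11))
```

Leaving the option out when no value is given avoids forcing 11 on protein searches, where tblastn accepts a different range of word sizes than blastn.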
I didn't feel these minor changes warranted a pull request, but I felt it was important to share them.