Disk usage with Metaquast #106

Open
cmorganl opened this issue Jul 10, 2019 · 0 comments

@cmorganl

Hi, I'm trying to run MetaQUAST on a mock community metagenome of 8 organisms. The assembly and the combined reference genomes are ~70 Mbp each, and I have 66 GB of uncompressed sequence reads in my forward and reverse FASTQ files. Most of the pipeline runs fine, but disk usage gets out of control once it begins running QUAST per reference: with all of the SAM and BAM files, each reference being processed in parallel takes ~250 GB, and with 4 references running at once that is ~1 TB of disk space in use.

Would it make more sense to instead run QUAST sequentially for each reference, giving each single QUAST command all of the threads specified in the MetaQUAST command? I think this model would scale much better than the current parallel one when metagenomic samples contain tens or hundreds of organisms, even if it is slightly less efficient.
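
For example, something roughly like this, where the reference paths and output directories are just placeholders and I'm assuming quast.py's usual `-r`/`-t`/`-o` options:

```python
import subprocess

# Hypothetical per-reference FASTA files produced earlier in the MetaQUAST run;
# the paths here are illustrative only.
references = ["references/ref1.fasta", "references/ref2.fasta"]
assembly = "assembly.fasta"
threads = 16  # the full thread budget passed to MetaQUAST

# Run QUAST once per reference, one after another, giving each run all of the
# threads instead of splitting references across parallel QUAST processes.
for ref in references:
    out_dir = "runs_per_reference/" + ref.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    subprocess.run(
        ["quast.py", "-r", ref, "-t", str(threads), "-o", out_dir, assembly],
        check=True,
    )
```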

Alternatively, a flag (--cleanup?) could be added to remove intermediate files (e.g. .sam, all.correct.sam, .bam) as it runs. This would of course limit the ability to resume a failed run, but at least there wouldn't be TBs of SAM files sitting around.
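
A rough sketch of the cleanup idea, with the file patterns and directory layout assumed from the intermediates named above rather than taken from MetaQUAST's actual output structure:

```python
import glob
import os

def cleanup_alignment_intermediates(ref_output_dir):
    """Delete large alignment intermediates once a per-reference run finishes.

    The *.sam / *.bam patterns and recursive layout are assumptions based on
    the files mentioned above, not MetaQUAST's documented output structure.
    """
    for pattern in ("*.sam", "*.bam"):
        for path in glob.glob(os.path.join(ref_output_dir, "**", pattern),
                              recursive=True):
            os.remove(path)
```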

Thanks!
