QUAST storing sam, bam, sorted.bam #123

Open
ghost opened this issue Jan 21, 2020 · 2 comments

ghost commented Jan 21, 2020

Hello, I was wondering why QUAST stores the sam, bam and sorted.bam files all at the same time. It takes up a huge amount of disk space. I tried the --space-efficient option, but it still writes a sam, then a bam, and then a sorted.bam to disk. So the alignment is basically written to disk three times.

Here is my command:

./quast-5.0.2/quast.py --eukaryote --large --circos --pe1 $R1 --pe2 $R2 --pacbio ../allPB.fa --nanopore ../allONTvaga.fa --threads 24 -o quast_report shasta_final.fa --space-efficient

Thank you.

EDIT: is it because --space-efficient is in the wrong position among the arguments? If so, sorry ><

ghost commented Jan 21, 2020

Actually, it's not a matter of where the argument is placed.
I also noticed that QUAST seems to use only half of the specified number of threads.

mdondrup commented Feb 15, 2024

I vote in support of this issue. The temporary storage required when analyzing raw reads appears excessive because of redundancy, and it may be the cause of most of the "No space left on device" errors. One example I ran into: I have a 12 Mbase genome and an assembly of the same size that I would like to evaluate. The inputs are:

  • 3GB of Nanopore reads (fastq.gz)
  • 16GB of Illumina reads (fastq.gz)
  • The whole analysis directory is < 100GB, including processed data and multiple assemblies.

The QUAST temporary folder grew to 500GB, at which point the disk was full. It contained:

  • copies of the input reads (fastq unzipped)
  • .sam+bam files of all the alignments + the sorted .sam files

I think this problem could be addressed with relative ease by deleting intermediate files (e.g. deleting sam files once bam files have been created) or by piping the aligner output directly into samtools through a Unix pipe. From what I understand from the documentation, --space-efficient refers to RAM requirements, not disk.
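
To make the pipe suggestion concrete, here is a minimal sketch of what streaming an aligner into samtools could look like, so that no intermediate .sam or unsorted .bam is ever written. It is not QUAST's actual code. It assumes minimap2 as the aligner and that both minimap2 and samtools are on PATH; the function names `build_pipeline` and `run_pipeline` and all file names are hypothetical.

```python
import shlex
import subprocess


def build_pipeline(ref, reads, out_bam, threads=4):
    """Return (align_cmd, sort_cmd) for an alignment that never writes a SAM file.

    Assumption: minimap2 is the aligner. QUAST may use a different tool or
    different options internally.
    """
    # -a makes minimap2 emit SAM on stdout; -t sets the thread count.
    align_cmd = ["minimap2", "-a", "-t", str(threads), ref, reads]
    # samtools sort reads SAM from stdin ("-") and writes a sorted BAM to -o.
    sort_cmd = ["samtools", "sort", "-@", str(threads), "-o", out_bam, "-"]
    return align_cmd, sort_cmd


def describe_pipeline(align_cmd, sort_cmd):
    """Render the two commands as the equivalent shell pipeline string."""
    return shlex.join(align_cmd) + " | " + shlex.join(sort_cmd)


def run_pipeline(ref, reads, out_bam, threads=4):
    """Run aligner | samtools sort; only the sorted BAM lands on disk."""
    align_cmd, sort_cmd = build_pipeline(ref, reads, out_bam, threads)
    aligner = subprocess.Popen(align_cmd, stdout=subprocess.PIPE)
    sorter = subprocess.Popen(sort_cmd, stdin=aligner.stdout)
    # Close our copy of the pipe so the aligner sees SIGPIPE if the sorter dies.
    aligner.stdout.close()
    sort_rc = sorter.wait()
    align_rc = aligner.wait()
    if align_rc != 0 or sort_rc != 0:
        raise RuntimeError(
            f"pipeline failed (aligner={align_rc}, samtools={sort_rc})"
        )
    return out_bam
```

Calling `run_pipeline("assembly.fa", "reads.fastq.gz", "reads.sorted.bam", threads=24)` would leave only the sorted BAM on disk. Note that samtools sort can still write temporary chunks for large inputs, so this lowers peak disk usage but does not remove it entirely.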

alexeigurevich added this to the 5.4 milestone on Nov 14, 2024