metaquast does not scale for large number of references #136

Open
nick-youngblut opened this issue Apr 13, 2020 · 0 comments


This is mainly to warn users:

Running metaQUAST (QUAST 5.0.2) with >500-1000 bacterial reference genomes and >1 million Illumina reads takes many days to finish, even when using many threads (e.g., >7 days to finish for 2000 reference genomes with 2 million metagenome reads, using 12 threads).

This lack of scaling seems to be mainly due to QUAST being run on each reference genome separately. I tried to split metaquast into 2 separate steps: 1) everything up to the per-ref QUAST runs, and 2) the per-ref QUAST runs plus all the final sections. This would allow the per-ref QUAST runs to be submitted as separate cluster jobs (e.g., 1000s of QUAST jobs in parallel). However, metaquast.py is so intertwined that splitting the code into modular sections would likely require a full rewrite. One major issue is that qconfig cannot be pickled. Instead of explicit argument passing to functions, qconfig is used all over the place, which causes all kinds of headaches.
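To illustrate the pickling problem, here is a minimal sketch. It assumes qconfig behaves like a Python module used as shared global state; `fake_config`, `RunConfig`, `snapshot`, and `run_one_reference` are hypothetical names for illustration only and are not part of QUAST. It shows that a module object cannot be pickled, while a small explicit config object copied out of it can be serialized and handed to a separate job.

```python
import pickle
import types
from dataclasses import dataclass, asdict

# Hypothetical stand-in for a module used as global state (NOT QUAST's real qconfig).
fake_config = types.ModuleType("fake_config")
fake_config.threads = 12
fake_config.min_contig = 500


def module_is_picklable(mod):
    """Return True if the module object itself can be pickled."""
    try:
        pickle.dumps(mod)
        return True
    except (TypeError, pickle.PicklingError):
        return False


@dataclass(frozen=True)
class RunConfig:
    """Explicit snapshot of the settings a per-reference job would need."""
    threads: int
    min_contig: int


def snapshot(mod):
    """Copy the needed settings out of the module into a plain object."""
    return RunConfig(threads=mod.threads, min_contig=mod.min_contig)


def run_one_reference(ref_name, cfg):
    """Placeholder for a per-reference job; config arrives as an argument."""
    return f"{ref_name}: threads={cfg.threads}, min_contig={cfg.min_contig}"


cfg = snapshot(fake_config)

# Serialize the settings as a plain dict, which pickles without any class lookup.
payload = pickle.dumps(asdict(cfg))
restored = RunConfig(**pickle.loads(payload))

print(module_is_picklable(fake_config))
print(run_one_reference("ref_A", restored))
```

With the configuration passed explicitly like this, each per-ref run would only depend on its arguments, so it could in principle be launched as its own cluster job. Whether this is practical for metaquast.py depends on how many places read qconfig directly.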
