Parallelize analysis sections #400
Comments
QIIME 1 used to have a solution for this: a submitted job spawns worker sub-jobs that do the computation, while the main job waits until all output files have been created. Details here: https://github.com/biocore/qiime/tree/master/qiime/parallel
While "full" parallelization is a challenging issue, there are a number of simple changes we could make to parallelize individual sections. These include differential abundance testing, taxa summarizing, and, most importantly, demux/denoising. When a study has many sequencing runs, this step is much more serialized than it needs to be: each run imports its FASTQ files into a QIIME artifact, demuxes, and denoises sequentially, and only then does the next run start. These individual steps can safely be run in parallel across all runs, i.e., all FASTQ imports run in parallel, then all demuxes, then all denoises. This would significantly speed up such analyses. It may be challenging to do this for runs of different types (e.g. single vs. dual barcodes), but it can at least be applied to runs of the same type.
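The stage-by-stage scheme described above could be sketched roughly as follows. This is only an illustration, not the project's code: `import_fastq`, `demux`, and `denoise` are hypothetical placeholders for whatever actually performs each step, and the thread pool merely stands in for a dispatcher (in practice each call would more likely submit a cluster job or start a subprocess).

```python
from concurrent.futures import ThreadPoolExecutor


def run_stage(stage, items, max_workers=4):
    """Apply `stage` to every item in parallel and wait for all of them."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() blocks until every task is done and re-raises worker errors.
        # pool.map returns results in the same order as `items`.
        return list(pool.map(stage, items))


# Hypothetical placeholders: each returns a label for the artifact it "produced".
def import_fastq(run):
    return f"{run}.imported"


def demux(artifact):
    return f"{artifact}.demuxed"


def denoise(artifact):
    return f"{artifact}.denoised"


def process_runs(runs, max_workers=4):
    """Run each stage across all runs before moving on to the next stage."""
    imported = run_stage(import_fastq, runs, max_workers)
    demuxed = run_stage(demux, imported, max_workers)
    return run_stage(denoise, demuxed, max_workers)
```

Runs of different types (e.g. single vs. dual barcodes) could be handled by grouping them first and calling `process_runs` once per group.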
Superseded by Snakemake, which has this functionality: #457
Is your feature request related to a problem? Please describe.
The current solution for multiple demux/denoise runs per analysis is to run them in series. This is quite inefficient for larger studies, and the job may need to be submitted with `-q long` in order to run successfully.
Describe the solution you'd like
These runs should be able to execute in parallel, probably using a QIIME 1-esque solution in which the sub-jobs' output files are checked for. Once all the output files exist, the main job, which merges all the sub-components, can be started. This solution should be general enough that it could also be used for other parallelization (e.g. with a new ANCOM implementation, #386).
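A minimal sketch of the "wait until all output files exist" step might look like the following. The function name `wait_for_outputs`, its parameters, and the polling interval are assumptions made for illustration; they are not taken from QIIME 1 or from this project.

```python
import time
from pathlib import Path


def wait_for_outputs(paths, poll_seconds=30.0, timeout_seconds=None):
    """Block until every path in `paths` exists, polling the filesystem.

    Raises TimeoutError if `timeout_seconds` elapses first.
    """
    pending = {Path(p) for p in paths}
    start = time.monotonic()
    while True:
        # Keep only the outputs that have not appeared yet.
        pending = {p for p in pending if not p.exists()}
        if not pending:
            return
        elapsed = time.monotonic() - start
        if timeout_seconds is not None and elapsed > timeout_seconds:
            missing = ", ".join(sorted(str(p) for p in pending))
            raise TimeoutError(f"still waiting for: {missing}")
        time.sleep(poll_seconds)
```

The main job would call this with the expected output paths of all sub-jobs and start the merge once it returns. Note that a file existing does not guarantee it is fully written; one common way around this is for each sub-job to write to a temporary name and rename the file when it is done.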
Describe alternatives you've considered
We briefly discussed multi-threading but quickly dismissed it, as it would add considerable complexity to our code.