
Parallelize analysis sections #400

Closed
adamcantor22 opened this issue May 18, 2022 · 3 comments · May be fixed by #472
Labels: AnalysisTools, Server (Issue relates to the server)

@adamcantor22
Member

Is your feature request related to a problem? Please describe.
The current solution for multiple demux/denoise runs per analysis is to run them in series. This is quite inefficient for larger studies, and such analyses may need to be submitted to `-q long` in order to run successfully.

Describe the solution you'd like
These should be able to run in parallel, probably using a qiime1-esque solution in which the expected output files are polled for. Once all the output files exist, the main job can be started to merge all of the sub-components. This solution should be general enough that it could potentially be reused for other parallelization (e.g. a new ANCOM implementation, #386).
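
A rough sketch of that polling idea (the file names and the merge step are hypothetical, not part of our code):

```python
# Hypothetical sketch: worker jobs are submitted elsewhere; the main job just
# polls for their expected outputs and only merges once every file exists.
import time
from pathlib import Path

def wait_for_outputs(expected_files, poll_seconds=60, timeout_seconds=86400):
    """Block until every expected output file exists, or raise on timeout."""
    deadline = time.monotonic() + timeout_seconds
    missing = {Path(f) for f in expected_files}
    while missing:
        missing = {f for f in missing if not f.exists()}
        if not missing:
            break
        if time.monotonic() > deadline:
            raise TimeoutError(f"still missing: {sorted(str(f) for f in missing)}")
        time.sleep(poll_seconds)

# e.g. one denoised feature table per sequencing run, then a merge step:
# wait_for_outputs([f"run_{i}/table.qza" for i in range(n_runs)])
# merge_outputs(...)  # hypothetical merge of the per-run sub-components
```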

Describe alternatives you've considered
We briefly discussed multi-threading, but quickly dismissed it because it would add considerable complexity to our code.

@adamcantor22 adamcantor22 added the Server (Issue relates to the server) and AnalysisTools labels May 18, 2022
@adamcantor22 adamcantor22 added this to the 0.9.0 milestone May 18, 2022
@cleme
Member

cleme commented May 18, 2022

Q1 used to have a solution along these lines: a main job is submitted that spawns worker sub-jobs to do the computation, and the main job waits until all of the output files have been created. Details here:

https://github.com/biocore/qiime/tree/master/qiime/parallel

poller.py and util.py have most of the functionality that we would require. This solution is not ideal: when worker jobs do not complete, there is no way for the main job to "know" that the files will never be created, so it keeps waiting until it hits walltime. It might be worth reviewing how Q2 implements parallelization.
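
One way around the hang-until-walltime failure mode would be to have each worker drop a sentinel file on exit, success or failure, so the poller can abort early. A hypothetical sketch, not how Q1's poller.py actually works:

```python
# Hypothetical: each worker writes <name>.done on success or <name>.failed on
# failure as its last step, so the main job can stop waiting as soon as any
# worker reports failure instead of sitting there until walltime.
import time
from pathlib import Path

def wait_for_workers(worker_names, workdir=".", poll_seconds=60):
    pending = set(worker_names)
    while pending:
        for name in list(pending):
            if (Path(workdir) / f"{name}.failed").exists():
                raise RuntimeError(f"worker {name} failed; aborting merge step")
            if (Path(workdir) / f"{name}.done").exists():
                pending.discard(name)
        if pending:
            time.sleep(poll_seconds)
```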

@cleme cleme modified the milestones: 0.9.0, 1.0.0 May 18, 2022
@adamcantor22 adamcantor22 modified the milestones: 0.10.0, 0.12.0 Oct 12, 2022
@adamcantor22
Member Author

adamcantor22 commented Dec 1, 2023

While "full" parallelization is a challenging issue, there are a number of simple changes we could make to parallelize individual sections: differential abundance testing, taxa summarizing, and, most importantly, demux/denoising. When a study has many sequencing runs, this step is far more serialized than it needs to be: each run imports its fastqs into a qiime artifact, demuxes, and denoises sequentially before the next run starts. These individual steps can safely be run in parallel across all runs, i.e. all fastq imports run in parallel, then all demuxes, then all denoises. This would speed these analyses up significantly. It may be challenging to do this for runs of different types (e.g. single vs dual barcodes), but at minimum it can be applied to runs of the same type.
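
A minimal sketch of that staged fan-out, assuming each stage can be wrapped as a per-run Python function (the function names and the `runs` structure are placeholders, not the server's actual code):

```python
from concurrent.futures import ProcessPoolExecutor

def import_run(run): ...   # placeholder: import this run's fastqs into a qiime artifact
def demux_run(run): ...    # placeholder: demux this run
def denoise_run(run): ...  # placeholder: denoise this run

def run_stage(stage_fn, runs, max_workers=4):
    # Fan one stage out across every run; consuming the results re-raises any
    # worker exception, so a later stage never starts if this one had a failure.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(stage_fn, runs))

def process_all_runs(runs):
    # All imports finish before any demux starts, and all demuxes before any denoise.
    for stage in (import_run, demux_run, denoise_run):
        run_stage(stage, runs)
```

For mixed run types, the runs could be grouped by type first and the same helper applied per group.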

@adamcantor22
Member Author

Superseded by snakemake, which has this functionality: #457
