Issue 4 #5 — merged (9 commits, Oct 1, 2024)
15 changes: 14 additions & 1 deletion book/_config.yml
@@ -53,4 +53,17 @@ html:

parse:
myst_substitutions:
miniconda_url: "[Miniconda](https://conda.io/miniconda.html)"
release_epoch: "2024.5"
tutorial_environment_block: |
````{admonition} Reminder
:class: tip

These examples assume that you have a QIIME 2 deployment that includes the [q2-dwq2](https://github.com/caporaso-lab/q2-dwq2) educational plugin.
Follow the instructions in [](tutorial-setup) if you'd like to follow along with this tutorial.
If you've already followed those instructions, before following this tutorial be sure to activate your conda environment as follows:

```shell
conda activate using-qiime2
```
````
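Pages elsewhere in the book can then interpolate these values with MyST's substitution syntax. For example (an illustrative usage, not a page from this PR):

```markdown
First install {{ miniconda_url }}.

{{ tutorial_environment_block }}
```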
4 changes: 4 additions & 0 deletions book/_toc.yml
@@ -4,18 +4,22 @@ parts:
- caption: Tutorials
chapters:
- file: tutorials/intro
- file: tutorials/parallel-pipeline
- caption: How-tos
chapters:
- file: how-to-guides/merge-metadata
- file: how-to-guides/validate-metadata
- file: how-to-guides/artifacts-as-metadata
- file: how-to-guides/view-visualizations
- file: how-to-guides/pipeline-resumption
- caption: Explanations
chapters:
- file: explanations/metadata
- file: explanations/types-of-parallelization
- caption: References
chapters:
- file: references/metadata
- file: references/parallel-configuration
- caption: Back matter
chapters:
- file: back-matter/glossary
Expand Down
11 changes: 11 additions & 0 deletions book/back-matter/glossary.md
@@ -12,6 +12,12 @@ artifact
When written to file, artifacts typically have the extension {term}`qza`.
Artifacts can be provided as input to QIIME 2 {term}`actions <action>` or exported from QIIME 2 for use with other software.

breaking change
A *breaking change* is a change to how a program works (for example, a QIIME 2 plugin or interface) that introduces an incompatibility with earlier versions of the program.
This generally requires users to modify how they use some aspect of the system.
For example, if a plugin method added a new required input in version 2, that would be a breaking change with respect to version 1: calling the method without that new input would fail in version 2, but would have succeeded with version 1.
This may also be called a *backward-incompatible change* or an API change.

DRY
An acronym of *Don't Repeat Yourself*: a critical principle of software engineering that is equally applicable in research data management.
For more information on DRY and software engineering in general, see {cite:t}`pragprog20`.
@@ -33,6 +39,11 @@ plugin
As of this writing, a collection of plugins that are installed together is referred to as a distribution.
Additional plugins can be installed, and the primary resource enabling discovery of additional plugins is the [QIIME 2 Library](https://library.qiime2.org).

Python 3 API
QIIME 2's Application Programming Interface for Python 3.
This allows advanced users to access all QIIME 2 analytic functionality directly in Python.
This can be very convenient for developing tools that use QIIME 2 as a component, or for performing data analysis without writing intermediate data artifacts to disk unless you specifically want to.

q2cli
[q2cli](https://github.com/qiime2/q2cli) is the original (and still primary, as of March 2024) command line interface for QIIME 2.

2 changes: 1 addition & 1 deletion book/explanations/metadata.md
@@ -1,5 +1,5 @@
(metadata-explanation)=
# Metadata in QIIME 2
# Sample and feature metadata

Metadata provides the key to gaining biological insight from your data.
In QIIME 2, **sample metadata** may include technical details, such as the DNA barcodes that were used for each sample in a multiplexed sequencing run, or descriptions of the samples, such as which subject, time point, and body site each sample came from in a human microbiome time series.
14 changes: 14 additions & 0 deletions book/explanations/types-of-parallelization.md
@@ -0,0 +1,14 @@
(types-of-parallel-support)=
# Types of parallel computing support

## Parallel Pipeline execution

QIIME 2's formal parallel computing support uses [Parsl](https://parsl.readthedocs.io/en/stable/1-parsl-introduction.html), and enables parallel execution of QIIME 2 {term}`Pipeline` actions.
All QIIME 2 `Pipelines` expose parallel computing options, notably the `--parallel` parameter in {term}`q2cli`, though whether these actually result in parallel computation is up to the implementation of each `Pipeline`.
Actions using this formal parallel computing support can make use of high-performance computing hardware that doesn't necessarily have shared memory.

## Informal parallel support

Some {term}`Method` actions (e.g., `qiime dada2 denoise-*`) wrap multi-threaded applications and may define a parameter (like `--p-n`) that gives the user control over the thread count.
The QIIME 2 parameter type associated with these parameters should always be `NTHREADS` or `NJOBS` (if you observe a parameter where this isn't the case, it was probably an error on the developer's part; reach out on the forum to let us know).
Actions using this informal parallel computing support are generally restricted to running on systems with shared memory.
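To make the shared-memory model concrete, here is a minimal, purely illustrative Python sketch (not QIIME 2 or dada2 code) of multi-threaded work, where a thread-count argument plays the role of an `NTHREADS`-style parameter:

```python
from concurrent.futures import ThreadPoolExecutor

def process(chunk):
    # Stand-in for a CPU- or I/O-bound unit of work.
    return sum(chunk)

chunks = [[1, 2], [3, 4], [5, 6]]

# All threads share the process's memory, which is why this style of
# parallelism is restricted to shared-memory systems.
with ThreadPoolExecutor(max_workers=2) as executor:
    results = list(executor.map(process, chunks))

print(results)  # [3, 7, 11]
```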
2 changes: 1 addition & 1 deletion book/how-to-guides/artifacts-as-metadata.md
@@ -1,5 +1,5 @@
(view-artifacts-as-metadata)=
# How to use QIIME 2 Artifacts as Metadata
# How to use Artifacts as Metadata

In addition to TSV metadata files, QIIME 2 also supports viewing some kinds of artifacts as metadata.
An example of this is artifacts of type `SampleData[AlphaDiversity]`.
35 changes: 35 additions & 0 deletions book/how-to-guides/pipeline-resumption.md
@@ -0,0 +1,35 @@
(pipeline-resumption)=
# How to resume failed Pipeline runs

If a {term}`Pipeline` fails at some point during its execution, and you rerun it, QIIME 2 can attempt to reuse the results that were calculated by the `Pipeline` before it failed.

## Pipeline resumption through the command line interface (CLI)

By default, when you run a {term}`Pipeline` on the CLI, QIIME 2 will create a pool in its cache (either the default cache, or the cache specified using the `--use-cache` parameter).
This pool will be named based on the scheme: `recycle_<plugin>_<action>_<sha1('plugin_action')>`.
This pool will store all intermediate {term}`Results <result>` created by the {term}`Pipeline`.
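That naming scheme can be sketched as follows (only the `recycle_<plugin>_<action>_<sha1(...)>` shape is taken from the scheme above; the exact string QIIME 2 hashes is an assumption here):

```python
import hashlib

def recycle_pool_name(plugin: str, action: str) -> str:
    # Assumed input to sha1: "<plugin>_<action>"; the real implementation
    # may hash a slightly different string.
    digest = hashlib.sha1(f"{plugin}_{action}".encode()).hexdigest()
    return f"recycle_{plugin}_{action}_{digest}"

print(recycle_pool_name("diversity", "core_metrics"))
```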

Should the `Pipeline` run succeed, this pool will be removed.
However, should the `Pipeline` run fail, you can rerun the `Pipeline` using the same command you ran the first time, and the intermediate {term}`Results <result>` stored in the pool will be reused to avoid redoing steps in the Pipeline that had already completed.

If you wish to specify the pool that QIIME 2 should use, either on a `Pipeline`'s first run or on a resumption, you can do so with the `--recycle-pool` option, followed by the name of the pool you wish to use.
This pool will be created in the cache if it does not already exist.
The `--no-recycle` flag may be passed if you do not want QIIME 2 to attempt to recycle any past {term}`Results <result>` or to save its {term}`Results <result>` from this run for future reuse.

It may not be possible to reuse prior {term}`Results <result>` if the inputs you provide to the `Pipeline` on resumption differ from those provided on the initial run.
In this situation, QIIME 2 will still try to reuse any {term}`Results <result>` that are not dependent on the inputs that changed, but there is no guarantee any will be usable.

## Pipeline resumption through the Python 3 API

When using the Python API, pools are specified using context managers (i.e., using Python's `with` statement).
If you don't want to enable resumption, don't use the context manager.

```python
from qiime2.core.cache import Cache

cache = Cache('cache_path')
pool = cache.create_pool('pool', reuse=True)

with pool:
    # run your pipeline here, for example (hypothetical call):
    # results = my_pipeline(my_artifact)
    ...
```
2 changes: 1 addition & 1 deletion book/how-to-guides/view-visualizations.md
@@ -1,5 +1,5 @@
(view-visualizations)=
# How to view QIIME 2 Visualizations
# How to view Visualizations

## QIIME 2 View

183 changes: 183 additions & 0 deletions book/references/parallel-configuration.md
@@ -0,0 +1,183 @@
(parallel-configuration)=
# Parallel Pipeline configuration

QIIME 2 provides formal support for parallel computing of {term}`Pipelines <pipeline>` through [Parsl](https://parsl.readthedocs.io/en/stable/1-parsl-introduction.html).

## Parsl configuration

A [Parsl configuration](https://parsl.readthedocs.io/en/stable/userguide/configuring.html) tells Parsl what resources are available and how to use them; one is required in order to use Parsl.
The [Parsl documentation](https://parsl.readthedocs.io/en/stable/) provides full detail on [Parsl configuration](https://parsl.readthedocs.io/en/stable/userguide/configuring.html#).

In the context of QIIME 2, Parsl configuration information is maintained in a QIIME 2 configuration file.
QIIME 2 configuration files are stored on disk in [TOML](https://toml.io/en/) files.

### Default Parsl configuration

For basic multi-processor usage, QIIME 2 writes a default configuration file the first time it's needed (e.g., if you instruct QIIME 2 to execute in parallel without a particular configuration).

The default `qiime2_config.toml` file, as of QIIME 2 2024.10, looks like the following:

(default-parsl-configuration-file)=
```toml
[parsl]
strategy = "None"

[[parsl.executors]]
class = "ThreadPoolExecutor"
label = "tpool"
max_threads = ...

[[parsl.executors]]
class = "HighThroughputExecutor"
label = "default"
max_workers = ...

[parsl.executors.provider]
class = "LocalProvider"
```

When this file is written to disk, the `max_threads` and `max_workers` values (represented above by `...`) are computed by QIIME 2 as one less than the CPU count on the computer where it is running (`max(psutil.cpu_count() - 1, 1)`).
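That computation can be sketched as follows (the source uses `psutil.cpu_count()`; `os.cpu_count()` is substituted here to keep the sketch dependency-free, and the two can differ in restricted environments):

```python
import os

def default_worker_count() -> int:
    # One less than the CPU count, but never less than 1.
    cpus = os.cpu_count() or 1
    return max(cpus - 1, 1)

print(default_worker_count())
```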

This configuration defines two `Executors`.

1. A [`ThreadPoolExecutor`](https://parsl.readthedocs.io/en/stable/stubs/parsl.executors.ThreadPoolExecutor.html?highlight=Threadpoolexecutor), which parallelizes jobs across multiple threads within a single process.
2. A [`HighThroughputExecutor`](https://parsl.readthedocs.io/en/stable/stubs/parsl.executors.HighThroughputExecutor.html?highlight=HighThroughputExecutor), which parallelizes jobs across multiple processes.

In this case, the `HighThroughputExecutor` is designated as the default by virtue of its `label` value being `default`.
Your Parsl configuration **must** define an executor with the label `default`; this is the executor that QIIME 2 will dispatch your jobs to if you do not specify an alternative.
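A minimal sketch of what that label lookup might look like over the parsed configuration (illustrative only, not QIIME 2's actual implementation):

```python
def find_executor(executors, label="default"):
    """Return the first executor table whose label matches."""
    for executor in executors:
        if executor.get("label") == label:
            return executor
    raise ValueError(f"no executor labeled {label!r}")

# Toy data mirroring the two executors defined above.
executors = [
    {"class": "ThreadPoolExecutor", "label": "tpool"},
    {"class": "HighThroughputExecutor", "label": "default"},
]
print(find_executor(executors)["class"])
```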

````{admonition} The parsl.Config object
:class: tip

This parsl configuration is ultimately read into a `parsl.Config` object internally in QIIME 2.
The `parsl.Config` object that corresponds to the above example would look like the following:

```python
import parsl
from parsl.executors import ThreadPoolExecutor, HighThroughputExecutor
from parsl.providers import LocalProvider

config = parsl.Config(
executors=[
ThreadPoolExecutor(
label='tpool',
max_threads=... # will be an integer value
),
HighThroughputExecutor(
label='default',
max_workers=..., # will be an integer value
provider=LocalProvider()
)
],
strategy=None
)
```
````

### Parsl configuration, line-by-line

The first line of [the default configuration file presented above](default-parsl-configuration-file) indicates that this is the parsl section (or [table](https://toml.io/en/v1.0.0#table), to use TOML's terminology) of our configuration file.

```toml
[parsl]
```

The next line:

```toml
strategy = "None"
```

is a top-level Parsl configuration parameter that you can [read more about in the Parsl documentation](https://parsl.readthedocs.io/en/stable/userguide/configuring.html#multi-threaded-applications).
This may need to be set differently depending on your system.

Next, the first executor is added.

```toml
[[parsl.executors]]
class = "ThreadPoolExecutor"
label = "tpool"
max_threads = 7
```

The double square brackets (`[[ ... ]]`) indicate that [this is an array of tables](https://toml.io/en/v1.0.0#array-of-tables), `executors`, nested under the `parsl` table.
`class` indicates the specific Parsl class that is being configured ([`parsl.executors.ThreadPoolExecutor`](https://parsl.readthedocs.io/en/stable/stubs/parsl.executors.ThreadPoolExecutor.html#parsl.executors.ThreadPoolExecutor) in this case); `label` provides a name that you can use to refer to this executor elsewhere; and `max_threads` is a configuration value for the `ThreadPoolExecutor` class, corresponding to a parameter of the class's constructor.
In this example a value of 7 is specified for `max_threads`, but as noted above this will be computed specifically for your machine when this file is created.

Parsl's `ThreadPoolExecutor` runs on a single node, so we provide a second executor which can utilize up to 2000 nodes.

```toml
[[parsl.executors]]
class = "HighThroughputExecutor"
label = "default"
max_workers = 7

[parsl.executors.provider]
class = "LocalProvider"
```

The definition of this executor, [`parsl.executors.HighThroughputExecutor`](https://parsl.readthedocs.io/en/stable/stubs/parsl.executors.HighThroughputExecutor.html#parsl.executors.HighThroughputExecutor), looks similar to the definition of the `ThreadPoolExecutor`, but it additionally defines a `provider`.
The provider class provides access to computational resources.
In this case, we use [`parsl.providers.LocalProvider`](https://parsl.readthedocs.io/en/stable/stubs/parsl.providers.LocalProvider.html), which provides access to local resources (i.e., the machine you are running on, such as a laptop or workstation).
[Other providers are available as well](https://parsl.readthedocs.io/en/stable/reference.html#providers), including for Slurm, Amazon Web Services, Kubernetes, and more.
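For example, on a Slurm cluster the provider table might look like the following hypothetical fragment. The class name `SlurmProvider` is real, but check the Parsl documentation for its current constructor parameters before relying on any shown here:

```toml
[[parsl.executors]]
class = "HighThroughputExecutor"
label = "default"
max_workers = 7

[parsl.executors.provider]
class = "SlurmProvider"
partition = "short"    # queue/partition name on your cluster
walltime = "01:00:00"  # requested wall-clock time
```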

### Mapping {term}`Actions <action>` to executors

An executor mapping can be added to your parsl configuration that defines which actions should run on which executors.
If an action is unmapped, it will run on the default executor.
The mapping can be specified as follows:

```toml
[parsl.executor_mapping]
action_name = "tpool"
```
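The fallback behavior can be sketched in a few lines (illustrative, not QIIME 2's code; `action_name` is the placeholder used above):

```python
executor_mapping = {"action_name": "tpool"}

def executor_for(action: str, mapping: dict, default: str = "default") -> str:
    # Unmapped actions fall back to the executor labeled "default".
    return mapping.get(action, default)

print(executor_for("action_name", executor_mapping))
print(executor_for("some_other_action", executor_mapping))
```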

```{warning}
The mechanism for specifying action names at present does not handle the case of different plugins defining actions with the same name.
This mechanism will likely change soon, and may be a {term}`breaking change`.
You can track progress on this [here](https://github.com/qiime2/qiime2/issues/802).
```

(view-parsl-configuration)=
### Viewing the current configuration

Using {term}`q2cli`, you can see your current `qiime2_config.toml` file by running:

```shell
qiime info --config-level 2
```

(qiime2-configuration-precedence)=
### QIIME 2 configuration file precedence

When QIIME 2 needs configuration information, the following precedence order is followed to load a configuration file:

1. The path specified in the environment variable `$QIIME2_CONFIG`.
2. The file at `<user_config_dir>/qiime2/qiime2_config.toml`.
3. The file at `<site_config_dir>/qiime2/qiime2_config.toml`.
4. The file at `$CONDA_PREFIX/etc/qiime2_config.toml`.

If no configuration is found after checking those four locations, QIIME 2 writes a default configuration file to `$CONDA_PREFIX/etc/qiime2_config.toml` and uses that.
This implies that after the first time you run QIIME 2 in parallel without a configuration file in any of the first three locations, the path referenced in step 4 will exist and contain a configuration file.
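The precedence order can be sketched as follows (illustrative only; the path-construction details are assumptions, not QIIME 2's implementation):

```python
import os

def resolve_config_path(user_path, site_path, conda_path):
    """Return the first config location that applies, per the order above."""
    # 1. The environment variable wins if it is set.
    env_path = os.environ.get("QIIME2_CONFIG")
    if env_path:
        return env_path
    # 2-4. Otherwise take the first existing file.
    for candidate in (user_path, site_path, conda_path):
        if os.path.exists(candidate):
            return candidate
    # Nothing found: QIIME 2 would write a default config to the conda path.
    return conda_path

print(resolve_config_path("/nonexistent/a", "/nonexistent/b", "/nonexistent/c"))
```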

Alternatively, when using {term}`q2cli`, you can provide a specific configuration file for Parsl using the `--parallel-config` option.
If provided, this overrides the priority order above.

````{admonition} user_config_dir and site_config_dir
:class: note
On Linux, `user_config_dir` will usually be `$HOME/.config/qiime2/`.
On macOS, it will usually be `$HOME/Library/Application Support/qiime2/`.

You can find the directory used on your system by running the following command:

```bash
python -c "import appdirs; print(appdirs.user_config_dir('qiime2'))"
```

On Linux `site_config_dir` will usually be something like `/etc/xdg/qiime2/`, but it may vary based on Linux distribution.
On macOS it will usually be `/Library/Application Support/qiime2/`.

You can find the directory used on your system by running the following command:

```bash
python -c "import appdirs; print(appdirs.site_config_dir('qiime2'))"
```
````