
Brieflow

Extensible pipeline tool for processing optical pooled screens data.

We are actively moving code from OpticalPooledScreens. Please check back for updates!

Definitions

Terms mentioned throughout the code and documentation include:

  • Brieflow library: Code in workflow/lib used to perform Brieflow processing. Used with Snakemake to run Brieflow steps.
  • Module: One of the larger steps of the Brieflow pipeline; "module" and "step" are used interchangeably. Example modules: preprocessing, sbs, phenotype.
  • Process: Refers to a smaller step within a module. Processes use scripts and Brieflow library code to complete tasks. Example processes in the preprocessing module: extract_metadata_sbs, convert_sbs, calculate_ic_sbs.

Project Structure

Brieflow is built on top of Snakemake. We follow the Snakemake structure guidelines with some exceptions. The Brieflow project structure is as follows:

workflow/
├── envs/ - Environment YAML files that describe dependencies for different modules.
├── lib/ - Brieflow library code used for performing Brieflow processing. Organized into module-specific, shared, and external code.
├── rules/ - Snakemake rule files for each module. Used to organize processes within each module with inputs, outputs, parameters, and script file locations.
├── scripts/ - Python script files for processes called by modules. Organized into module-specific and shared code.
├── targets/ - Snakemake files used to define inputs and their mappings for each module. 
└── Snakefile - Main Snakefile used to call modules.

Brieflow runs as follows:

  • A user configures parameters in Jupyter notebooks to use the Brieflow library code correctly for their data.
  • A user runs the main Snakefile with bash scripts (locally or on an HPC).
  • The main Snakefile calls module-specific Snakemake files with rules for each process.
  • Each process rule calls a script.
  • Scripts use the Brieflow library code to transform the input files into the output files defined in targets/.
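
To make this concrete, a hypothetical process rule might look roughly like the following. The input/output paths, environment file, and script location are illustrative assumptions, not Brieflow's actual code; only the rule name is taken from the preprocessing example above.

# hypothetical sketch: paths and files are illustrative, not Brieflow's actual code
rule extract_metadata_sbs:
    input:
        "input/sbs/{well}_{tile}.tiff",
    output:
        "analysis_root/preprocess/metadata/sbs/{well}_{tile}__metadata.tsv",
    # per-module environment, compiled by Snakemake at runtime
    conda:
        "../envs/preprocessing.yml"
    # the script uses Brieflow library code to produce the outputs
    script:
        "../scripts/preprocessing/extract_metadata_sbs.py"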

Running Example Analysis

Brieflow is set up as a Snakemake workflow with user configuration between steps where necessary: a user sets parameters between module steps with the configuration notebooks. While each step's module has its own Conda environment (compiled by Snakemake at runtime), the notebooks all share a configuration environment.

We currently recommend creating a cloned version of Brieflow for each screen analysis with:

# change directory below to reflect location of a screen analysis project
cd screen_analysis_dir/
git clone https://github.com/cheeseman-lab/brieflow.git

See the steps below to set up the workflow/configuration environments and run your own analysis with Brieflow.

Note: We will soon release documentation on how to set up an analysis repo for working with Brieflow!

Set up workflow/configuration Conda environments

Configuring and running Brieflow requires two separate environments!

The modules share a base environment (brieflow_workflows) and each have their own Conda environments compiled by Snakemake at runtime (in workflow/envs). All notebooks share a configuration environment (brieflow_configuration).

Note: If large changes to Brieflow code are expected for a particular screen analysis, we recommend changing the names of the workflow/configuration environments to be screen-specific so development of this code does not affect other Brieflow runs. Change the name of the workflow and configuration environments in brieflow_workflows_env.yml and brieflow_configuration_env.yml, as sketched below.
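
For example, each environment file sets its name at the top; a screen-specific rename might look like the following (the _myscreen suffix is an illustrative placeholder):

# in brieflow_workflows_env.yml; "_myscreen" is a placeholder suffix
name: brieflow_workflows_myscreen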

Set up Brieflow workflows environment

Use the following commands to set up the brieflow_workflows Conda environment:

# create brieflow_workflows conda environment
conda env create --file=brieflow_workflows_env.yml
# activate brieflow_workflows conda environment
conda activate brieflow_workflows
# set conda installation to use strict channel priorities
conda config --set channel_priority strict

Set up Brieflow configuration environment

Use the following commands to set up the brieflow_configuration Conda environment:

# create brieflow_configuration conda environment
conda env create --file=brieflow_configuration_env.yml

HPC Integrations

The steps for running workflows currently include local and Slurm integration. To use the Slurm integration for Brieflow, configure the Slurm resources in analysis/slurm/config.yaml. The slurm_partition and slurm_account entries in default-resources must be configured, while the other resource requirements have suggested values that can be adjusted as necessary.
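
As a sketch, the relevant portion of analysis/slurm/config.yaml might look like the following; the values are placeholders, and the actual file also carries the suggested resource values mentioned above.

# placeholders: replace with your cluster's partition and account
default-resources:
  slurm_partition: "your_partition"
  slurm_account: "your_account"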

Note: Other Snakemake HPC integrations can be found in the Snakemake plugin catalog. Only the slurm plugin has been tested.

Analysis Steps

Follow the instructions below to configure parameters and run modules. All of these steps are done in the example analysis folder. Use the following command to enter this folder:

cd analysis/

Step 0: Configure preprocess parameters

Follow the steps in 0.configure_preprocess_params.ipynb to configure preprocess params.
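
For example, assuming Jupyter is installed in the configuration environment, the notebook can be opened with:

# all configuration notebooks use the brieflow_configuration environment
conda activate brieflow_configuration
jupyter lab 0.configure_preprocess_params.ipynb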

Step 1: Run preprocessing module

Local:

conda activate brieflow_workflows
sh 1.run_preprocessing.sh

Slurm:

sbatch 1.run_preprocessing_slurm.sh

*Note: For testing purposes, users may have generated only SBS or only phenotype images. It is possible to test only SBS or only phenotype preprocessing; see the notebook instructions for more details.

Step 2: Configure SBS parameters

Follow the steps in 2.configure_sbs_params.ipynb to configure SBS module parameters.

Step 3: Configure phenotype parameters

Follow the steps in 3.configure_phenotype_params.ipynb to configure phenotype module parameters.

Step 4: Run SBS/phenotype modules

Local:

conda activate brieflow_workflows
sh 4.run_sbs_phenotype.sh

Slurm:

sbatch 4.run_sbs_phenotype_slurm.sh

Step 5: Configure merge process params

Follow the steps in 5.configure_merge_params.ipynb to configure merge process params.

Step 6: Run merge process

Local:

conda activate brieflow_workflows
sh 6.run_merge_process.sh

Slurm:

# TODO: Add and test this file
sbatch 6.run_merge_process_slurm.sh

Step 7: Configure aggregate process params

Follow the steps in 7.configure_aggregate_params.ipynb to configure aggregate process params.

Step 8: Run aggregate process

Local:

conda activate brieflow_workflows
sh 8.run_aggregate_process.sh

Slurm:

# TODO: Add and test this file
sbatch 8.run_aggregate_process_slurm.sh

Step 9: Configure cluster process params

Follow the steps in 9.configure_cluster_params.ipynb to configure cluster process params.

Step 10: Run cluster process

Local:

conda activate brieflow_workflows
sh 10.run_cluster_process.sh

Slurm:

# TODO: Add and test this file
sbatch 10.run_cluster_process_slurm.sh

*Note: Use the brieflow_configuration Conda environment for each configuration notebook.

*Note: Many users will want to run only SBS or only phenotype processing. It is possible to restrict the SBS/phenotype processing with the following:

  1. If either of the sample dataframes defined in 0.configure_preprocess_params.ipynb is empty, then no samples will be processed. See the notebook for more details.
  2. By varying the flags in the 4.run_sbs_phenotype sh files (--until all_sbs or --until all_phenotype), only the analysis of interest will run, as sketched below.
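
As an illustrative sketch (assuming the sh scripts wrap a plain snakemake invocation; the actual script contents may differ), restricting a run to SBS might look like:

# hypothetical: stop the workflow after the SBS targets
conda activate brieflow_workflows
snakemake --until all_sbs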

Run Entire Analysis

If all parameter configurations are known for the entire Brieflow pipeline, it is possible to run the entire pipeline as follows.

Local:

conda activate brieflow_workflows
sh run_entire_analysis.sh

Slurm:

sbatch run_entire_analysis.sh

Example Analysis

The example analysis details a Brieflow run with a small testing set of OPS data. We do not include the data necessary for this example analysis in this repo, as it is too large. The data/ folder used for this example analysis can be downloaded from Google Drive and should be placed at example_analysis/data.

Contribution Notes

  • Brieflow is still under active development and we welcome community use and development.
  • File a GitHub issue to share comments and issues.
  • File a GitHub PR to contribute to Brieflow as detailed in the pull request template. Read about how to contribute to a project to understand forks, branches, and PRs.

Dev Tools

We use ruff for linting and formatting code.

Conventions

We use the following conventions:

  • One sentence per line convention for markdown files
  • Google format for function docstrings
  • tsv file format for saving small dataframes that require easy readability
  • parquet file format for saving large dataframes
  • File names composed of data location information (well, tile, cycle, etc.) + __ + type of information (cell features, phenotype info, etc.) + . + file type, with data stored in its respective analysis directory. For example: analysis_root/preprocess/metadata/phenotype/P-1_W-A2_T-571__metadata.tsv. See the sketch below.
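
As a minimal sketch of this naming convention (the helper below is hypothetical, not part of the Brieflow library):

# hypothetical helper illustrating the file naming convention
def output_name(plate: int, well: str, tile: int, info_type: str, ext: str = "tsv") -> str:
    """Build a name like P-1_W-A2_T-571__metadata.tsv."""
    return f"P-{plate}_W-{well}_T-{tile}__{info_type}.{ext}"

print(output_name(1, "A2", 571, "metadata"))  # P-1_W-A2_T-571__metadata.tsv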
