Template repository for processing optical pooled screen data with Brieflow.
Notes:
- Read brieflow.readthedocs.io before starting to get a good grasp of brieflow and brieflow-analysis!
- We aim to keep brieflow-related issues in the main brieflow repository (here).
- Join the brieflow Discord to ask questions, share ideas, and get help from other users and developers.
This repository is designed to work with Brieflow to analyze optical pooled screens. Follow these steps to get set up for a screen analysis!
Brieflow-analysis is a template for each screen analysis. Create a new repository for a screen to get started.
- Create a new screen repository with the "Use this template" button for each new screen analysis.
- Clone the newly created repository to your local machine:
```bash
git clone https://github.com/YOUR-USERNAME/YOUR-SCREEN-REPO.git
cd YOUR-SCREEN-REPO
```
See the GitHub documentation for using a template for more information.
We use brieflow to process the large volumes of data generated by each screen.
We use brieflow as a git submodule in this repository.
Please see the Git Submodules basic explanation for information on how to best install, use, and update this submodule.
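As a quick reference, the day-to-day submodule workflow uses standard git commands; a minimal sketch:
```bash
# fetch the brieflow commit pinned by this repository
git submodule update --init --recursive
# later, to move the submodule to a newer commit of your fork:
cd brieflow
git pull origin main
cd ..
# record the new submodule commit in the screen repository
git add brieflow
git commit -m "Update brieflow submodule"
```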
We recommend using a forked version of brieflow and provide instructions for doing this below.
We highly recommend reading the GitHub documentation for an explanation of forks to understand how your fork of brieflow syncs with the cheeseman-lab brieflow.
From the documentation:
> A fork is a new repository that shares code and visibility settings with the original "upstream" repository. Forks are often used to iterate on ideas or changes before they are proposed back to the upstream repository, such as in open source projects or when a user does not have write access to the upstream repository. For more information, see Working with forks.
To get started:
- Create a fork of brieflow as described here.
- Clone the brieflow package into this repo using the following git submodule commands:
```bash
# from the root of your screen repository, point the submodule at your forked brieflow
git submodule set-url brieflow https://github.com/YOUR-USERNAME/brieflow.git
# initialize the submodule
git submodule update --init --recursive
```
- Configure the remote repository for your fork (more info here).
```bash
# enter the brieflow submodule
cd brieflow/
# set remote upstream repo
git remote add upstream https://github.com/cheeseman-lab/brieflow.git
# check origin and upstream repos
git remote -v
# confirm you see the output below
> origin https://github.com/YOUR-USERNAME/brieflow.git (fetch)
> origin https://github.com/YOUR-USERNAME/brieflow.git (push)
> upstream https://github.com/cheeseman-lab/brieflow.git (fetch)
> upstream https://github.com/cheeseman-lab/brieflow.git (push)
```
Follow the GitHub documentation to sync changes between your fork and cheeseman-lab/brieflow (e.g., to pull a new branch).
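Concretely, syncing follows the standard fork workflow; a sketch with an illustrative branch name:
```bash
# run inside brieflow/ (the clone of your fork)
git fetch upstream            # fetch the latest cheeseman-lab/brieflow history
git checkout main
git merge upstream/main       # merge upstream changes into your fork's main
git push origin main
# e.g., to pull a new upstream branch into your fork:
git checkout -b new-feature upstream/new-feature
git push -u origin new-feature
```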
- Set up brieflow following the setup instructions.
Use the following commands to set up the brieflow Conda environment (~10 min):
```bash
# enter brieflow dir
cd brieflow/
# create and activate brieflow_SCREEN_NAME conda environment
# NOTE: replace brieflow_SCREEN_NAME with the name of your screen to ensure a screen-specific installation
# using this screen-specific installation will refer to library code in ./brieflow/workflow/lib
conda create -n brieflow_SCREEN_NAME -c conda-forge python=3.11 uv pip -y
conda activate brieflow_SCREEN_NAME
# install external packages
uv pip install -r pyproject.toml
# install editable version of brieflow
uv pip install -e .
# install conda-only packages
conda install -c conda-forge micro_sam -y # skip if not using micro-sam for segmentation
```
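An optional sanity check that the environment resolved correctly (this assumes the editable package is named `brieflow` in its pyproject.toml):
```bash
# confirm the interpreter version and that the editable install is visible
conda activate brieflow_SCREEN_NAME
python --version                  # should report Python 3.11.x
uv pip list | grep -i brieflow    # assumed package name; adjust if it differs
```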
Notes:
- We recommend a screen-specific installation because changes to the `./brieflow/workflow/lib` code will live within this specific installation of brieflow, and an explicit name helps keep track of different brieflow installations. One could also install one version of brieflow that is used across brieflow-analysis repositories.
- For a rule-specific package, consider creating a separate conda environment file and using it for the particular rule as described in the Snakemake integrated package management notes.
We use the HPC integration for Slurm as detailed in the setup instructions. To use the Slurm integration for Brieflow, configure the Slurm resources in `analysis/slurm/config.yaml`.
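Brieflow's workflows are driven by Snakemake, so this file follows Snakemake profile conventions. The following is only a hedged sketch of what such a profile can look like (key names follow the snakemake-executor-plugin-slurm documentation; the partition and resource values are placeholders); the `config.yaml` shipped in this template is the source of truth:
```bash
# illustrative only -- echoes the assumed shape of a Snakemake Slurm profile;
# the real analysis/slurm/config.yaml in this template may use different keys/values
cat <<'EOF'
executor: slurm
jobs: 100                    # maximum concurrent Slurm jobs
default-resources:
  slurm_partition: "normal"  # placeholder partition name
  mem_mb: 8000
  runtime: 120               # minutes
  cpus_per_task: 4
EOF
```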
- Optional: Contribute back to brieflow:
  - Track changes to computational processing in a new branch on your fork.
  - Contribute these changes to cheeseman-lab/brieflow with a pull request.
  - See GitHub's documentation for contributing to a project and brieflow's contribution notes for more info.
Run the following commands to ensure your Brieflow is set up correctly. This tests Brieflow on a small subset of example data that we provide and works only with the main branch of Brieflow. This test directory is not the place to analyze your own data.
```bash
# activate brieflow env
conda activate brieflow_SCREEN_NAME
# set up small test analysis
cd brieflow/tests/small_test_analysis
python small_test_analysis_setup.py
# run brieflow
sh run_brieflow.sh
# run tests
cd ../../
pytest
```
Note: Before beginning analysis, it is strongly recommended that you fill out the `screen.yaml` file to track all of your experimental metadata.
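Purely as an illustration of the kind of metadata worth recording (these field names are hypothetical; the `screen.yaml` shipped with this template defines the real schema):
```bash
# hypothetical fields only -- defer to the template's screen.yaml for the real schema
cat <<'EOF'
screen_name: example-screen-01      # hypothetical identifier
description: "OPS pilot, 2 plates"  # hypothetical free-text summary
cell_line: HeLa                     # hypothetical biological metadata
EOF
```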
`analysis/` contains configuration notebooks used to configure processes and Slurm scripts used to run full modules. By default, results are output to `analysis/brieflow_output` and organized by analysis module (preprocess, sbs, phenotype, etc.).
Follow the full instructions below to run an analysis.
Follow the instructions below to configure parameters and run modules. All of these steps are done in the example analysis. Use the following commands to enter this folder and activate the conda env:
```bash
# enter analysis directory
cd analysis/
# activate the screen-specific conda environment
conda activate brieflow_SCREEN_NAME
```
*Notes:*
- Use the `brieflow_SCREEN_NAME` conda environment for each configuration notebook.
- How you use brieflow should depend on your workload:
  - Runs that can be done with local compute can be run with the `.sh` scripts, which are set up to run all rules for a module. Note that these scripts are currently set up to do a dry run with the `-n` parameter, which will need to be removed for a real local run (see the sketch after these notes).
  - Runs that need HPC compute should be run with the `_slurm.sh` scripts. Right now, these are set up to log run information and break the larger steps (preprocessing, sbs, phenotype) into plate-level runs. The local `.sh` scripts can still be used to do a dry-run preview with `-n` (already set up).
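For context, Brieflow modules are Snakemake workflows, so the dry-run toggle is the standard Snakemake `-n` flag. A minimal sketch (the shipped scripts pass additional flags that are omitted here):
```bash
# dry run: print the jobs that would execute without running them
snakemake --cores 4 -n
# real local run: the same invocation with -n removed
snakemake --cores 4
```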
Follow the steps in 0.configure_preprocess_params.ipynb to configure preprocess params.
Note: This step determines where ND2 data is loaded from (can be from anywhere) and where intermediate/output data is saved (can also be anywhere). By default, results are output to `analysis/brieflow_output`.
Local:
```bash
sh 1.run_preprocessing.sh
```
Slurm:
Change `NUM_PLATES` in 1.run_preprocessing_slurm.sh to the number of plates you are processing (to process each plate separately).
```bash
# start a tmux session:
tmux new-session -s preprocessing
# in the tmux session:
bash 1.run_preprocessing_slurm.sh
```
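Because Slurm submissions can run for hours, tmux lets you disconnect without killing the run; these are standard tmux commands:
```bash
# detach from the session with Ctrl-b d, then reattach later:
tmux attach -t preprocessing
# list active sessions if you forget the name
tmux ls
```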
*Note: For testing purposes, users may have generated only SBS or only phenotype images. It is possible to test only SBS or only phenotype preprocessing in this notebook. See the notebook instructions for more details.*
Follow the steps in 2.configure_sbs_params.ipynb to configure SBS module parameters.
Follow the steps in 3.configure_phenotype_params.ipynb to configure phenotype module parameters.
Local:
```bash
sh 4.run_sbs_phenotype.sh
```
Slurm:
Change `NUM_PLATES` in 4a.run_sbs_slurm.sh and 4b.run_phenotype_slurm.sh to the number of plates you are processing (to process each plate separately).
These two modules can be run simultaneously or separately (one way to run them in parallel is sketched below).
```bash
# start a tmux session:
tmux new-session -s sbs_phenotype
# in the tmux session:
bash 4a.run_sbs_slurm.sh
bash 4b.run_phenotype_slurm.sh
```
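If you want the two modules running truly in parallel, one option (an illustrative pattern, not something the scripts require) is to give each module its own detached tmux session:
```bash
# launch each module in its own detached tmux session
tmux new-session -d -s sbs 'bash 4a.run_sbs_slurm.sh'
tmux new-session -d -s phenotype 'bash 4b.run_phenotype_slurm.sh'
# attach to either session to monitor progress
tmux attach -t sbs
```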
Follow the steps in 5.configure_merge_params.ipynb to configure merge process params.
Local:
```bash
sh 6.run_merge.sh
```
Slurm:
```bash
# start a tmux session:
tmux new-session -s merge
# in the tmux session:
bash 6.run_merge_slurm.sh
```
Follow the steps in 7.configure_aggregate_params.ipynb to configure aggregate process params.
Local:
```bash
sh 8.run_aggregate.sh
```
Slurm:
```bash
# start a tmux session:
tmux new-session -s aggregate
# in the tmux session:
bash 8.run_aggregate_slurm.sh
```
Follow the steps in 9.configure_cluster_params.ipynb to configure cluster process params.
Local:
```bash
sh 10.run_cluster.sh
```
Slurm:
```bash
# start a tmux session:
tmux new-session -s cluster
# in the tmux session:
bash 10.run_cluster_slurm.sh
```
Run the 11.analyze.ipynb notebook to evaluate the biological relevance of your clusters using an LLM wrapper and to generate simple plots of your features.
Brieflow includes a native visualizer for a screen's experimental overview, analysis overview, quality control, and cluster analysis. Run the following command to start this visualization:
```bash
sh 12.run_visualization.sh
```
*Note: Many users will want to run only SBS or only phenotype processing. It is possible to restrict processing to one branch in the following ways:*
- If either of the sample dataframes defined in 0.configure_preprocess_params.ipynb is empty, then no samples of that type will be processed. See the notebook for more details.
- By varying the tags in the 4.run_sbs_phenotype sh files (`--until all_sbs` or `--until all_phenotype`), the analysis will run only the branch of interest, as sketched below.
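Since these scripts drive Snakemake, `--until` is the standard Snakemake flag that stops the workflow at a named target rule. A minimal illustration (other flags used by the shipped scripts are omitted):
```bash
# run only the SBS branch of the workflow
snakemake --cores 4 --until all_sbs
# run only the phenotype branch
snakemake --cores 4 --until all_phenotype
```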
Run the following script to generate a rulegraph of Brieflow:
```bash
sh generate_rulegraph.sh
```
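Under the hood this is the standard Snakemake rulegraph idiom; if you want a different output format, something like the following works (assuming Graphviz's `dot` is installed; the script's actual invocation may differ):
```bash
# render the workflow's rule dependency graph to a PNG
snakemake --rulegraph | dot -Tpng > rulegraph.png
```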
- Core improvements should be contributed back to Brieflow
- If you have analyzed any of your optical pooled screening data using brieflow-analysis, please reach out and we will include you in the table below!
| Study | Description | Analysis Repository | Publication |
|---|---|---|---|
| Coming soon | | | |