Commit 3c90f6a

Merge pull request #51 from drneavin/v3.0.0
V3.0.0
2 parents f2ec981 + 3781d16 commit 3c90f6a

34 files changed (+1697, -306 lines changed)

Singularity.Demuxafy

Lines changed: 256 additions & 164 deletions
Large diffs are not rendered by default.

Singularity.Demuxafy.3.0.0_ubuntu2004.def

Lines changed: 427 additions & 0 deletions
Large diffs are not rendered by default.

docs/source/Contact.rst

Lines changed: 2 additions & 2 deletions
@@ -4,10 +4,10 @@
44
Contact
55
=======
66

7-
.. _preprint: https://www.biorxiv.org/content/10.1101/2022.03.07.483367v1
7+
.. _publication: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-024-03224-8
88

99
Demuxafy has been developed by Drew Neavin in Joseph Powell's Lab at the Garvan Institute of Medical Research.
1010

1111
You can contact us with questions, issues or recommendations with a `Github issue <https://github.com/drneavin/Demultiplexing_Doublet_Detecting_Docs/issues>`__.
1212

13-
If you use this resource, please cite our preprint_.
13+
If you use this resource, please cite our publication_.
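This commit renames the ``preprint`` hyperlink target to ``publication`` across many pages. A reference such as ``preprint_`` only resolves while a matching ``.. _preprint:`` target exists, so any leftover reference would be reported as an unknown target name when the docs are built. A minimal sketch of a check, assuming the pages live under ``docs/source``; the two files it creates are hypothetical stand-ins so that it runs on its own:

```bash
#!/usr/bin/env bash
# Sketch: list RST files that still reference the removed `preprint_` target.
# The two files below are hypothetical stand-ins for docs/source/*.rst.
set -euo pipefail

docs=$(mktemp -d)
trap 'rm -rf "$docs"' EXIT

printf '%s\n' \
  '.. _publication: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-024-03224-8' \
  'If you use this resource, please cite our publication_.' > "$docs/Contact.rst"
printf '%s\n' \
  'Please reference our preprint_ as well as Demuxalot_.' > "$docs/Stale.rst"

# -l prints only matching file names; `|| true` keeps set -e happy when nothing matches
stale=$(grep -rl --include='*.rst' 'preprint_' "$docs" || true)

if [ -n "$stale" ]; then
  echo "Files still citing preprint_:"
  printf '%s\n' "$stale"
fi
```

On a real checkout, pointing the ``grep`` at ``docs/source`` instead of the temporary directory should print nothing once every page cites ``publication_``.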

docs/source/DataPrep.rst

Lines changed: 5 additions & 2 deletions
@@ -77,7 +77,10 @@ For 1000G, use the instructions at the above link to access the data per your pr
7777
Preparing your own SNP Genotype Data
7878
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
7979

80-
It is best to filter the SNP genotypes for common SNPs (generally > 1% or > 5% minor allele frequency) that overlap exons.
80+
It is best to filter the SNP genotypes for common SNPs (generally > 1% or > 5% minor allele frequency) that overlap either exons or genes.
81+
We typically suggest filtering for exons since this usually leaves ~250k SNPs, which is sufficient for demultiplexing without including so many SNPs that the demultiplexing softwares slow down.
82+
However, for some capture types, such as single nuclei RNA-seq, it may be better to filter for SNPs that overlap genes.
83+
For the relative numbers of SNPs in exons and introns, see the `issue raised by @jamesnemesh <https://github.com/drneavin/Demultiplexing_Doublet_Detecting_Docs/issues/49#issue-2182195018>`__.
8184
Here we provide an example of how to do this filtering.
8285
We built the required softwares into the singularity image so you can run these filtering steps with the image.
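The filtering described above can be illustrated on toy data. The sketch below is a hypothetical, self-contained ``awk`` version (not the commands shipped in the Demuxafy image): it keeps biallelic SNPs with a minor allele frequency above 5% that fall inside exon intervals, using a made-up four-donor VCF and exon BED file. For real data a dedicated tool is the better choice, for example ``bcftools view`` with its ``-R`` (regions file) and ``-q`` (minimum allele frequency) options.

```bash
#!/usr/bin/env bash
# Toy illustration only: keep biallelic SNPs with minor allele frequency (MAF)
# above a threshold that fall inside exon intervals. All data below is made up.
set -euo pipefail

MAF=0.05
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical exons in BED format (0-based start, end-exclusive)
printf 'chr1\t100\t200\nchr1\t500\t600\n' > "$work/exons.bed"

# Hypothetical 4-donor VCF; GT is the only FORMAT field
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tD1\tD2\tD3\tD4\n'
  printf 'chr1\t150\tsnp1\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0\t0/0\t1/1\n'  # exon, MAF 3/8
  printf 'chr1\t180\tsnp2\tC\tT\t.\tPASS\t.\tGT\t0/0\t0/0\t0/0\t0/0\n'  # exon, MAF 0
  printf 'chr1\t300\tsnp3\tG\tA\t.\tPASS\t.\tGT\t0/1\t0/1\t0/0\t0/0\n'  # outside exons
  printf 'chr1\t550\tsnp4\tT\tC\t.\tPASS\t.\tGT\t0/1\t0/0\t0/0\t0/0\n'  # exon, MAF 1/8
} > "$work/donors.vcf"

filtered=$(awk -v maf="$MAF" '
  FNR == NR { chr[NR] = $1; beg[NR] = $2 + 0; stop[NR] = $3 + 0; n = NR; next }  # read BED
  /^#/ { print; next }                          # keep VCF header lines
  length($4) == 1 && length($5) == 1 {          # biallelic SNPs only
    pos = $2 + 0; hit = 0
    # VCF POS is 1-based, so it lies in a BED interval when beg < POS <= stop
    for (i = 1; i <= n; i++)
      if ($1 == chr[i] && pos > beg[i] && pos <= stop[i]) { hit = 1; break }
    if (!hit) next
    alt = 0; called = 0
    for (j = 10; j <= NF; j++) {                # one genotype column per donor
      split($j, a, "[/|]")
      for (k = 1; k <= 2; k++)
        if (a[k] != ".") { called++; if (a[k] == "1") alt++ }
    }
    if (called == 0) next
    af = alt / called                           # ALT allele frequency
    if ((af < 0.5 ? af : 1 - af) > maf + 0) print
  }' "$work/exons.bed" "$work/donors.vcf")

printf '%s\n' "$filtered"
```

Only ``snp1`` and ``snp4`` survive: ``snp2`` is monomorphic across the donors and ``snp3`` lies outside both exons.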
8386

@@ -237,7 +240,7 @@ You can download the dataset with one of the following commands:
237240

238241
.. code-block:: bash
239242
240-
tar -xvf TestData4PipelineFull.tar.gz
243+
tar -xvf TestData4PipelineSmall.tar.gz
241244
242245
This should unzip the ``TestData4PipelineSmall`` directory where you will have the following file structure:
243246

docs/source/DemultiplexingSoftwares.rst

Lines changed: 4 additions & 3 deletions
@@ -4,7 +4,7 @@ Overview of Demultiplexing Softwares
44

55
Demultiplexing softwares use the inherent genetic differences between donors multiplexed in a single pool to assign droplets to each donor and to identify doublets.
66
There are five demultiplexing softwares that have different capabilities and advantages depending on your dataset.
7-
As you can see from this table, only :ref:`Demuxlet <Demuxlet-docs>` absolutely requires reference SNP genotypes for the donors multiplexed in your pool.
7+
As you can see from this table, only :ref:`Demuxlet <Demuxlet-docs>`, :ref:`Demuxalot <Demuxalot-docs>` and :ref:`Dropulation <Dropulation-docs>` absolutely require reference SNP genotypes for the donors multiplexed in your pool.
88
However, :ref:`Souporcell <Souporcell-docs>` and :ref:`Vireo <Vireo-docs>` can also accommodate reference SNP genotypes.
99

1010
+--------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
@@ -26,8 +26,9 @@ However, :ref:`Souporcell <Souporcell-docs>` and :ref:`Vireo <Vireo-docs>` are a
2626
|:ref:`Vireo <Vireo-docs>` | .. centered:: |:heavy_multiplication_x:| | .. centered:: |:heavy_check_mark:| | .. centered:: |:heavy_check_mark:| |
2727
+--------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
2828

29-
We highly recommend using :ref:`Souporcell <Souporcell-docs>` if only to estimate the percentage of ambient RNA in your pool.
29+
We highly recommend using :ref:`Souporcell <Souporcell-docs>` or :ref:`Vireo <Vireo-docs>` if only to estimate the percentage of ambient RNA in your pool.
30+
:ref:`Souporcell <Souporcell-docs>` will estimate ambient RNA for the pool as a whole while :ref:`Vireo <Vireo-docs>` will estimate ambient RNA for each cell.
3031
As far as we are aware, these are the only softwares that leverage SNP genotype data to estimate ambient RNA in multiplexed pools, which helps identify high ambient RNA that is sometimes undetectable with basic QC metrics.
3132
We view this as supplementary to other ambient RNA methods that use the transcriptional profile to estimate and remove ambient RNA per droplet.
3233

33-
If you don't know which demultiplexing software(s) to run, take a look at our :ref:`Software Selection Recommendations <SoftwareSelection-docs>` based on your dataset or use our **add widget link here**
34+
If you don't know which demultiplexing software(s) to run, take a look at our :ref:`Software Selection Recommendations <SoftwareSelection-docs>` based on your dataset or use our `Software Selector and Doublet Estimator Tool <https://demultiplexing-doublet-detecting-docs.readthedocs.io/en/latest/Calculator_final_version.html>`__.

docs/source/Demuxalot.rst

Lines changed: 21 additions & 4 deletions
@@ -5,7 +5,7 @@ Demuxalot
55
===========================
66

77
.. _Demuxalot: https://pypi.org/project/demuxalot/
8-
.. _preprint: https://www.biorxiv.org/content/10.1101/2022.03.07.483367v1
8+
.. _publication: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-024-03224-8
99

1010

1111

@@ -40,6 +40,15 @@ This is the data that you will need to have prepare to run Demuxalot_:
4040

4141
- For example, this is the :download:`individual file <_download_files/Individuals.txt>` for our example dataset
4242

43+
.. admonition:: Optional
44+
45+
- The SAM tag used in the BAM file to annotate the aligned single-cell reads with their corresponding cell barcode (``$CELL_TAG``)
46+
47+
- If not specified, Demuxalot_ defaults to using ``CB``, the tag used by Cell Ranger.
48+
49+
- The SAM tag used in the BAM file to annotate the aligned single-cell reads with their corresponding unique molecular identifier (UMI) (``$UMI_TAG``)
50+
51+
- If not specified, Demuxalot_ defaults to using ``UB``, the tag used by Cell Ranger.
4352

4453

4554
Run Demuxalot
@@ -75,7 +84,7 @@ Demultiplex with Demuxalot
7584

7685
.. tab:: With Refinement
7786

78-
This will run the first phase of Demuxalot_ as well as the subsequent refinement:
87+
This will run the first phase of Demuxalot_ as well as the subsequent refinement, using the number of threads set in ``$THREADS`` (choose an appropriate value for your system):
7988

8089
.. code-block:: bash
8190
@@ -85,6 +94,9 @@ Demultiplex with Demuxalot
8594
-n $INDS \
8695
-v $VCF \
8796
-o $DEMUXALOT_OUTDIR \
97+
-p $THREADS \
98+
${CELL_TAG:+-c $CELL_TAG} \
99+
${UMI_TAG:+-u $UMI_TAG} \
88100
-r True
89101
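The ``${CELL_TAG:+-c $CELL_TAG}`` and ``${UMI_TAG:+-u $UMI_TAG}`` lines above use bash's ``${parameter:+word}`` expansion: the option is passed only when the variable is set and non-empty, otherwise nothing is passed and Demuxalot_ falls back to its ``CB``/``UB`` defaults. The sketch below shows that behaviour with a hypothetical ``show_args`` function standing in for the real command; the Drop-seq style ``XC``/``XM`` tags are just example values.

```bash
#!/usr/bin/env bash
# Sketch of ${VAR:+word} expansion; show_args is a hypothetical stand-in that
# reports the number and values of the arguments it receives.
show_args() { printf '%s\n' "$#:$*"; }

unset CELL_TAG UMI_TAG
no_tags=$(show_args -p 4 ${CELL_TAG:+-c $CELL_TAG} ${UMI_TAG:+-u $UMI_TAG})

CELL_TAG=XC
UMI_TAG=XM
with_tags=$(show_args -p 4 ${CELL_TAG:+-c $CELL_TAG} ${UMI_TAG:+-u $UMI_TAG})

echo "$no_tags"    # only -p 4 is passed, so the default tags apply
echo "$with_tags"  # -c XC and -u XM are appended
```

The expansion is deliberately left unquoted so that ``-c XC`` splits into two separate arguments.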
90102
.. admonition:: HELP! It says my file/directory doesn't exist!
@@ -94,6 +106,8 @@ Demultiplex with Demuxalot
94106
This is easy to fix.
95107
The issue and solution are explained in detail in the :ref:`Notes About Singularity Images <Singularity-docs>`
96108

109+
Setting ``$THREADS`` to ``-1`` results in Demuxalot_ using all available CPUs/threads.
110+
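Since ``-1`` tells Demuxalot_ to use all available CPUs/threads, one option on a shared cluster is to use the job's allocation when there is one. The sketch below assumes a SLURM scheduler, which sets ``SLURM_CPUS_PER_TASK`` only when ``--cpus-per-task`` is requested, and falls back to ``-1`` otherwise; ``pick_threads`` is a hypothetical helper, so adapt it to your own scheduler.

```bash
#!/usr/bin/env bash
# Sketch: choose a thread count for Demuxalot's -p option.
# Assumption: on SLURM, SLURM_CPUS_PER_TASK holds the CPUs allocated to the task.
pick_threads() {
  # ${VAR:-default} substitutes -1 (all CPUs/threads) when VAR is unset or empty
  echo "${SLURM_CPUS_PER_TASK:--1}"
}

THREADS=$(pick_threads)
echo "Using THREADS=$THREADS"
```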
97111
If Demuxalot_ is successful, you will have these new files in your ``$DEMUXALOT_OUTDIR``:
98112

99113
.. code-block:: bash
@@ -109,7 +123,7 @@ Demultiplex with Demuxalot
109123
110124
.. tab:: Without Refinement
111125

112-
This will run the first phase of Demuxalot_ only without any refinement:
126+
This will run the first phase of Demuxalot_ only, without any refinement, using the number of threads set in ``$THREADS`` (choose an appropriate value for your system):
113127

114128
.. code-block:: bash
115129
@@ -119,6 +133,7 @@ Demultiplex with Demuxalot
119133
-n $INDS \
120134
-v $VCF \
121135
-o $DEMUXALOT_OUTDIR \
136+
-p $THREADS \
122137
-r False
123138
124139
.. admonition:: HELP! It says my file/directory doesn't exist!
@@ -128,6 +143,8 @@ Demultiplex with Demuxalot
128143
This is easy to fix.
129144
The issue and solution are explained in detail in the :ref:`Notes About Singularity Images <Singularity-docs>`
130145

146+
Setting ``$THREADS`` to ``-1`` results in Demuxalot_ using all available CPUs/threads.
147+
131148
If Demuxalot_ is successful, you will have these new files in your ``$DEMUXALOT_OUTDIR``:
132149

133150
.. code-block:: bash
@@ -303,6 +320,6 @@ See :ref:`Combine Results <Combine-docs>`.
303320

304321
Citation
305322
--------
306-
If you used the Demuxafy platform for analysis, please reference our preprint_ as well as Demuxalot_.
323+
If you used the Demuxafy platform for analysis, please reference our publication_ as well as Demuxalot_.
307324

308325

docs/source/Demuxlet.rst

Lines changed: 52 additions & 6 deletions
@@ -5,7 +5,7 @@ Demuxlet
55
===========================
66

77
.. _Demuxlet: https://github.com/statgen/popscle
8-
.. _preprint: https://www.biorxiv.org/content/10.1101/2022.03.07.483367v1
8+
.. _publication: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-024-03224-8
99

1010
Demuxlet_ is a genotype demultiplexing software that requires reference genotypes to be available for each individual in the pool.
1111
Therefore, if you don't have reference genotypes, you may want to demultiplex with one of the softwares that do not require reference genotype data
@@ -46,6 +46,14 @@ This is the data that you will need to have prepare to run Demuxlet_:
4646

4747
- For example, this is the :download:`individual file <_download_files/Individuals.txt>` for our example dataset
4848

49+
- The SAM tag used in the BAM file to annotate the aligned single-cell reads with their corresponding cell barcode (``$CELL_TAG``)
50+
51+
- If not specified, Demuxlet_ defaults to using ``CB``.
52+
53+
- The SAM tag used in the BAM file to annotate the aligned single-cell reads with their corresponding unique molecular identifier (UMI) (``$UMI_TAG``)
54+
55+
- If not specified, Demuxlet_ defaults to using ``UB``.
56+
4957

5058
Run Demuxlet
5159
------------
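The pileup and demuxlet commands below pass ``--tag-group $CELL_TAG`` and ``--tag-UMI $UMI_TAG`` unconditionally. If either variable is unset or empty, the unquoted expansion simply disappears, so popscle would see ``--tag-group`` immediately followed by ``--tag-UMI`` (or by nothing). One defensive option, sketched below with a hypothetical ``count_args`` helper, is to assign the documented ``CB``/``UB`` defaults before running.

```bash
#!/usr/bin/env bash
# Sketch: why empty tag variables are risky, and a defensive default.
# count_args is a hypothetical helper; it is not part of popscle.
count_args() { echo "$#"; }

unset CELL_TAG UMI_TAG
# Empty variables vanish, so --tag-group would be followed directly by --tag-UMI
broken=$(count_args --tag-group $CELL_TAG --tag-UMI $UMI_TAG)

# Fall back to the CB/UB defaults when the variables are unset or empty
CELL_TAG=${CELL_TAG:-CB}
UMI_TAG=${UMI_TAG:-UB}
fixed=$(count_args --tag-group "$CELL_TAG" --tag-UMI "$UMI_TAG")

echo "broken: $broken arguments, fixed: $fixed arguments"
```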
@@ -77,6 +85,8 @@ Popscle Pileup
7785

7886
First we will need to identify the number of reads from each allele at each SNP location.
7987

88+
Please note that the ``\`` at the end of each line is only there for readability, so that each parameter sits on its own line.
89+
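The note above can be checked directly in bash: a backslash at the very end of a line joins it to the next, so the multi-line form passes exactly the same arguments as the one-line form. In the sketch below ``show_args`` is a hypothetical stand-in for popscle that only echoes its arguments. Any character after the ``\``, even a space, stops the join: the space is escaped instead and the newline then ends the command.

```bash
#!/usr/bin/env bash
# Sketch: a trailing backslash joins lines into one command.
# show_args is a hypothetical stand-in for popscle; it echoes its arguments.
show_args() { echo "$#: $*"; }

one_line=$(show_args --sam in.bam --vcf in.vcf --out pileup)

multi_line=$(show_args \
  --sam in.bam \
  --vcf in.vcf \
  --out pileup)

echo "$one_line"
echo "$multi_line"
```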
8090
.. tabs::
8191

8292
.. tab:: With ``$INDS`` file
@@ -85,7 +95,14 @@ First we will need to identify the number of reads from each allele at each SNP
8595

8696
.. code-block:: bash
8797
88-
singularity exec Demuxafy.sif popscle dsc-pileup --sam $BAM --vcf $VCF --group-list $BARCODES --out $DEMUXLET_OUTDIR/pileup --sm-list $INDS
98+
singularity exec Demuxafy.sif popscle_pileup.py \
99+
--sam $BAM \
100+
--vcf $VCF \
101+
--group-list $BARCODES \
102+
--tag-group $CELL_TAG \
103+
--tag-UMI $UMI_TAG \
104+
--out $DEMUXLET_OUTDIR/pileup \
105+
--sm-list $INDS
89106
90107
.. admonition:: HELP! It says my file/directory doesn't exist!
91108
:class: dropdown
@@ -103,7 +120,14 @@ First we will need to identify the number of reads from each allele at each SNP
103120

104121
.. code-block:: bash
105122
106-
singularity exec Demuxafy.sif popscle dsc-pileup --sam $BAM --vcf $VCF --group-list $BARCODES --out $DEMUXLET_OUTDIR/pileup
123+
singularity exec Demuxafy.sif popscle dsc-pileup \
124+
--sam $BAM \
125+
--vcf $VCF \
126+
--group-list $BARCODES \
127+
--tag-UMI $UMI_TAG \
128+
--tag-group $CELL_TAG \
129+
--out $DEMUXLET_OUTDIR/pileup
130+
107131
108132
.. admonition:: HELP! It says my file/directory doesn't exist!
109133
:class: dropdown
@@ -135,6 +159,7 @@ Popscle Demuxlet
135159

136160
Once you have run ``popscle pileup``, you can demultiplex your samples:
137161

162+
Please note that the ``\`` at the end of each line is only there for readability, so that each parameter sits on its own line.
138163

139164
.. tabs::
140165

@@ -144,7 +169,18 @@ Once you have run ``popscle pileup``, you can demultiplex your samples:
144169

145170
.. code-block:: bash
146171
147-
singularity exec Demuxafy.sif popscle demuxlet --plp $DEMUXLET_OUTDIR/pileup --vcf $VCF --field $FIELD --group-list $BARCODES --geno-error-coeff 1.0 --geno-error-offset 0.05 --out $DEMUXLET_OUTDIR/demuxlet --sm-list $INDS
172+
singularity exec Demuxafy.sif popscle demuxlet \
173+
--plp $DEMUXLET_OUTDIR/pileup \
174+
--vcf $VCF \
175+
--field $FIELD \
176+
--group-list $BARCODES \
177+
--tag-group $CELL_TAG \
178+
--tag-UMI $UMI_TAG \
179+
--geno-error-coeff 1.0 \
180+
--geno-error-offset 0.05 \
181+
--out $DEMUXLET_OUTDIR/demuxlet \
182+
--sm-list $INDS
183+
148184
149185
.. admonition:: HELP! It says my file/directory doesn't exist!
150186
:class: dropdown
@@ -161,7 +197,17 @@ Once you have run ``popscle pileup``, you can demultiplex your samples:
161197

162198
.. code-block:: bash
163199
164-
singularity exec Demuxafy.sif popscle demuxlet --plp $DEMUXLET_OUTDIR/pileup --vcf $VCF --field $FIELD --group-list $BARCODES --geno-error-coeff 1.0 --geno-error-offset 0.05 --out $DEMUXLET_OUTDIR/demuxlet
200+
singularity exec Demuxafy.sif popscle demuxlet \
201+
--plp $DEMUXLET_OUTDIR/pileup \
202+
--vcf $VCF \
203+
--field $FIELD \
204+
--group-list $BARCODES \
205+
--tag-group $CELL_TAG \
206+
--tag-UMI $UMI_TAG \
207+
--geno-error-coeff 1.0 \
208+
--geno-error-offset 0.05 \
209+
--out $DEMUXLET_OUTDIR/demuxlet
210+
165211
166212
.. admonition:: HELP! It says my file/directory doesn't exist!
167213
:class: dropdown
@@ -285,7 +331,7 @@ See :ref:`Combine Results <Combine-docs>`.
285331

286332
Citation
287333
--------
288-
If you used the Demuxafy platform for analysis, please reference our preprint_ as well as `Demuxlet <https://www.nature.com/articles/nbt.4042>`__.
334+
If you used the Demuxafy platform for analysis, please reference our publication_ as well as `Demuxlet <https://www.nature.com/articles/nbt.4042>`__.
289335

290336

291337

docs/source/DoubletDecon.rst

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@ DoubletDecon
44
===========================
55

66
.. _DoubletDecon: https://github.com/EDePasquale/DoubletDecon
7-
.. _preprint: https://www.biorxiv.org/content/10.1101/2022.03.07.483367v1
7+
.. _publication: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-024-03224-8
88

99
DoubletDecon_ is a transcription-based doublet detection software that uses deconvolution to identify doublets using the `R` statistical software.
1010
We have provided a wrapper script that takes common arguments for DoubletDecon_ and also provide example code for you to run manually if you prefer.
@@ -271,4 +271,4 @@ See :ref:`Combine Results <Combine-docs>`.
271271

272272
Citation
273273
--------
274-
If you used the Demuxafy platform for analysis, please reference our preprint_ as well as `DoubletDecon <https://www.sciencedirect.com/science/article/pii/S2211124719312860>`__.
274+
If you used the Demuxafy platform for analysis, please reference our publication_ as well as `DoubletDecon <https://www.sciencedirect.com/science/article/pii/S2211124719312860>`__.

docs/source/DoubletDetectingSoftwares.rst

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ The table below provides a comparison of the different methods.
2121
+--------------------------------------------------+------------------------------------------+------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
2222
| :ref:`scDblFinder <scDblFinder-docs>` | .. centered:: |:heavy_multiplication_x:| | .. centered:: |:heavy_multiplication_x:| | Gradient boosted trees trained with number of neighboring doublets and QC metrics to classify doublets |
2323
+--------------------------------------------------+------------------------------------------+------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
24-
| :ref:`Scds <Scds-docs>` | .. centered:: |:heavy_multiplication_x:| | .. centered:: |:heavy_multiplication_x:| | **cxds**: Uses gene pairs that are typically not expressed in the same droplet to rank droplets based on co-expression of all pairs. |br| |
24+
| :ref:`Scds <Scds-docs>` | .. centered:: |:heavy_multiplication_x:| | .. centered:: |:heavy_multiplication_x:| | **cxds**: Uses gene pairs that are typically not expressed in the same droplet to rank droplets based on co-expression of all pairs. |br| |
2525
| | | | **bcds**: Uses highly variable genes and simulated doublets to train a binary classification algorithm and return probability of droplet being a doublet. |
2626
+--------------------------------------------------+------------------------------------------+------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
2727
| :ref:`Scrublet <Scrublet-docs>` | .. centered:: |:heavy_multiplication_x:| | .. centered:: |:heavy_multiplication_x:| | Identifies the number of neighboring simulated doublets for each droplet and uses bimodal distribution of scores to classify singlets and doublets. |

docs/source/DoubletDetection.rst

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@ DoubletDetection
44
===========================
55

66
.. _DoubletDetection: https://github.com/JonathanShor/DoubletDetection
7-
.. _preprint: https://www.biorxiv.org/content/10.1101/2022.03.07.483367v1
7+
.. _publication: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-024-03224-8
88

99

1010
DoubletDetection_ is a transcription-based doublet detection software.
@@ -405,4 +405,4 @@ See :ref:`Combine Results <Combine-docs>`.
405405
406406
Citation
407407
--------
408-
If you used the Demuxafy platform for analysis, please reference our preprint_ as well as `DoubletDetection <https://zenodo.org/record/4359992>`__.
408+
If you used the Demuxafy platform for analysis, please reference our publication_ as well as `DoubletDetection <https://zenodo.org/record/4359992>`__.
