@@ -1,7 +1,7 @@
[](http://bioconda.github.io/recipes/sctagger/README.html)
# scTagger
-
scTagger matches barcodes of short- and long-reads of single-cell RNA-seq experiments to achieve the information of both datasets.
+
scTagger matches barcodes of short- and long-reads of single-cell RNA-seq experiments so that gene expression (from the short-reads) and RNA splicing (from the long-reads) can be related at the cell level.
## Installation
@@ -23,7 +23,7 @@ scTagger has a single python script containing different functions to match long
The whole pipeline consists of three steps, each of which can be run separately:
-
#### Extract long-reads segment
+
#### *1) Extract long-reads segment*
The first step of the scTagger pipeline is to extract, from each long read, the segment where a barcode is most likely to be found.
To run this step, you can use the following command.
@@ -37,23 +37,23 @@ To run this step, you can use the following command.
*`-g`: Space-separated list of ranges where the SR adapter should be found on the LRs (Optional, Default: Detect from data)
*`-z`: Indicate input is gzipped (Optional, Default: Assume input is gzipped if it ends with ".gz")
*`--num-bp-afte`: Number of bases after the end of the SR adapter alignment to generate (Optional, Default: 20)
*`-o`: Path to output file
*`-p`: Path to plot file (Optional, Default: No plotting)
**Inputs**
-
* A list of fastQ files of longreads
+
* A list of FASTQ files of long-reads
**Outputs**
* A TSV file:
* First column is read-id
* Second column is the best edit distance with the short-read adapter
* Third column is the starting point of long-read that matches with the adapter
* Fourth column is the long-read segment that was extracted.
-
* A plot of optimal alignment locations of the short read adapter to the longreads.
+
* A plot of optimal alignment locations of the short read adapter to the long-reads.
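
As an illustration of the TSV layout listed above, here is a minimal Python sketch for reading the four columns. It is not part of scTagger: the names `SegmentRecord` and `read_segments` are hypothetical, and the sketch assumes the file has no header line and that the second and third columns always hold integers.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class SegmentRecord(NamedTuple):
    """One row of the step-1 TSV (hypothetical helper type)."""
    read_id: str
    edit_distance: int  # best edit distance to the short-read adapter
    start: int          # start of the adapter match on the long read
    segment: str        # extracted long-read segment


def read_segments(handle: TextIO) -> Iterator[SegmentRecord]:
    """Yield one record per four-column row; shorter rows are skipped."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 4:
            continue
        read_id, dist, start, segment = row[:4]
        # Assumption: both numeric columns are plain integers.
        yield SegmentRecord(read_id, int(dist), int(start), segment)


# Toy input standing in for a real output file.
example = io.StringIO(
    "read_1\t2\t35\tACGTACGTACGTACGTACGT\n"
    "read_2\t0\t41\tTTGCATTGCATTGCATTGCA\n"
)
records = list(read_segments(example))
```

With a real file, an open file handle would be passed in place of the `io.StringIO` object.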
-
#### Extract short-reads barcodes
+
#### *2) Extract short-reads barcodes*
The second step is to extract the top short-reads barcodes that cover most of the reads.
@@ -78,8 +78,35 @@ The second step is to extract the top short-reads barcodes that cover most of th
* Second column is the number of appearances of the barcode
* A cumulative plot of SR coverage with batches of 1,000 barcodes
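
The cumulative coverage behind such a plot can be sketched as follows. This is an illustrative Python example under stated assumptions, not scTagger's own code: the function name `cumulative_coverage` is hypothetical, and the input is assumed to be the per-barcode counts from the second column of the output file.

```python
from itertools import accumulate
from typing import Iterable, List, Tuple


def cumulative_coverage(
    counts: Iterable[int], batch_size: int = 1000
) -> List[Tuple[int, float]]:
    """Return (number of top barcodes, fraction of reads covered) per batch."""
    ordered = sorted(counts, reverse=True)  # most frequent barcodes first
    total = sum(ordered)
    if total == 0:
        return []
    running = list(accumulate(ordered))     # running sum of read counts
    # Last barcode index of every full batch, plus the final partial batch.
    ends = list(range(batch_size, len(ordered), batch_size)) + [len(ordered)]
    return [(end, running[end - 1] / total) for end in ends]


# Toy counts with a batch size of 2 to keep the example readable.
coverage = cumulative_coverage([1, 5, 1, 3], batch_size=2)
```

Each returned pair gives one point of the curve: the x value is the number of top barcodes considered and the y value is the fraction of reads they account for.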
-
#### Match long-reads segment with short-reads barcode
-
The last step is to match long read segments with selected barcodes from short reads
+
#### *Alt. 2) Extract short-reads barcodes directly from long-reads*
+
+
This is an alternative to the second step that avoids using the short-reads altogether and instead builds a whitelist of cellular barcodes directly from the long-reads.
+
This is done by looking for exact matches of the 10x Chromium list of cellular barcodes on the long-read barcode segments.
+
The barcodes are sorted by frequency, and the most frequent barcodes are kept using the same strategy as the `extract_sr_bc` module.
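
The exact-matching idea described above can be sketched in Python as follows. This is a rough illustration, not scTagger's implementation: the function name `count_whitelist_hits` and the toy whitelist are hypothetical, the barcode length is assumed to be 16 nt, and strand orientation (reverse complements) is ignored. The cutoff for keeping the most frequent barcodes is left out because it is not specified here.

```python
from collections import Counter
from typing import Iterable


def count_whitelist_hits(
    segments: Iterable[str], whitelist: Iterable[str], barcode_length: int = 16
) -> Counter:
    """Count, per whitelist barcode, the segments that contain it exactly."""
    allowed = set(whitelist)
    counts: Counter = Counter()
    for segment in segments:
        seen = set()
        # Slide a window of barcode_length over the segment.
        for i in range(len(segment) - barcode_length + 1):
            kmer = segment[i:i + barcode_length]
            if kmer in allowed:
                seen.add(kmer)
        counts.update(seen)  # each barcode counted at most once per segment
    return counts


# Toy data standing in for the 10x whitelist and the long-read segments.
toy_whitelist = ["AAAACCCCGGGGTTTT", "ACGTACGTACGTACGT"]
toy_segments = [
    "TTAAAACCCCGGGGTTTTAA",
    "AAAACCCCGGGGTTTT",
    "GGACGTACGTACGTACGTCC",
    "NNNNNNNNNNNNNNNNNNNN",
]
hits = count_whitelist_hits(toy_segments, toy_whitelist)
ranked = hits.most_common()  # barcodes sorted by decreasing frequency
```

Counting each barcode once per segment keeps a repetitive segment from inflating a barcode's frequency; whether scTagger does the same is not stated in this README.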