forked from ablab/spades
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathrnaspades_manual.html
139 lines (107 loc) · 8.42 KB
/
rnaspades_manual.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
<html>
<head>
<title>rnaSPAdes manual</title>
<style type="text/css">
.code {
background-color: lightgray;
}
</style>
</head>
<body>
<h1>rnaSPAdes manual</h1>
1. <a href="#sec1">About rnaSPAdes</a><br>
2. <a href="#sec2">rnaSPAdes specifics</a><br>
2.1. <a href="#sec2.1">Running rnaSPAdes</a><br>
2.2. <a href="#sec2.2">RNA-specific options</a><br>
2.3. <a href="#sec2.3">rnaSPAdes output</a><br>
3. <a href="#sec3">Assembly evaluation</a><br>
4. <a href="#sec4">Citation</a><br>
5. <a href="#sec5">Feedback and bug reports</a><br>
<a name="sec1"></a>
<h2>1 About rnaSPAdes</h2>
<p> rnaSPAdes is a tool for <i>de novo</i> transcriptome assembly from RNA-Seq data and is suitable for all kind of organisms. rnaSPAdes is a part of <a href="http://cab.spbu.ru/software/spades/" target="_blank">SPAdes package</a> since version 3.9. Information about SPAdes download, requirements, installation and basic options can be found in <a href="manual.html" target="_blank">SPAdes manual</a>. Below you may find information about differences between SPAdes and rnaSPAdes.
<a name="sec2"></a>
<h2>2 rnaSPAdes specifics</h2>
<a name="sec2.1"></a>
<h3>2.1 Running rnaSPAdes</h3>
<p>
To run rnaSPAdes use
<pre class="code">
<code>
rnaspades.py [options] -o <output_dir>
</code>
</pre>
or
<pre class="code">
<code>
spades.py --rna [options] -o <output_dir>
</code>
</pre>
Note that we assume that SPAdes installation directory is added to the <code>PATH</code> variable (provide full path to rnaSPAdes executable otherwise: <code><rnaspades installation dir>/rnaspades.py</code>).
<p>Here are several notes regarding rnaSPAdes options:
<ul>
<li>rnaSPAdes take as an input at least one paired-end or single-end library (see note below). For hybrid assembly you can use PacBio or Oxford Nanopore reads.</li>
<li>rnaSPAdes does not support <code>--careful</code> and <code>--cov-cutoff</code> options.</li>
<li>rnaSPAdes is not compatible with other pipeline options such as <code>--meta</code>, <code>--sc</code> and <code>--plasmid</code>. If you wish to assemble metatranscriptomic data just run rnaSPAdes as it is.</li>
<li>By default rnaSPAdes uses 2 k-mer sizes, which are automatically detected using read length (approximately one third and half of the maximal read length). We recommend not to change this parameter because smaller k-mer sizes typically result in multiple chimeric (misassembled) transcripts. In case you have any doubts about your run, do not hesitate to contact us using e-mail given below.</li>
<li>Although rnaSPAdes supports IonTorrent reads, it was not sufficiently tested on such kind of data.</li>
</ul>
<b>Assembling multiple RNA-Seq libraries</b>
<p> In case you have sequenced several RNA-Seq libraries using the same protocol from different tissues / conditions, and the goal as to assemble a total transcriptome, we suggest to provide all files as a single library (see <a href="manual.html#inputdata" target="_blank">main manual</a> to check input options). Note, that sequencing using the same protocol implies that the resulting reads have the same length, insert size and strand-specificity. Transcript quantification for each sample can be done afterwards by separately mapping reads from each library to the assembled transcripts.
<p> When assembling multiple strand-specific libraries, only the first one will be used to determine strand of each transcript. Thus, we suggest not to mix data with different strand-specificity.
<p> In case you have any questions about running rnaSPAdes, do not hesitate to ask us at via <a href="https://github.com/ablab/spades/issues" target="_blank">GitHub repository tracker</a> or send us an e-mail to: <a href="mailto:[email protected]" target="_blank">[email protected]</a>
<a name="sec2.2"></a>
<h3>2.2 RNA-specific options</h3>
<b>Assembling strand-specific data</b>
<p>rnaSPAdes supports strand-specific RNA-Seq datasets. You can set strand-specific type using the following option:
<p>
<code>--ss <type></code><br>
Use <code><type> = rf</code> when first read in pair corresponds to <b>reverse</b> gene strand
(antisense data, e.g. obtained via dUTP protocol) and <code><type> = fr</code> otherwise (forward).
Older deprecated syntax is <code>--ss-rf</code> and <code>--ss-fr</code>.
</p>
<p>
Note, that strand-specificity is not related and should not be confused with FR and RF orientation of paired reads. RNA-Seq paired-end reads typically have forward-reverse orientation (--> <--), which is assumed by default and no additional options are needed (see <a href="manual.html#inputdata" target="_blank">main manual</a> for details).
<p>
If the data set is single-end use <code>--ss rf</code> option in case when reads are antisense and <code>--ss fr</code> otherwise.
<br>
<p><b>Hybrid transcriptome assembly</b>
<p>rnaSPAdes now supports conventional <code>--pacbio</code> and <code>--nanopore</code> options (see <a href="manual.html" target="_blank">SPAdes manual</a>). Moreover, in addition to
long reads you may also provide a separate file with reads capturing the entire transcript sequences using the following options. Full-length transcripts in such reads can be
typically detected using the adapters. Note, that FL reads should be trimmed so that the adapters are excluded.
<p>
<code>--fl-rna <file_name> </code><br>
File with PacBio/Nanopore/contigs that capture full-length transcripts.
</p>
<a name="sec2.3"></a>
<h3>2.3 rnaSPAdes output</h3>
<p>
rnaSPAdes outputs one main FASTA file named <code>transcripts.fasta</code>. The corresponding file with paths in the <code>assembly_graph.fastg</code> is <code>transcripts.paths</code>.
<p>
In addition rnaSPAdes outputs transcripts with different level of filtration into <code><output_dir>/</code>: <br>
<ul>
<li><code>hard_filtered_transcripts.fasta</code> – includes only long and reliable transcripts with rather high expression.</li>
<li><code>soft_filtered_transcripts.fasta</code> – includes short and low-expressed transcipts, likely to contain junk sequences.</li>
</ul>
We reccomend to use main <code>transcripts.fasta</code> file in case you don't have any specific needs for you projects. Do not hesitate to contact us using e-mail given below.
<p>
Contigs/scaffolds names in rnaSPAdes output FASTA files have the following format: <br><code>>NODE_97_length_6237_cov_11.9819_g8_i2</code><br> Similarly to SPAdes, <code>97</code> is the number of the transcript, <code>6237</code> is its sequence length in nucleotides and <code>11.9819</code> is the k-mer coverage. Note that the k-mer coverage is always lower than the read (per-base) coverage. <code>g8_i2</code> correspond to the gene number 8 and isoform number 2 within this gene. Transcripts with the same gene number are presumably received from same or somewhat similar (e.g. paralogous) genes. Note, that the prediction is based on the presence of shared sequences in the transcripts and is very approximate.
<a name="sec3">
<h2>3 Assembly evaluation</h2>
<p>
<a href="http://cab.spbu.ru/software/rnaquast/" target="_blank">rnaQUAST</a> may be used for transcriptome assembly quality assessment for model organisms when reference genome and gene database are available. rnaQUAST also includes <a href="http://busco.ezlab.org/" target="_blank">BUSCO</a> and <a href="http://topaz.gatech.edu/GeneMark/" target="_blank"> GeneMarkS-T</a> tools for <i>de novo</i> evaluation.
<br>
<a name="sec4">
<h2>4 Citation</h2>
<p>
If you use rnaSPAdes in your research, please include <a href="https://academic.oup.com/gigascience/article/8/9/giz100/5559527" target="_blank"> Bushmanova et al., 2019</a> in your reference list.
<a name="sec5">
<h2>5 Feedback and bug reports</h2>
<p>
Your comments, bug reports, and suggestions are very welcomed. They will help us to further improve rnaSPAdes.
If you have any troubles running rnaSPAdes, please send us <code>params.txt</code> and <code>spades.log</code> from the directory <code><output_dir></code>.
<p>
You can leave your comments and bug reports at <a href="https://github.com/ablab/spades/issues" target="_blank">our GitHub repository tracker</a> or send it via e-mail: <a href="mailto:[email protected]" target="_blank">[email protected]</a>.
<br/><br/><br/><br/><br/>
</body>
</html>