<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Using GGD — GGD documentation</title>
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/alabaster.css" type="text/css" />
<link rel="stylesheet" type="text/css" href="_static/style.css" />
<link rel="stylesheet" type="text/css" href="_static/font-awesome-4.7.0/css/font-awesome.min.css" />
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="GGD Commands" href="GGD-CLI.html" />
<link rel="prev" title="GGD Quick Start" href="quick-start.html" />
<link href="https://fonts.googleapis.com/css?family=Lato|Raleway" rel="stylesheet">
<link href="https://fonts.googleapis.com/css?family=Inconsolata" rel="stylesheet">
<meta name="msapplication-TileColor" content="#ffffff">
<meta name="msapplication-TileImage" content="_static/ms-icon-144x144.png">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/selectize.js/0.12.6/css/selectize.bootstrap3.min.css">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/4.3.1/css/bootstrap.min.css">
<script src="https://cdnjs.cloudflare.com/ajax/libs/datatables/1.10.21/js/jquery.dataTables.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/selectize.js/0.12.6/js/standalone/selectize.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/4.3.1/js/bootstrap.bundle.min.js"></script>
</head><body>
<div class="document">
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<p class="logo">
<a href="index.html">
<img class="logo" src="_static/logo/GoGetData_name_logo.png" alt="Logo"/>
</a>
</p>
<h3>Navigation</h3>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="quick-start.html">GGD Quick Start</a></li>
<li class="toctree-l1 current"><a class="current reference internal" href="#">Using GGD</a><ul>
<li class="toctree-l2"><a class="reference internal" href="#install-conda">1. Install conda</a></li>
<li class="toctree-l2"><a class="reference internal" href="#configure-the-conda-channels">2. Configure the conda channels</a></li>
<li class="toctree-l2"><a class="reference internal" href="#install-ggd">3. Install ggd</a></li>
<li class="toctree-l2"><a class="reference internal" href="#ggd-tools">4. ggd tools</a></li>
<li class="toctree-l2"><a class="reference internal" href="#contributing-to-ggd">5. Contributing to ggd</a></li>
<li class="toctree-l2"><a class="reference internal" href="#ggd-use-case">ggd Use Case</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="GGD-CLI.html">GGD Commands</a></li>
<li class="toctree-l1"><a class="reference internal" href="meta-recipes.html">GGD meta-recipes</a></li>
<li class="toctree-l1"><a class="reference internal" href="contribute.html">Contribute</a></li>
<li class="toctree-l1"><a class="reference internal" href="private_recipes.html">Private Recipes</a></li>
<li class="toctree-l1"><a class="reference internal" href="workflows.html">Using GGD in Workflows</a></li>
<li class="toctree-l1"><a class="reference internal" href="recipes.html">Available Data Packages</a></li>
</ul>
<ul>
<li class="toctree-l1"><a href="https://github.com/gogetdata/ggd-recipes">ggd-recipes @ GitHub</a></li>
<li class="toctree-l1"><a href="https://github.com/gogetdata/ggd-cli">ggd-cli @ GitHub</a></li>
</ul>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" />
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
</div>
</div>
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="using-ggd">
<span id="id1"></span><h1>Using GGD<a class="headerlink" href="#using-ggd" title="Permalink to this headline">¶</a></h1>
<p>[<a class="reference internal" href="index.html#home-page"><span class="std std-ref">Click here to return to the home page</span></a>]</p>
<p><strong>To see and/or search for data packages available through GGD, see:</strong> <a class="reference internal" href="recipes.html#recipes"><span class="std std-ref">Available data packages</span></a></p>
<p><strong>For a brief introduction to how ggd works and to start using ggd see:</strong> <a class="reference internal" href="quick-start.html#quick-start"><span class="std std-ref">GGD Quick Start</span></a></p>
<p><strong>To request a new data recipe please fill out the</strong> <a class="reference external" href="https://forms.gle/3WEWgGGeh7ohAjcJA">GGD Recipe Request</a> <strong>Form.</strong></p>
<div class="admonition important">
<p class="admonition-title">Important</p>
<p>If you use GGD, please cite the <a class="reference external" href="https://www.nature.com/articles/s41467-021-22381-z">Nature Communications GGD paper</a>.</p>
</div>
<div class="section" id="install-conda">
<h2>1. Install conda<a class="headerlink" href="#install-conda" title="Permalink to this headline">¶</a></h2>
<p>ggd requires the conda package management system to be installed on your system. Loading conda from a module
is not sufficient, because ggd stores data packages under the conda root. Please install Anaconda or Miniconda on your system.
We recommend installing with <a class="reference external" href="http://conda.pydata.org/miniconda.html">Miniconda</a>,
specifically the Python 3 version.</p>
<div class="admonition warning">
<p class="admonition-title">Warning</p>
<p>GGD stopped maintaining Python 2 compatibility after December 31, 2020. Python 2 may still work, but maintenance is
focused on Python 3. This decision follows the end of life of Python 2 on January 1, 2020; GGD maintained
Python 2 compatibility for one year past that date.</p>
</div>
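<p>Before continuing, it may help to confirm that conda is available on your <code>PATH</code> and to see where its root directory is, since that is where ggd stores data packages. The following is a minimal sketch, not part of ggd itself; the <code>conda_status</code> variable is a hypothetical name used only for illustration.</p>

```shell
# Check whether conda is on PATH and, if so, print its base (root) directory.
# "conda_status" is an illustrative variable name, not something ggd defines.
if command -v conda >/dev/null 2>&1; then
    conda_status="found"
    conda info --base
else
    conda_status="missing"
    echo "conda not found: install Miniconda or Anaconda first" >&2
fi
```

<p>If the command reports that conda is missing even though a module provides it, that is a sign a full Anaconda or Miniconda installation is still needed.</p>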
</div>
<div class="section" id="configure-the-conda-channels">
<h2>2. Configure the conda channels<a class="headerlink" href="#configure-the-conda-channels" title="Permalink to this headline">¶</a></h2>
<p>ggd data packages are hosted on the Anaconda cloud. ggd also depends on software tools distributed as
other conda packages. A ggd conda channel, along with the other required channels, must therefore be added to your conda
configuration. You can add as many of the available ggd channels as you like, but only one
is required. As ggd becomes more widely used, additional channels will be created to support different areas of
research.</p>
<p>Available ggd channels:</p>
<ul class="simple">
<li><p>ggd-genomics</p></li>
</ul>
<p>Run the following commands, adding in additional ggd channels as desired:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ conda config --add channels defaults
$ conda config --add channels ggd-genomics
$ conda config --add channels bioconda
$ conda config --add channels conda-forge
</pre></div>
</div>
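<p>The order of the commands above matters: <code>conda config --add</code> places each new channel at the top of the list, so the channel added last ends up with the highest priority. The sketch below mimics that prepend behaviour with a plain shell variable (the <code>channels</code> variable is purely illustrative) and then, if conda is installed, prints the channel list actually configured.</p>

```shell
# Mimic how `conda config --add channels <name>` prepends each channel.
# "channels" is an illustrative shell variable, not a conda setting.
channels=""
for ch in defaults ggd-genomics bioconda conda-forge; do
    channels="$ch${channels:+ $channels}"
done
echo "Resulting priority (highest first): $channels"

# Compare against the real configuration when conda is available.
if command -v conda >/dev/null 2>&1; then
    conda config --show channels
fi
```

<p>With the four commands run in the order shown, <code>conda-forge</code> should be listed first and <code>defaults</code> last.</p>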
</div>
<div class="section" id="install-ggd">
<h2>3. Install ggd<a class="headerlink" href="#install-ggd" title="Permalink to this headline">¶</a></h2>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>Step 2 above must be completed before installing ggd. If it has not been completed, the ggd installation will fail.</p>
</div>
<p>ggd needs to be installed on your system before you can use it. Run the following command to install the
ggd cli:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ conda install -c bioconda ggd
</pre></div>
</div>
</div>
<div class="section" id="ggd-tools">
<h2>4. ggd tools<a class="headerlink" href="#ggd-tools" title="Permalink to this headline">¶</a></h2>
<p>The ggd command line tool (cli) installed in step 3 has built-in tools for accessing and managing
data packages. These tools include:</p>
<ul class="simple">
<li><p><code class="code docutils literal notranslate"><span class="pre">$</span> <span class="pre">ggd</span> <span class="pre">search</span></code>: Search for a ggd data package</p></li>
<li><p><code class="code docutils literal notranslate"><span class="pre">$</span> <span class="pre">ggd</span> <span class="pre">predict-path</span></code>: Predict the file path of a data package that has not been installed yet (useful for workflow managers such as Snakemake)</p></li>
<li><p><code class="code docutils literal notranslate"><span class="pre">$</span> <span class="pre">ggd</span> <span class="pre">install</span></code>: Install ggd data package(s)</p></li>
<li><p><code class="code docutils literal notranslate"><span class="pre">$</span> <span class="pre">ggd</span> <span class="pre">uninstall</span></code>: Uninstall ggd data package(s)</p></li>
<li><p><code class="code docutils literal notranslate"><span class="pre">$</span> <span class="pre">ggd</span> <span class="pre">list</span></code>: List the installed data packages</p></li>
<li><p><code class="code docutils literal notranslate"><span class="pre">$</span> <span class="pre">ggd</span> <span class="pre">get-files</span></code>: Get the files for an installed ggd package</p></li>
<li><p><code class="code docutils literal notranslate"><span class="pre">$</span> <span class="pre">ggd</span> <span class="pre">pkg-info</span></code>: Show a specific ggd package’s info</p></li>
<li><p><code class="code docutils literal notranslate"><span class="pre">$</span> <span class="pre">ggd</span> <span class="pre">show-env</span></code>: Show the ggd specific environment variables</p></li>
<li><p><code class="code docutils literal notranslate"><span class="pre">$</span> <span class="pre">ggd</span> <span class="pre">make-recipe</span></code>: Create a ggd recipe from a bash script</p></li>
<li><p><code class="code docutils literal notranslate"><span class="pre">$</span> <span class="pre">ggd</span> <span class="pre">make-meta-recipe</span></code>: Create a ggd meta-recipe</p></li>
<li><p><code class="code docutils literal notranslate"><span class="pre">$</span> <span class="pre">ggd</span> <span class="pre">check-recipe</span></code>: Check/test a ggd recipe</p></li>
</ul>
<p>For information about specific tools see: <a class="reference internal" href="GGD-CLI.html#ggd-cli-page"><span class="std std-ref">GGD-CLI</span></a></p>
</div>
<div class="section" id="contributing-to-ggd">
<h2>5. Contributing to ggd<a class="headerlink" href="#contributing-to-ggd" title="Permalink to this headline">¶</a></h2>
<p>We intend for ggd to become a widely used data management system for genomics and other research areas.
ggd provides support for reproducibility through conda’s naming, version tracking, and dependency handling structure.
One major function of the ggd cli tools is to provide an easy way to add data packages to the data repository.</p>
<p>We welcome and encourage everyone to contribute to the data repository hosted by ggd.</p>
<p>Instructions on how to create a data package and add it to ggd can be found on the <a class="reference internal" href="contribute.html#make-data-packages"><span class="std std-ref">Contribute</span></a>
documentation pages.</p>
</div>
<div class="section" id="ggd-use-case">
<h2>ggd Use Case<a class="headerlink" href="#ggd-use-case" title="Permalink to this headline">¶</a></h2>
<p>Suppose you need to align some sequence(s) to the human reference genome for a given analysis.
Without ggd, you would need to find and download the reference genome from one of the sites that host it, making sure it is
the correct genome build. You would then need to sort and index the reference genome before you could use it.</p>
<p>ggd simplifies this process by letting you search for
and install processed genomic data packages with the ggd tool.</p>
<ol class="arabic simple">
<li><p>Search for a reference genome</p></li>
</ol>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ggd search reference genome
----------------------------------------------------------------------------------------------------
grch37-reference-genome-ensembl-v1
<span class="o">==================================</span>
Summary: The GRCh37 unmasked genomic DNA seqeunce reference genome from Ensembl-Release <span class="m">75</span>. Includes all sequence regions EXCLUDING haplotypes and patches. <span class="s1">'Primary Assembly file'</span>
Species: Homo_sapiens
Genome Build: GRCh37
Keywords: Primary-Assembly, Release-75, ref, reference, Ensembl-ref, DNA-Seqeunce, Fasta-Seqeunce, fasta-file
Data Provider: Ensembl
Data Version: release-75_2-3-14
File type<span class="o">(</span>s<span class="o">)</span>: fa
Data file coordinate base: NA
Included Data Files:
grch37-reference-genome-ensembl-v1.fa
grch37-reference-genome-ensembl-v1.fa.fai
Approximate Data File Sizes:
grch37-reference-genome-ensembl-v1.fa: <span class="m">3</span>.15G
grch37-reference-genome-ensembl-v1.fa.fai: <span class="m">2</span>.74K
To install run:
ggd install grch37-reference-genome-ensembl-v1
----------------------------------------------------------------------------------------------------
grch38-reference-genome-ensembl-v1
<span class="o">==================================</span>
Summary: The GRCh38 unmasked genomic DNA sequence reference genome from Ensembl-Release <span class="m">99</span>. Includes all sequence regions EXCLUDING haplotypes and patches. <span class="s1">'Primary Assembly file'</span>
Species: Homo_sapiens
Genome Build: GRCh38
Keywords: Primary-Assembly, Release-99, ref, reference, Ensembl-ref, DNA-Sequence, Fasta-Sequence, fasta-file
Data Provider: Ensembl
Data Version: release-99_11-18-19
File type<span class="o">(</span>s<span class="o">)</span>: fa
Data file coordinate base: NA
Included Data Files:
grch38-reference-genome-ensembl-v1.fa
grch38-reference-genome-ensembl-v1.fa.fai
Approximate Data File Sizes:
grch38-reference-genome-ensembl-v1.fa: <span class="m">3</span>.15G
grch38-reference-genome-ensembl-v1.fa.fai: <span class="m">6</span>.41K
To install run:
ggd install grch38-reference-genome-ensembl-v1
----------------------------------------------------------------------------------------------------
. . .
</pre></div>
</div>
<ol class="arabic simple" start="2">
<li><p>Install the grch38 reference genome</p></li>
</ol>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ggd install grch38-reference-genome-ensembl-v1
:ggd:install: Looking <span class="k">for</span> grch38-reference-genome-ensembl-v1 in the <span class="s1">'ggd-genomics'</span> channel
:ggd:install: grch38-reference-genome-ensembl-v1 exists in the ggd-genomics channel
:ggd:install: grch38-reference-genome-ensembl-v1 version <span class="m">1</span> is not installed on your system
:ggd:install: grch38-reference-genome-ensembl-v1 has not been installed by conda
:ggd:install: The grch38-reference-genome-ensembl-v1 package is uploaded to an aws S3 bucket. To reduce processing <span class="nb">time</span> the package will be downloaded from an aws S3 bucket
:ggd:install: Attempting to install the following cached package<span class="o">(</span>s<span class="o">)</span>:
grch38-reference-genome-ensembl-v1
:ggd:utils:bypass: Installing grch38-reference-genome-ensembl-v1 from the ggd-genomics conda channel
Collecting package metadata: <span class="k">done</span>
Processing data: <span class="k">done</span>
<span class="c1">## Package Plan ##</span>
environment location: &lt;conda-root&gt;
added / updated specs:
- grch38-reference-genome-ensembl-v1
The following packages will be downloaded:
package <span class="p">|</span> build
---------------------------<span class="p">|</span>-----------------
grch38-reference-genome-ensembl-v1-1<span class="p">|</span> <span class="m">3</span> <span class="m">7</span> KB ggd-genomics
------------------------------------------------------------
Total: <span class="m">7</span> KB
The following NEW packages will be INSTALLED:
grch38-reference-~ ggd-genomics/noarch::grch38-reference-genome-ensembl-v1-1-0
Downloading and Extracting Packages
grch38-reference-gen <span class="p">|</span> <span class="m">7</span> KB <span class="p">|</span> <span class="c1">############################################################################################################################################## | 100%</span>
Preparing transaction: <span class="k">done</span>
Verifying transaction: <span class="k">done</span>
Executing transaction: <span class="k">done</span>
:ggd:install: Updating installed package list
:ggd:install: Initiating data file content validation using checksum
:ggd:install: Checksum <span class="k">for</span> grch38-reference-genome-ensembl-v1
:ggd:checksum: installed file checksum: grch38-reference-genome-ensembl-v1.fa.fai checksum: d527f3eb6b664020cf4d882b5820056f
:ggd:checksum: metadata checksum record: grch38-reference-genome-ensembl-v1.fa.fai checksum: d527f3eb6b664020cf4d882b5820056f
:ggd:checksum: installed file checksum: grch38-reference-genome-ensembl-v1.fa checksum: 9e6b9465dc708d92bf6d67e9c9fa9389
:ggd:checksum: metadata checksum record: grch38-reference-genome-ensembl-v1.fa checksum: 9e6b9465dc708d92bf6d67e9c9fa9389
:ggd:install: ** Successful Checksum **
:ggd:install: Install Complete
:ggd:install: Installed file <span class="nv">locations</span>
<span class="o">======================================================================================================================</span>
GGD Package Environment Variable<span class="o">(</span>s<span class="o">)</span>
----------------------------------------------------------------------------------------------------
-> grch38-reference-genome-ensembl-v1 <span class="nv">$ggd_grch38_reference_genome_ensembl_v1_dir</span>
<span class="nv">$ggd_grch38_reference_genome_ensembl_v1_file</span>
Install Path: &lt;conda-root&gt;/share/ggd/Homo_sapiens/GRCh38/grch38-reference-genome-ensembl-v1/1
----------------------------------------------------------------------------------------------------
:ggd:install: To activate environment variables run <span class="sb">`</span><span class="nb">source</span> activate base<span class="sb">`</span> in the environmnet the packages were installed in
:ggd:install: NOTE: These environment variables are specific to the &lt;conda-root&gt; conda environment and can only be accessed from within that <span class="nv">environmnet</span>
<span class="o">======================================================================================================================</span>
:ggd:install: Environment Variables
*****************************
Inactive or out-of-date environment variables:
> <span class="nv">$ggd_grch38_reference_genome_ensembl_v1_dir</span>
> <span class="nv">$ggd_grch38_reference_genome_ensembl_v1_file</span>
To activate inactive or out-of-date vars, run:
<span class="nb">source</span> activate base
*****************************
</pre></div>
</div>
<ol class="arabic simple" start="3">
<li><p>Identify the data environment variable or the file location</p></li>
</ol>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ggd show-env
***************************
Active environment variables:
> <span class="nv">$ggd_grch38_reference_genome_ensembl_v1_dir</span>
> <span class="nv">$ggd_grch38_reference_genome_ensembl_v1_file</span>
***************************
$ ggd get-files grch38-reference-genome-ensembl-v1
&lt;conda-root&gt;/share/ggd/Homo_sapiens/GRCh38/grch38-reference-genome-ensembl-v1/1/grch38-reference-genome-ensembl-v1.fa
&lt;conda-root&gt;/share/ggd/Homo_sapiens/GRCh38/grch38-reference-genome-ensembl-v1/1/grch38-reference-genome-ensembl-v1.fa.fai
</pre></div>
</div>
<ol class="arabic simple" start="4">
<li><p>Use the files</p></li>
</ol>
<p>For additional information and examples on how to use the installed data files see: <span class="xref std std-ref">Using installed data</span>.</p>
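<p>As a rough illustration of this last step, the environment variables reported by <code>ggd show-env</code> can be used in a script in place of hard-coded paths. The sketch below assumes that the <code>_file</code> variable points to the FASTA file and the <code>_dir</code> variable to its directory; <code>ref_file</code>, <code>ref_dir</code>, and <code>ref_status</code> are hypothetical names chosen for this example.</p>

```shell
# Variable names are taken from the `ggd show-env` output above.
# ref_file, ref_dir, and ref_status are illustrative names only.
ref_file="${ggd_grch38_reference_genome_ensembl_v1_file:-}"
ref_dir="${ggd_grch38_reference_genome_ensembl_v1_dir:-}"

if [ -n "$ref_file" ] && [ -f "$ref_file" ]; then
    ref_status="ready"
    # Print the first FASTA header line as a quick sanity check.
    head -n 1 "$ref_file"
    # List the installed data files, including the .fai index.
    ls -lh "$ref_dir"
else
    ref_status="unavailable"
    echo "Reference not available: install the package and activate the conda environment first" >&2
fi
```

<p>Referring to the variables rather than absolute paths keeps a script portable across systems where the conda root differs.</p>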
</div>
</div>
</div>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="footer">
©2016-2021, The GoGetData team.
|
<a href="_sources/using-ggd.rst.txt"
rel="nofollow">Page source</a>
</div>
</body>
</html>