Added prettier
GallVp committed Feb 2, 2024
1 parent d494a8f commit a8f9d1e
Showing 53 changed files with 1,338 additions and 1,277 deletions.
9 changes: 9 additions & 0 deletions .prettierignore
@@ -0,0 +1,9 @@
# gitignore
.DS_Store
*.pyc
__pycache__
.nextflow*
work/
results/
*.stdout
*.stderr
1 change: 1 addition & 0 deletions .prettierrc.yml
@@ -0,0 +1 @@
printWidth: 120
16 changes: 8 additions & 8 deletions CHANGELOG.md
@@ -9,7 +9,7 @@
5. The pipeline does not download the kraken database anymore
6. Fixed a bug in SYNTENY/DNADIFF module which caused failure on AWS Batch
7. Now tar zipped database can be directly used with Kraken2
8. Removed `db_manifest_url` parameter for the NCBI_FCS_GX workflow
8. Removed `db_manifest_url` parameter for the NCBI_FCS_GX workflow
9. Now using parallel version of LTRHARVEST from the EDTA package

## Version 1.2 (18-Dec-2023)
@@ -26,12 +26,12 @@

For a ~600 MB assembly, EDTA (without sensitive flag) takes ~25 hours of compute time. Whereas, FASTA_LTRRETRIEVER_LAI sub-workflow ( LTRHARVEST+LTRFINDER -> LTRRETRIEVER ) takes ~2.5 hours of compute time. LAI estimates for four plant assemblies are listed below.

| Assembly | EDTA_LAI | FASTA_LTRRETRIEVER_LAI |
|---------------|-----------|---------------------------|
| ck6901m/v2 | 18.43 | 16.19 |
| donghong/v1 | 19.03 | 16.85 |
| red5/v2.1 | 18.75 | 16.59 |
| tair/v10 | 18.06 | 17.42 |
| Assembly | EDTA_LAI | FASTA_LTRRETRIEVER_LAI |
| ----------- | -------- | ---------------------- |
| ck6901m/v2 | 18.43 | 16.19 |
| donghong/v1 | 19.03 | 16.85 |
| red5/v2.1 | 18.75 | 16.59 |
| tair/v10 | 18.06 | 17.42 |
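
The comparison in this hunk can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the commit: the values are copied from the table and the quoted compute times, while the names `LAI_ESTIMATES`, `lai_differences` and `mean` are hypothetical helpers introduced only for illustration.

```python
# LAI estimates copied from the table above: (EDTA_LAI, FASTA_LTRRETRIEVER_LAI).
LAI_ESTIMATES = {
    "ck6901m/v2": (18.43, 16.19),
    "donghong/v1": (19.03, 16.85),
    "red5/v2.1": (18.75, 16.59),
    "tair/v10": (18.06, 17.42),
}

# Approximate compute times quoted in the changelog for a ~600 MB assembly.
EDTA_HOURS = 25.0
SUBWORKFLOW_HOURS = 2.5


def lai_differences(estimates):
    """Return EDTA_LAI minus FASTA_LTRRETRIEVER_LAI for each assembly."""
    return {name: round(edta - sub, 2) for name, (edta, sub) in estimates.items()}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


if __name__ == "__main__":
    diffs = lai_differences(LAI_ESTIMATES)
    for name, diff in diffs.items():
        print(f"{name}: EDTA_LAI is higher by {diff:.2f}")
    print(f"mean difference: {mean(diffs.values()):.2f}")
    print(f"approximate speed-up: {EDTA_HOURS / SUBWORKFLOW_HOURS:.0f}x")
```

On these four assemblies the sub-workflow reports a lower LAI than EDTA in every case, with the smallest gap for tair/v10, in exchange for roughly a tenfold reduction in compute time.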

## Version 1.1 (09-Nov-2023)

@@ -175,7 +175,7 @@ Same as Version 1 RC6c

1. Added Synteny Analysis.
2. Added "-q" and "-qq" option to LAI. "-qq" is the default.
3. Now copying the *.TElib.fa file from EDTA work dir to the results folder.
3. Now copying the \*.TElib.fa file from EDTA work dir to the results folder.
4. Fixed the n_limit bug in assemblathon_stats.pl.
5. Now using 4-hour time limit for FASTP and FASTQC.
6. Added references for all the tools in the README.
38 changes: 19 additions & 19 deletions CITATION.cff
@@ -1,25 +1,25 @@
cff-version: 1.2.0
message: "If you use this pipeline, please cite it as below."
authors:
- family-names: "Rashid"
given-names: "Usman"
orcid: "https://orcid.org/0000-0002-1109-5493"
- family-names: "Wu"
given-names: "Chen"
- family-names: "Shiller"
given-names: "Jason"
- family-names: "Smith"
given-names: "Ken"
- family-names: "Crowhurst"
given-names: "Ross"
- family-names: "Davy"
given-names: "Marcus"
- family-names: "Chen"
given-names: "Ting-Hsuan"
- family-names: "Thomson"
given-names: "Susan"
- family-names: "Deng"
given-names: "Cecilia"
- family-names: "Rashid"
given-names: "Usman"
orcid: "https://orcid.org/0000-0002-1109-5493"
- family-names: "Wu"
given-names: "Chen"
- family-names: "Shiller"
given-names: "Jason"
- family-names: "Smith"
given-names: "Ken"
- family-names: "Crowhurst"
given-names: "Ross"
- family-names: "Davy"
given-names: "Marcus"
- family-names: "Chen"
given-names: "Ting-Hsuan"
- family-names: "Thomson"
given-names: "Susan"
- family-names: "Deng"
given-names: "Cecilia"
title: "AssemblyQC: A NextFlow pipeline for evaluating assembly quality"
version: 1.2
date-released: 2023-12-19
8 changes: 3 additions & 5 deletions MANIFEST.json
@@ -1,7 +1,5 @@
{
"mainWorkflowURL": "main.nf",
"inputFileURLs": [
"./docs/test_params/test_agc.json"
],
"engineOptions": "-resume"
"mainWorkflowURL": "main.nf",
"inputFileURLs": ["./docs/test_params/test_agc.json"],
"engineOptions": "-resume"
}
22 changes: 20 additions & 2 deletions README.md
@@ -123,6 +123,7 @@ Once the pipeline has finished execution, the results folder specified in the co
## Software Versions & References

- nf-core/modules([MIT](https://github.com/nf-core/modules/blob/master/LICENSE))

> Ewels PA, Peltzer A, Fillinger S et al. 2020. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol 38, 276–278 (2020). doi: <https://doi.org/10.1038/s41587-020-0439-x>
- FASTA_VALIDATE ([MIT](https://github.com/GallVp/fasta_validator/blob/master/LICENSE))
@@ -132,21 +133,27 @@ Once the pipeline has finished execution, the results folder specified in the co
>
> Edwards RA. 2019. fasta_validate: a fast and efficient fasta validator written in pure C. doi: <https://doi.org/10.5281/zenodo.2532044>
- GT_GFF3VALIDATOR ([ISC](http://genometools.org/license.html))

> Gremme G, Steinbiss S, Kurtz S. 2013. "GenomeTools: A Comprehensive Software Library for Efficient Processing of Structured Genome Annotations," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 3, pp. 645-656, May 2013, doi: <https://doi.org/10.1109/TCBB.2013.68>.
GT_GFF3VALIDATOR workflow also employs:

- SAMTOOLS (1.16.1, [MIT/Expat](https://github.com/samtools/samtools/blob/develop/LICENSE))
> Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H. 2021. Twelve years of SAMtools and BCFtools, GigaScience, Volume 10, Issue 2, February 2021, giab008, <https://doi.org/10.1093/gigascience/giab008>
- NCBI-FCS-ADAPTOR (0.4, [License](https://github.com/ncbi/fcs/blob/main/LICENSE.txt))
> <https://github.com/ncbi/fcs>
- NCBI-FCS-GX (0.4, [License](https://github.com/ncbi/fcs/blob/main/LICENSE.txt))

> <https://github.com/ncbi/fcs>
>
> Astashyn A, Tvedte ES, Sweeney D, Sapojnikov V, Bouk N, Joukov V, Mozes E, Strope PK, Sylla PM, Wagner L, Bidwell SL, Clark K, Davis EW, Smith-White B, Hlavina W, Pruitt KD, Schneider VA, Murphy TD. 2023. bioRxiv 2023.06.02.543519; doi: <https://doi.org/10.1101/2023.06.02.543519>
NCBI-FCS-GX workflow also employs:

- KRONA (2.7.1, [License](https://github.com/marbl/Krona/blob/master/KronaTools/LICENSE.txt))
> Ondov BD, Bergman NH, Phillippy AM. 2011. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics. 2011 Sep 30;12:385. doi: <https://doi.org/10.1186/1471-2105-12-385>
- ASSEMBLATHON_STATS ([CC BY-NC-SA 3.0](https://creativecommons.org/licenses/by-nc-sa/3.0/))
> [github/PlantandFoodResearch/assemblathon2-analysis/a93cba2](https://github.com/PlantandFoodResearch/assemblathon2-analysis/blob/a93cba25d847434f7eadc04e63b58c567c46a56d/assemblathon_stats.pl)
>
@@ -158,29 +165,40 @@ Once the pipeline has finished execution, the results folder specified in the co
- BUSCO (5.2.2, [MIT](https://gitlab.com/ezlab/busco/-/blob/master/LICENSE))
> Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. 2021. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes, Molecular Biology and Evolution, Volume 38, Issue 10, October 2021, Pages 4647–4654, <https://doi.org/10.1093/molbev/msab199>
- TIDK (0.2.31, [MIT](https://github.com/tolkit/telomeric-identifier/blob/main/LICENSE))

> <https://github.com/tolkit/telomeric-identifier>
TIDK workflow also employs:

- SEQKIT (2.3.1, [MIT](https://github.com/shenwei356/seqkit/blob/master/LICENSE))
> Shen W, Le S, Li Y, Hu F. 2016. SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PLoS ONE 11(10): e0163962. <https://doi.org/10.1371/journal.pone.0163962>
- LAI (beta3.2, [GPL v3](https://github.com/oushujun/LTR_retriever/blob/master/LICENSE))

> Ou S, Chen J, Jiang N. 2018. Assessing genome assembly quality using the LTR Assembly Index (LAI), Nucleic Acids Research, Volume 46, Issue 21, 30 November 2018, Page e126, <https://doi.org/10.1093/nar/gky730>
LAI workflow also employs:

- LTR_FINDER_parallel (1.2, [MIT](https://github.com/oushujun/LTR_FINDER_parallel/blob/master/LICENSE))
> Ou S, Jiang N 2019. LTR_FINDER_parallel: parallelization of LTR_FINDER enabling rapid identification of long terminal repeat retrotransposons. Mobile DNA 10, 48 (2019). <https://doi.org/10.1186/s13100-019-0193-0>
- GT_LTRHARVEST (1.6.2, [ISC](http://genometools.org/license.html))

> Gremme G, Steinbiss S, Kurtz S. 2013. "GenomeTools: A Comprehensive Software Library for Efficient Processing of Structured Genome Annotations," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 3, pp. 645-656, May 2013, doi: <https://doi.org/10.1109/TCBB.2013.68>.
> Ellinghaus, D, Kurtz, S & Willhoeft, U 2008. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics 9, 18 (2008). <https://doi.org/10.1186/1471-2105-9-18>
- LTR_retriever (2.9.0 [GPL v3](https://github.com/oushujun/LTR_retriever/blob/master/LICENSE))
> Shujun O, Ning J 2018. LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons, Plant Physiology, 176, 2 (2018). <https://doi.org/10.1104/pp.17.01310>
- KRAKEN2 (2.1.2, [MIT](https://github.com/DerrickWood/kraken2/blob/master/LICENSE))
> Wood DE, Salzberg SL, Wood DE, Lu J, Langmead B. 2019. Improved metagenomic analysis with Kraken 2. Genome Biol 20, 257 (2019). <https://doi.org/10.1186/s13059-019-1891-0>

> Wood DE, Salzberg SL, Wood DE, Lu J, Langmead B. 2019. Improved metagenomic analysis with Kraken 2. Genome Biol 20, 257 (2019). <https://doi.org/10.1186/s13059-019-1891-0>
KRAKEN2 workflow also employs:

- KRONA (2.7.1, [License](https://github.com/marbl/Krona/blob/master/KronaTools/LICENSE.txt))
> Ondov BD, Bergman NH, Phillippy AM. 2011. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics. 2011 Sep 30;12:385. doi: <https://doi.org/10.1186/1471-2105-12-385>
- HIC CONTACT MAP
- JUICEBOX.JS (2.4.3, [MIT](https://github.com/igvteam/juicebox.js/blob/master/LICENSE))
> Robinson JT, Turner D, Durand NC, Thorvaldsdóttir H, Mesirov JP, Aiden EL. 2018. Juicebox.js Provides a Cloud-Based Visualization System for Hi-C Data. Cell Syst. 2018 Feb 28;6(2):256-258.e1. doi: <https://doi.org/10.1016/j.cels.2018.01.001>. Epub 2018 Feb 7. PMID: 29428417; PMCID: PMC6047755.
@@ -189,7 +207,7 @@ Once the pipeline has finished execution, the results folder specified in the co
- FASTQC (0.11.9, [GPL v3](https://github.com/s-andrews/FastQC/blob/master/LICENSE.txt))
> <https://github.com/s-andrews/FastQC>
- RUN_ASSEMBLY_VISUALIZER (commit: 63029aa, [MIT](https://github.com/aidenlab/3d-dna/blob/master/LICENSE))
> Dudchenko O, Batra SS, Omer AD, Nyquist SK, Hoeger M, Durand NC, Shamim MS, Machol I, Lander, Aiden AP, Aiden EL 2017. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds.Science356, 92-95(2017). doi: <https://doi.org/10.1126/science.aal3327>. Available at: <https://github.com/aidenlab/3d-dna/commit/63029aa3bc5ba9bbdad9dd9771ace583cc95e273>
> Dudchenko O, Batra SS, Omer AD, Nyquist SK, Hoeger M, Durand NC, Shamim MS, Machol I, Lander, Aiden AP, Aiden EL 2017. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds.Science356, 92-95(2017). doi: <https://doi.org/10.1126/science.aal3327>. Available at: <https://github.com/aidenlab/3d-dna/commit/63029aa3bc5ba9bbdad9dd9771ace583cc95e273>
- HIC_QC (commit: 6881c33, [AGPL v3](https://github.com/phasegenomics/hic_qc/blob/master/LICENSE))
> <https://github.com/phasegenomics/hic_qc/commit/6881c3390fd4afb85009a52918b4d068100c58b4>
- JUICEBOX_SCRIPTS (commit: a7ae991, [AGPL v3](https://github.com/phasegenomics/juicebox_scripts/blob/master/LICENSE))
14 changes: 7 additions & 7 deletions agc-project.yaml
@@ -9,17 +9,17 @@ workflows:
contexts:
CtxAssemblyQC:
instanceTypes:
- "m5.large" # process_single { 1, 6 } -> { 2, 8 }
- "m5.large" # process_single { 1, 6 } -> { 2, 8 }
- "m5.xlarge"
- "m5.2xlarge"
- "m5.4xlarge"
- "r5.large" # process_low { 2, 12 } -> { 2, 16 }
- "r5.large" # process_low { 2, 12 } -> { 2, 16 }
- "r5.xlarge"
- "r5.2xlarge" # process_medium { 6, 36 } -> { 8, 64 }
- "r5.4xlarge" # process_high { 12, 72 } -> { 16, 128 }
- "r5.8xlarge" # process_high_memory (200.GB) -> { 32, 256 }
- "r5.24xlarge" # process_very_high_memory (512.GB) -> { 96, 768 }
- "c5.large" # Compute optimized instances
- "r5.2xlarge" # process_medium { 6, 36 } -> { 8, 64 }
- "r5.4xlarge" # process_high { 12, 72 } -> { 16, 128 }
- "r5.8xlarge" # process_high_memory (200.GB) -> { 32, 256 }
- "r5.24xlarge" # process_very_high_memory (512.GB) -> { 96, 768 }
- "c5.large" # Compute optimized instances
- "c5.xlarge"
- "c5.2xlarge"
- "c5.4xlarge"
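
The comments in this hunk appear to pair each process label's request, written as `{ cpus, memory in GB }`, with the capacity of the smallest listed instance that can hold it. The sketch below illustrates that reading under stated assumptions: the capacities are copied from the `-> { ... }` comments, `INSTANCE_CAPACITY` and `smallest_fit` are hypothetical names, and this is not how AWS Batch or the pipeline actually selects instances.

```python
from typing import Optional

# (vCPUs, memory in GiB) copied from the "-> { ... }" comments in the hunk above.
# Instances without such a comment are left out because the diff gives no capacity.
INSTANCE_CAPACITY = {
    "m5.large": (2, 8),
    "r5.large": (2, 16),
    "r5.2xlarge": (8, 64),
    "r5.4xlarge": (16, 128),
    "r5.8xlarge": (32, 256),
    "r5.24xlarge": (96, 768),
}


def smallest_fit(cpus: int, memory_gb: int) -> Optional[str]:
    """Return the annotated instance with the least memory that satisfies the request."""
    candidates = [
        (mem, vcpus, name)
        for name, (vcpus, mem) in INSTANCE_CAPACITY.items()
        if vcpus >= cpus and mem >= memory_gb
    ]
    # Tuples sort by memory first, then vCPUs, so min() picks the smallest fit.
    return min(candidates)[2] if candidates else None


if __name__ == "__main__":
    # Requests copied from the comments: process_single, process_low,
    # process_medium and process_high.
    for label, request in {
        "process_single": (1, 6),
        "process_low": (2, 12),
        "process_medium": (6, 36),
        "process_high": (12, 72),
    }.items():
        print(label, "->", smallest_fit(*request))
```

Under this reading, each request lands on the instance whose comment names it, which matches the annotations in the YAML.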
Original file line number Diff line number Diff line change
@@ -2,15 +2,18 @@
<div class="section-para-wrapper">
<p class="section-para">A script to calculate a basic set of metrics from a genome assembly.</p>
<p class="section-para"><b>Reference:</b></p>
<p class="section-para"><a href="https://github.com/KorfLab/Assemblathon"
target="_blank">https://github.com/KorfLab/Assemblathon</a></p>
<p class="section-para">
<a href="https://github.com/KorfLab/Assemblathon" target="_blank"
>https://github.com/KorfLab/Assemblathon</a
>
</p>
<p class="section-para"><b>Version: {{ all_stats_dicts['VERSIONS']['ASSEMBLATHON_STATS'] }}</b></p>
<p class="section-para"><b>Warning:</b></p>
<p class="section-para">Contig-related stats are based on the assumption that
the n_limit ({{ all_stats_dicts['PARAMS_DICT']['assemblathon_stats']['n_limit'] }}) parameter is specified
correctly. If you are not certain of the value of the n_limit parameter, please ignore the contig-related
stats.</p>
<p class="section-para">
Contig-related stats are based on the assumption that the n_limit ({{
all_stats_dicts['PARAMS_DICT']['assemblathon_stats']['n_limit'] }}) parameter is specified correctly. If you
are not certain of the value of the n_limit parameter, please ignore the contig-related stats.
</p>
</div>
{% include 'assemblathon_stats/dropdown.html' %}
{% include 'assemblathon_stats/report_contents.html' %}
</div>
{% include 'assemblathon_stats/dropdown.html' %} {% include 'assemblathon_stats/report_contents.html' %}
</div>
14 changes: 7 additions & 7 deletions bin/report_modules/templates/assemblathon_stats/dropdown.html
Original file line number Diff line number Diff line change
@@ -1,10 +1,10 @@
<div class="dropdown">
<div class="dropdown_content">
<select id="selector_ASSEMBLATHON_STATS" onchange="showContent('ASSEMBLATHON_STATS')">
<div class="dropdown_content">
<select id="selector_ASSEMBLATHON_STATS" onchange="showContent('ASSEMBLATHON_STATS')">
{% set str_hap = 'hap' %} {% for item in range(all_stats_dicts["ASSEMBLATHON_STATS"]|length) %}
<option value="tabcontent_ASSEMBLATHON_STATS_{{all_stats_dicts['ASSEMBLATHON_STATS'][item]['hap']}}">
{{ all_stats_dicts['ASSEMBLATHON_STATS'][item][str_hap] }}
{% endfor %}
</select>
</div>
</div>
{{ all_stats_dicts['ASSEMBLATHON_STATS'][item][str_hap] }} {% endfor %}
</option>
</select>
</div>
</div>
Original file line number Diff line number Diff line change
@@ -1,18 +1,17 @@
{% set vars = {'is_first': True} %}
{% for item in range(all_stats_dicts["ASSEMBLATHON_STATS"]|length) %}
{% set active_text = 'display: block' if vars.is_first else 'display: none' %}
<div id="tabcontent_ASSEMBLATHON_STATS_{{ all_stats_dicts['ASSEMBLATHON_STATS'][item]['hap'] }}"
class="tabcontent-ASSEMBLATHON_STATS" style="{{ active_text }}">
{% set vars = {'is_first': True} %} {% for item in range(all_stats_dicts["ASSEMBLATHON_STATS"]|length) %} {% set
active_text = 'display: block' if vars.is_first else 'display: none' %}
<div
id="tabcontent_ASSEMBLATHON_STATS_{{ all_stats_dicts['ASSEMBLATHON_STATS'][item]['hap'] }}"
class="tabcontent-ASSEMBLATHON_STATS"
style="{{ active_text }}"
>
<div class="results-section">
<div class="section-heading-wrapper">
<div class="section-heading">{{ all_stats_dicts['ASSEMBLATHON_STATS'][item]['hap'] }}</div>
</div>
</div>
<div class="table-outer">
<div class="table-wrapper">
{{ all_stats_dicts['ASSEMBLATHON_STATS'][item]['report_table_html'] }}
</div>
<div class="table-wrapper">{{ all_stats_dicts['ASSEMBLATHON_STATS'][item]['report_table_html'] }}</div>
</div>
</div>
{% if vars.update({'is_first': False}) %} {% endif %}
{% endfor %}
{% if vars.update({'is_first': False}) %} {% endif %} {% endfor %}