Added orthofinder
GallVp committed Nov 2, 2024
1 parent d410971 commit 2e8844d
Showing 32 changed files with 1,521 additions and 63 deletions.
74 changes: 11 additions & 63 deletions README.md
@@ -16,61 +16,7 @@

## Pipeline Flowchart

```mermaid
%%{init: {
'theme': 'base',
'themeVariables': {
'fontSize': '52px',
'primaryColor': '#9A6421',
'primaryTextColor': '#ffffff',
'primaryBorderColor': '#9A6421',
'lineColor': '#B180A8',
'secondaryColor': '#455C58',
'tertiaryColor': '#ffffff'
}
}}%%
flowchart LR
forEachTag(Assembly) ==> VALIDATE_FORMAT[VALIDATE FORMAT]
VALIDATE_FORMAT ==> ncbiFCS[<span style="white-space: nowrap;">NCBI FCS ADAPTOR</span>]
ncbiFCS ==> Check{Check}
VALIDATE_FORMAT ==> ncbiGX[<span style="white-space: nowrap;">NCBI FCS GX</span>]
ncbiGX ==> Check
Check ==> |Clean|Run(Run)
Check ==> |Contamination|Skip(Skip All)
Skip ==> REPORT
VALIDATE_FORMAT ==> GFF_STATS[<span style="white-space: nowrap;">GENOMETOOLS GT STAT</span>]
Run ==> ASS_STATS[<span style="white-space: nowrap;">STATS</span>]
Run ==> BUSCO
Run ==> TIDK
Run ==> LAI
Run ==> KRAKEN2
Run ==> HIC_CONTACT_MAP[<span style="white-space: nowrap;">HIC CONTACT MAP</span>]
Run ==> MUMMER
Run ==> MINIMAP2
Run ==> MERQURY
MUMMER ==> CIRCOS
MUMMER ==> DOTPLOT
MINIMAP2 ==> PLOTSR
ASS_STATS ==> REPORT
GFF_STATS ==> REPORT
BUSCO ==> REPORT
TIDK ==> REPORT
LAI ==> REPORT
KRAKEN2 ==> REPORT
HIC_CONTACT_MAP ==> REPORT
CIRCOS ==> REPORT
DOTPLOT ==> REPORT
PLOTSR ==> REPORT
MERQURY ==> REPORT
```
<p align="center"><img src="docs/images/assemblyqc.png"></p>

- [FASTA VALIDATOR](https://github.com/linsalrob/fasta_validator) + [SEQKIT RMDUP](https://github.com/shenwei356/seqkit): FASTA validation
- [GENOMETOOLS GT GFF3VALIDATOR](https://genometools.org/tools/gt_gff3validator.html): GFF3 validation
@@ -85,6 +31,7 @@ flowchart LR
- [HIC CONTACT MAP](https://github.com/igvteam/juicebox.js): Alignment and visualisation of HiC data
- [MUMMER](https://github.com/mummer4/mummer) → [CIRCOS](http://circos.ca/documentation/) + [DOTPLOT](https://plotly.com) & [MINIMAP2](https://github.com/lh3/minimap2) → [PLOTSR](https://github.com/schneebergerlab/plotsr): Synteny analysis
- [MERQURY](https://github.com/marbl/merqury): K-mer completeness, consensus quality and phasing assessment
- [ORTHOFINDER](https://github.com/davidemms/OrthoFinder): Phylogenetic orthology inference for comparative genomics

## Usage

@@ -140,31 +87,32 @@ The pipeline uses nf-core modules contributed by following authors:

<a href="https://github.com/gallvp"><img src="https://github.com/gallvp.png" width="50" height="50"></a>
<a href="https://github.com/drpatelh"><img src="https://github.com/drpatelh.png" width="50" height="50"></a>
<a href="https://github.com/midnighter"><img src="https://github.com/midnighter.png" width="50" height="50"></a>
<a href="https://github.com/mahesh-panchal"><img src="https://github.com/mahesh-panchal.png" width="50" height="50"></a>
<a href="https://github.com/jfy133"><img src="https://github.com/jfy133.png" width="50" height="50"></a>
<a href="https://github.com/adamrtalbot"><img src="https://github.com/adamrtalbot.png" width="50" height="50"></a>
<a href="https://github.com/midnighter"><img src="https://github.com/midnighter.png" width="50" height="50"></a>
<a href="https://github.com/joseespinosa"><img src="https://github.com/joseespinosa.png" width="50" height="50"></a>
<a href="https://github.com/sofstam"><img src="https://github.com/sofstam.png" width="50" height="50"></a>
<a href="https://github.com/sateeshperi"><img src="https://github.com/sateeshperi.png" width="50" height="50"></a>
<a href="https://github.com/maxulysse"><img src="https://github.com/maxulysse.png" width="50" height="50"></a>
<a href="https://github.com/matthdsm"><img src="https://github.com/matthdsm.png" width="50" height="50"></a>
<a href="https://github.com/joseespinosa"><img src="https://github.com/joseespinosa.png" width="50" height="50"></a>
<a href="https://github.com/heuermh"><img src="https://github.com/heuermh.png" width="50" height="50"></a>
<a href="https://github.com/grst"><img src="https://github.com/grst.png" width="50" height="50"></a>
<a href="https://github.com/fellen31"><img src="https://github.com/fellen31.png" width="50" height="50"></a>
<a href="https://github.com/ewels"><img src="https://github.com/ewels.png" width="50" height="50"></a>
<a href="https://github.com/sofstam"><img src="https://github.com/sofstam.png" width="50" height="50"></a>
<a href="https://github.com/sateeshperi"><img src="https://github.com/sateeshperi.png" width="50" height="50"></a>
<a href="https://github.com/edmundmiller"><img src="https://github.com/edmundmiller.png" width="50" height="50"></a>
<a href="https://github.com/adamrtalbot"><img src="https://github.com/adamrtalbot.png" width="50" height="50"></a>
<a href="https://github.com/robsyme"><img src="https://github.com/robsyme.png" width="50" height="50"></a>
<a href="https://github.com/priyanka-surana"><img src="https://github.com/priyanka-surana.png" width="50" height="50"></a>
<a href="https://github.com/phue"><img src="https://github.com/phue.png" width="50" height="50"></a>
<a href="https://github.com/nvnieuwk"><img src="https://github.com/nvnieuwk.png" width="50" height="50"></a>
<a href="https://github.com/muffato"><img src="https://github.com/muffato.png" width="50" height="50"></a>
<a href="https://github.com/lescai"><img src="https://github.com/lescai.png" width="50" height="50"></a>
<a href="https://github.com/kevinmenden"><img src="https://github.com/kevinmenden.png" width="50" height="50"></a>
<a href="https://github.com/jvhagey"><img src="https://github.com/jvhagey.png" width="50" height="50"></a>
<a href="https://github.com/jeremy1805"><img src="https://github.com/jeremy1805.png" width="50" height="50"></a>
<a href="https://github.com/heuermh"><img src="https://github.com/heuermh.png" width="50" height="50"></a>
<a href="https://github.com/friederikehanssen"><img src="https://github.com/friederikehanssen.png" width="50" height="50"></a>
<a href="https://github.com/fellen31"><img src="https://github.com/fellen31.png" width="50" height="50"></a>
<a href="https://github.com/felixkrueger"><img src="https://github.com/felixkrueger.png" width="50" height="50"></a>
<a href="https://github.com/erikrikarddaniel"><img src="https://github.com/erikrikarddaniel.png" width="50" height="50"></a>
<a href="https://github.com/edmundmiller"><img src="https://github.com/edmundmiller.png" width="50" height="50"></a>
<a href="https://github.com/d4straub"><img src="https://github.com/d4straub.png" width="50" height="50"></a>
<a href="https://github.com/charles-plessy"><img src="https://github.com/charles-plessy.png" width="50" height="50"></a>

2 changes: 2 additions & 0 deletions bin/assemblyqc.py
@@ -29,6 +29,7 @@
from report_modules.parsers.hic_parser import parse_hic_folder
from report_modules.parsers.synteny_parser import parse_synteny_folder
from report_modules.parsers.merqury_parser import parse_merqury_folder
from report_modules.parsers.orthofinder_parser import parse_orthofinder_folder

if __name__ == "__main__":
    params_dict, params_table = parse_params_json("params_json.json")
@@ -57,6 +58,7 @@
    data_from_tools = {**data_from_tools, **parse_hic_folder()}
    data_from_tools = {**data_from_tools, **parse_synteny_folder()}
    data_from_tools = {**data_from_tools, **parse_merqury_folder()}
    data_from_tools = {**data_from_tools, **parse_orthofinder_folder()}

    with open("software_versions.yml", "r") as f:
        versions_from_ch_versions = yaml.safe_load(f)
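
The hunk above folds each parser's result into `data_from_tools` with dictionary unpacking. A minimal sketch of why a parser can simply return `{}` when its tool was skipped; the two `fake_*` parsers and the `collect` helper are hypothetical stand-ins, not functions from the repository:

```python
def fake_merqury_parser():
    # Hypothetical stand-in for a parser whose tool produced outputs.
    return {"MERQURY": {"completeness": "illustrative value"}}


def fake_orthofinder_parser(outputs_exist):
    # Mirrors the early return in parse_orthofinder_folder: no folder, no data.
    if not outputs_exist:
        return {}
    return {"ORTHOFINDER": {"general_stats": []}}


def collect(outputs_exist):
    # Same merge pattern as bin/assemblyqc.py: unpack both dicts into a new one.
    data_from_tools = {}
    data_from_tools = {**data_from_tools, **fake_merqury_parser()}
    data_from_tools = {**data_from_tools, **fake_orthofinder_parser(outputs_exist)}
    return data_from_tools


skipped = collect(outputs_exist=False)
ran = collect(outputs_exist=True)
print(sorted(skipped))
print(sorted(ran))
```

Unpacking an empty dict adds no keys, so a skipped tool leaves no `ORTHOFINDER` entry behind. Assuming the merged dict is what the templates see as `all_stats_dicts`, the `{% if 'ORTHOFINDER' in all_stats_dicts %}` guards in `base.html` would then hide the tab.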
100 changes: 100 additions & 0 deletions bin/report_modules/parsers/orthofinder_parser.py
@@ -0,0 +1,100 @@
import pandas as pd
import base64
import os
import re

import matplotlib.pyplot as plt
from tabulate import tabulate
from pathlib import Path
from io import StringIO
from Bio import Phylo


def parse_orthofinder_folder(folder_name="orthofinder_outputs/assemblyqc"):
    # Resolve the staged OrthoFinder outputs relative to the working directory
    results_root_path = Path(os.getcwd()) / folder_name

    # No outputs staged (e.g. the tool was skipped): contribute nothing
    if not results_root_path.exists():
        return {}

    data = {"ORTHOFINDER": {}}

    # Species tree
    tree = Phylo.read(
        f"{results_root_path}/Species_Tree/SpeciesTree_rooted.txt", "newick"
    )

    fig = plt.figure(figsize=(6, 6))
    ax = fig.add_subplot(1, 1, 1)
    Phylo.draw(tree, do_show=False, axes=ax)

    plt.gca().spines["top"].set_visible(False)
    plt.gca().spines["right"].set_visible(False)

    plt.savefig("speciestree_rooted.png", format="png", dpi=300)

    with open("speciestree_rooted.png", "rb") as f:
        binary_fc = f.read()

    # Embed the PNG as a data URI; the media type for PNG is image/png
    base64_utf8_str = base64.b64encode(binary_fc).decode("utf-8")
    data["ORTHOFINDER"]["speciestree_rooted"] = (
        f"data:image/png;base64,{base64_utf8_str}"
    )

    # Overall statistics
    overall_statistics = Path(
        f"{results_root_path}/Comparative_Genomics_Statistics/Statistics_Overall.tsv"
    ).read_text()

    ## General stats
    general_stats = re.findall(
        r"(Number of species.*)Orthogroups file", overall_statistics, flags=re.DOTALL
    )[0]
    general_stats_pd = pd.read_csv(StringIO(general_stats), sep="\t")

    data["ORTHOFINDER"]["general_stats"] = general_stats_pd.to_dict("records")
    data["ORTHOFINDER"]["general_stats_html"] = tabulate(
        general_stats_pd,
        headers=["Stat", "Value"],
        tablefmt="html",
        numalign="left",
        showindex=False,
    )

    ## Genes per-species
    genes_per_species = re.findall(
        r"(Average number of genes per-species in orthogroup.*)Number of species in orthogroup",
        overall_statistics,
        flags=re.DOTALL,
    )[0]
    genes_per_species_pd = pd.read_csv(StringIO(genes_per_species), sep="\t", header=0)
    data["ORTHOFINDER"]["genes_per_species"] = genes_per_species_pd.to_dict("records")
    data["ORTHOFINDER"]["genes_per_species_html"] = tabulate(
        genes_per_species_pd,
        headers=genes_per_species_pd.columns.to_list(),
        tablefmt="html",
        numalign="left",
        showindex=False,
    )

    ## Number of species in orthogroup
    num_species_orthogroup = re.findall(
        r"(Number of species in orthogroup.*)",
        overall_statistics,
        flags=re.DOTALL,
    )[0]
    num_species_orthogroup_pd = pd.read_csv(
        StringIO(num_species_orthogroup), sep="\t", header=0
    )
    data["ORTHOFINDER"]["num_species_orthogroup"] = num_species_orthogroup_pd.to_dict(
        "records"
    )
    data["ORTHOFINDER"]["num_species_orthogroup_html"] = tabulate(
        num_species_orthogroup_pd,
        headers=num_species_orthogroup_pd.columns.to_list(),
        tablefmt="html",
        numalign="left",
        showindex=False,
    )

    return data
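
The parser carves `Statistics_Overall.tsv` into blocks by matching from one section-opening phrase up to the next with `re.DOTALL`, then parses each block as tab-separated text. The sketch below illustrates that sectioning idea only. It swaps pandas for the standard-library `csv` module to stay dependency-free, the `extract_section` and `to_rows` helpers are hypothetical, and the sample text is a made-up, shortened stand-in in which only the section-opening phrases come from the parser's regexes:

```python
import csv
import re
from io import StringIO

# Hypothetical stand-in for Statistics_Overall.tsv; all numbers are invented.
SAMPLE = (
    "Number of species\t2\n"
    "Number of genes\t100\n"
    "\n"
    "Orthogroups file\tOrthogroups.tsv\n"
    "\n"
    "Average number of genes per-species in orthogroup\tNumber of orthogroups\n"
    "<1\t5\n"
    "1\t20\n"
    "\n"
    "Number of species in orthogroup\tNumber of orthogroups\n"
    "1\t3\n"
    "2\t22\n"
)


def extract_section(text, start, end=None):
    """Return the block that begins at `start` and stops before `end`.

    With end=None the block runs to the end of the text.
    """
    tail = re.escape(end) if end else ""
    pattern = rf"({re.escape(start)}.*){tail}"
    # DOTALL lets `.` cross newlines, so one match can span many lines.
    return re.findall(pattern, text, flags=re.DOTALL)[0]


def to_rows(block):
    """Parse a tab-separated block into rows, dropping blank lines."""
    return [row for row in csv.reader(StringIO(block), delimiter="\t") if row]


general = to_rows(extract_section(SAMPLE, "Number of species", "Orthogroups file"))
genes = to_rows(
    extract_section(
        SAMPLE,
        "Average number of genes per-species in orthogroup",
        "Number of species in orthogroup",
    )
)
species = to_rows(extract_section(SAMPLE, "Number of species in orthogroup"))

print(general)
print(genes)
print(species)
```

Because `.*` is greedy, each block extends to the last occurrence of its end phrase, so this approach relies on every end phrase appearing only once in the file. The `[0]` index also raises `IndexError` if a phrase is missing altogether.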
10 changes: 10 additions & 0 deletions bin/report_modules/templates/base.html
@@ -79,6 +79,11 @@
{% if 'MERQURY' in all_stats_dicts %}
<button class="tablinks" onclick="openTool(event, 'MERQURY')">MERQURY</button>
{% endif %}

{% if 'ORTHOFINDER' in all_stats_dicts %}
<button class="tablinks" onclick="openTool(event, 'ORTHOFINDER')">ORTHOFINDER</button>
{% endif %}

</div>

{% include 'params/params.html' %}
@@ -151,6 +156,11 @@
{% if 'MERQURY' in all_stats_dicts %}
{% include 'merqury/merqury.html' %}
{% endif %}

{% if 'ORTHOFINDER' in all_stats_dicts %}
{% include 'orthofinder/orthofinder.html' %}
{% endif %}

</body>
{% include 'js.html' %}

14 changes: 14 additions & 0 deletions bin/report_modules/templates/orthofinder/orthofinder.html
@@ -0,0 +1,14 @@
<div id="ORTHOFINDER" class="tabcontent" style="display: none">
<div class="section-para-wrapper">
<p class="section-para">
A tool for phylogenetic orthology inference for comparative genomics.
</p>
<p class="section-para"><b>Reference:</b></p>
<p class="section-para">
Emms, D.M., Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol 20, 238
(2019). <a href="https://doi.org/10.1186/s13059-019-1832-y" target="_blank">10.1186/s13059-019-1832-y</a>
</p>
<p class="section-para"><b>Version: {{ all_stats_dicts['VERSIONS']['ORTHOFINDER']['orthofinder'] }}</b></p>
</div>
{% include 'orthofinder/report_contents.html' %}
</div>
33 changes: 33 additions & 0 deletions bin/report_modules/templates/orthofinder/report_contents.html
@@ -0,0 +1,33 @@
<div id="tabcontent_ORTHOFINDER" class="tabcontent-ORTHOFINDER" style="display: block">
<div class="results-section">

<div class="section-para-wrapper">
<p class="section-para"><b>Species tree (rooted)</b></p>
</div>
<div class="image-wrapper" style="width: 60%;margin-left: auto;margin-right: auto;">
<img src="{{ all_stats_dicts['ORTHOFINDER']['speciestree_rooted'] }}" alt="" />
</div>

<div class="section-para-wrapper">
<p class="section-para"><b>General statistics</b></p>
</div>
<div class="table-outer">
<div class="table-wrapper">{{ all_stats_dicts['ORTHOFINDER']['general_stats_html'] }}</div>
</div>

<div class="section-para-wrapper">
<p class="section-para"><b>Genes per-species in orthogroup</b></p>
</div>
<div class="table-outer">
<div class="table-wrapper">{{ all_stats_dicts['ORTHOFINDER']['genes_per_species_html'] }}</div>
</div>

<div class="section-para-wrapper">
<p class="section-para"><b>Number of species in orthogroup</b></p>
</div>
<div class="table-outer">
<div class="table-wrapper">{{ all_stats_dicts['ORTHOFINDER']['num_species_orthogroup_html'] }}</div>
</div>

</div>
</div>
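
The template places `speciestree_rooted` directly in an `<img src>` attribute, which works because the parser stores the figure as a base64 data URI rather than a file path. A minimal sketch of that encoding and its inverse, using `image/png` as the media type; the helper names and the filler bytes are hypothetical, and the payload is not a valid image:

```python
import base64

# PNG signature followed by made-up filler bytes; this only stands in for
# the contents of speciestree_rooted.png.
fake_png = b"\x89PNG\r\n\x1a\n" + b"illustrative-bytes"


def to_data_uri(payload, mime="image/png"):
    """Encode raw bytes as a data URI usable in an <img src> attribute."""
    encoded = base64.b64encode(payload).decode("utf-8")
    return f"data:{mime};base64,{encoded}"


def from_data_uri(uri):
    """Split a base64 data URI back into (media type, raw bytes)."""
    header, encoded = uri.split(",", 1)
    mime = header[len("data:"):].split(";", 1)[0]
    return mime, base64.b64decode(encoded)


uri = to_data_uri(fake_png)
mime, payload = from_data_uri(uri)
print(uri[:22])
print(mime, payload == fake_png)
```

Embedding the image this way keeps the HTML report self-contained, at the cost of base64's roughly one-third size overhead.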
12 changes: 12 additions & 0 deletions conf/modules.config
@@ -388,6 +388,18 @@ process {
        ]
    }

    withName: '.*:ASSEMBLYQC:GFFREAD' {
        ext.args = '-y -S'
    }

    withName: '.*:ASSEMBLYQC:ORTHOFINDER' {
        publishDir = [
            path: { "${params.outdir}/orthofinder" },
            mode: params.publish_dir_mode,
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
        ]
    }

    withName: '.*:ASSEMBLYQC:CREATEREPORT' {
        publishDir = [
            [
Binary file added docs/images/assemblyqc.png
6 changes: 6 additions & 0 deletions docs/parameters.md
@@ -100,6 +100,12 @@ A Nextflow pipeline which evaluates assembly quality with multiple QC tools and
| `merqury_skip` | Skip merqury analysis | `boolean` | True | | |
| `merqury_kmer_length` | kmer length for merqury analysis | `integer` | 21 | | |

## Orthofinder options

| Parameter | Description | Type | Default | Required | Hidden |
| ------------------ | ---------------- | --------- | ------- | -------- | ------ |
| `orthofinder_skip` | Skip orthofinder | `boolean` | True | | |

## Institutional config options

Parameters used to describe centralised config profiles. These should not be edited.
10 changes: 10 additions & 0 deletions modules.json
@@ -180,6 +180,11 @@
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"]
},
"gffread": {
"branch": "master",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"]
},
"gunzip": {
"branch": "master",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
@@ -210,6 +215,11 @@
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"]
},
"orthofinder": {
"branch": "master",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"]
},
"seqkit/rmdup": {
"branch": "master",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
1 change: 1 addition & 0 deletions modules/local/createreport.nf
@@ -20,6 +20,7 @@ process CREATEREPORT {
path hic_outputs , stageAs: 'hic_outputs/*'
path synteny_outputs , stageAs: 'synteny_outputs/*'
path merqury_outputs , stageAs: 'merqury_outputs/*'
path orthofinder_outputs , stageAs: 'orthofinder_outputs/*'
path versions
val params_json
val params_summary_json
5 changes: 5 additions & 0 deletions modules/nf-core/gffread/environment.yml
