GitHub - DessimozLab/nf-oma-browser-build: Nextflow pipeline to convert OMA to OMA browser

Introduction

dessimozlab/nf-oma-browser-build is a Nextflow pipeline for building an OMA Browser instance from an OMA (Orthologous MAtrix) analysis. The pipeline converts the output of either a production OMA run or a FastOMA run into the HDF5 files needed to run an omabrowser webserver. It integrates additional data, namely GO annotations, domain annotations, and cross-references to UniProt and RefSeq. Furthermore, the pipeline annotates the OMA Hierarchical Orthologous Groups (HOGs) and the OMA Groups with descriptions, computes HOG profiles, infers Gene Ontology annotations for HOGs, and reconstructs ancestral synteny, i.e. HOG orders.

All of this data can be interactively analysed with the OMA Browser web interface using a docker compose setup on the user's computer.

Pipeline summary

The first part of the pipeline depends on the input type (production OMA or FastOMA). The later steps are common to both input types.

From production OMA pipeline:

extract genomes in dataset from Matrix file
extract relevant data (proteins, loci, etc.) from the genome databases
convert Matrix, extract splicing information

From FastOMA:

extract species tree, import proteomes and taxonomy
include additional species information from species_info file

After the initial steps, the pipeline continues with the common part.

Common part

convert HOGs and sequences into an HDF5 database, build the suffix index and k-mer lookup table (in subworkflow IMPORT_HDF5)
import domain annotations if available
import cross-references from UniProt and RefSeq (subworkflow GENERATE_XREFS)
import GO annotations and Ontology
infer keywords and fingerprints for HOGs and OMA Groups
compute and import HOG profiles (with HogProf)
infer ancestral GO annotations for HOGs (with HogProp)
infer ancestral synteny (with edgehog)

At the end, the pipeline writes the files needed by a docker-compose managed omabrowser instance to the outputDir (default results/).

Running the pipeline

The pipeline can be run with the following command:

nextflow run . -profile <profiles> [-work-dir </path/to/shared/scratch/space>] ([--<parameter> <value>]* | -params-file <parameters_file>)

We recommend using the docker or singularity profile. We also try to support all nf-core institutional profiles. Extra configuration can be provided with Nextflow's -c flag, which loads your own configuration file.

Instead of specifying the parameters on the command line, you can also provide a parameter file with the -params-file option. This file can even be generated interactively with the nf-core pipelines create-params-file command.
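As a minimal sketch of the parameter-file route, the snippet below writes a YAML file for a FastOMA input and prints the matching launch command. The parameter names come from the tables in this README; all paths and the file name params.yaml are hypothetical placeholders, and the command is only printed here rather than executed.

```shell
# Write a hypothetical parameter file for a FastOMA-based build.
# All paths are placeholders and must be replaced with real locations.
cat > params.yaml <<'EOF'
oma_source: FastOMA
fastoma_species_tree: /path/to/fastoma_output/species_tree_checked.nwk
fastoma_proteomes: /path/to/proteomes
fastoma_speciesdata: /path/to/species_info.tsv
EOF

# Print the launch command that would use this file (not executed here).
echo "nextflow run . -profile docker -params-file params.yaml"
```

Keeping the parameters in a file makes a build easier to reproduce than a long command line.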

As an example, one can run the pipeline with a small test dataset using the following command:

nextflow run . -profile docker,test

Parameters

All parameters are listed together with a brief description by running the workflow with the --help flag:

nextflow run . --help

Below, we list in a slightly extended form the parameters that are specific to the pipeline. The parameters are grouped by the kind of input data they are related to. (These tables can be generated with nf-core pipelines schema docs -x markdown)

Convert OMA run into OMA Browser release

Datatype setting

Parameter	Description	Type	Default	Required
`oma_source`	Selection of OMA data source. Can be either 'FastOMA' or 'Production'. The selection requires setting either the parameters for FastOMA or Production.	`string`	FastOMA

FastOMA Input data

Input files generated with FastOMA

Parameter	Description	Type
`fastoma_species_tree`	Species Tree in newick format. We recommend using the tree stored in the FastOMA output folder named 'species_tree_checked.nwk'.	`string`
`fastoma_proteomes`	Folder where the input fasta files for each proteomes are located.	`string`
`fastoma_speciesdata`	Optional (but recommended) TSV file with additional information about the proteomes. If specified, the file must contain at least the column "Name". Its values must match the filenames of the proteomes (without file extension). We suggest also including the column NCBITaxonId (needed to map cross-references reliably).	`string`
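To illustrate the species data file, here is a small sketch that builds a two-species TSV and checks that every "Name" value matches a proteome filename without its extension. The folder layout, the .fa extension, the species names, and the assumption that "Name" is the first column are all hypothetical choices for this example; the check is not part of the pipeline.

```shell
# Hypothetical layout: a proteomes folder with one FASTA file per species.
workdir=$(mktemp -d)
mkdir -p "$workdir/proteomes"
touch "$workdir/proteomes/HUMAN.fa" "$workdir/proteomes/MOUSE.fa"

# Species data file: tab-separated, "Name" column required, NCBITaxonId suggested.
printf 'Name\tNCBITaxonId\n' >  "$workdir/species_info.tsv"
printf 'HUMAN\t9606\n'       >> "$workdir/species_info.tsv"
printf 'MOUSE\t10090\n'      >> "$workdir/species_info.tsv"

# Check that each Name (assumed to be the first column) has a matching
# proteome file, ignoring the file extension.
missing=0
while IFS="$(printf '\t')" read -r name taxid; do
  [ "$name" = "Name" ] && continue   # skip the header row
  if ! ls "$workdir/proteomes/$name".* >/dev/null 2>&1; then
    echo "no proteome file for $name" >&2
    missing=$((missing + 1))
  fi
done < "$workdir/species_info.tsv"
echo "unmatched names: $missing"
```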

Production OMA Input data

Input files generated from an OMA Production run

Parameter	Description	Type	Required
`pairwise_orthologs_folder`	Pairwise Orthologs (only by Standard OMA pipeline)	`string`
`matrix_file`	OMA Groups file	`string`
`hog_orthoxml`	Hierarchical orthologous groups (HOGs) in orthoxml format	`string`	True
`genomes_dir`	Folder containing genomes	`string`	True
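For a production OMA input, a parameter file could look like the sketch below. It sets only oma_source and the two parameters marked as required in the table above; the paths and the file name production_params.yaml are hypothetical, and the launch command is printed rather than executed.

```shell
# Write a hypothetical parameter file for a production OMA run.
# Paths are placeholders; matrix_file and pairwise_orthologs_folder are
# left out because the table does not mark them as required.
cat > production_params.yaml <<'EOF'
oma_source: Production
hog_orthoxml: /path/to/oma_run/hogs.orthoxml
genomes_dir: /path/to/oma_run/genomes
EOF

# Print the launch command that would use this file (not executed here).
echo "nextflow run . -profile docker -params-file production_params.yaml"
```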

Domain data

File paths for domain annotations

Parameter	Description	Type	Default
`cath_names_path`	File containing CATH domain descriptions	`string`	http://download.cathdb.info/cath/releases/latest-release/cath-classification-data/cath-names.txt
`known_domains`	Folder containing known domain assignments files	`string`
`pfam_names_path`	File containing Pfam descriptions	`string`	https://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.clans.tsv.gz

Crossreferences

Integrate crossreferences

Parameter	Description	Type	Default
`xref_uniprot_swissprot`	UniProtKB/SwissProt annotation in text format	`string`	https://ftp.ebi.ac.uk/pub/databases/uniprot/knowledgebase/uniprot_sprot.dat.gz
`xref_uniprot_trembl`	UniProtKB/TrEMBL annotations in text format	`string`	/dev/null
`taxonomy_sqlite_path`		`string`
`xref_refseq`	Folder containing RefSeq gbff files.	`string`

Gene Ontology

Gene Ontology files to integrate

Parameter	Description	Type	Default	Required
`go_obo`	Gene Ontology OBO file	`string`	http://purl.obolibrary.org/obo/go/go-basic.obo
`go_gaf`	Gene Ontology annotations (GAF format). This can be the GOA database or a glob pattern matching local files in GAF format.	`string`	https://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/goa_uniprot_all.gaf.gz
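When go_gaf is given as a glob pattern on the command line, the pattern generally needs quoting so that the shell passes it on unexpanded. The sketch below shows the difference using a hypothetical temporary folder with two empty GAF files; the final command is only printed, not executed.

```shell
# Hypothetical folder with two local GAF files.
gaf_dir=$(mktemp -d)
touch "$gaf_dir/a.gaf" "$gaf_dir/b.gaf"

# Unquoted: the shell expands the pattern into one word per matching file.
set -- $gaf_dir/*.gaf
unquoted_count=$#
echo "unquoted pattern -> $unquoted_count arguments"

# Quoted: the pattern stays a single string for the pipeline to resolve.
set -- "$gaf_dir/*.gaf"
quoted_count=$#
echo "quoted pattern -> $quoted_count argument"

# Print a launch command with the quoted pattern (not executed here).
echo "nextflow run . -profile docker --go_gaf '$gaf_dir/*.gaf'"
```

Setting the pattern in a parameter file avoids the shell quoting question altogether.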

Generic options

Less common options for the pipeline, typically set in a config file.

Parameter	Description	Type	Default	Required
`custom_config_version`	version of configuration base to include (nf-core configs)	`string`	master
`custom_config_base`	location where to look for nf-core/configs	`string`	https://raw.githubusercontent.com/nf-core/configs/master
