Add tests, use ena-webin-cli handler, refactor modules.conf and update docs (#39)
Features:
* replace `ena-webin-cli` with `ena-webin-cli handler`
* added `webin-cli.jar` download to reuse it for every upload
* webin-cli-wrapper outputs TSV with accessions, which resolves #31
* FASTAVALIDATOR step added to genomesubmit (previously FASTA was validated only in assemblysubmit)
Tests:
* multiple tests added along with snapshots and profiles to test `--mode mag` and `--mode metagenomic_assembly`; test data pushed to nf-datasets
* more info about tests #32 (comment)
Bug fixes:
* added `triggers` to only download DBs if there are data to process and no local DB is provided
* create output folder for metadata CSV/TSV if it doesn't exist
* resolved OUT_OF_MEM problem in webin-cli-wrapper
* path in metadata CSV/TSV leading to http locations for remote files
* solves missing secrets issue #45
* solves problem with nf-tests on GitHub Actions #45
Other:
* massive docs update
* refactor `modules.config` and clean up published results
* WEBIN_ACCOUNT renamed to ENA_WEBIN, WEBIN_PASSWORD to ENA_WEBIN_PASSWORD
CITATIONS.md: 34 additions & 0 deletions
@@ -14,6 +14,40 @@
 > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
> Chklovski A, Parks DH, Woodcroft BJ, Tyson GW. CheckM2: a rapid, scalable and accurate tool for assessing microbial genome quality using machine learning. Nat Methods. 2023;20(8):1203-1212. doi: 10.1038/s41592-023-01940-w. PubMed PMID: 37500759; PubMed Central PMCID: not available.
+
+- [CAT and BAT](https://doi.org/10.1186/s13059-019-1817-x)
+
+> von Meijenfeldt FAB, Arkhipova K, Cambuy DD, Coutinho FH, Dutilh BE. Robust taxonomic classification of uncharted microbial sequences and bins with CAT and BAT. Genome Biol. 2019;20(1):217. doi: 10.1186/s13059-019-1817-x. PubMed PMID: 31640809; PubMed Central PMCID: PMC6805573.
README.md: 30 additions & 14 deletions
@@ -38,9 +38,9 @@ Currently, the pipeline supports three submission modes, each routed to a dedica
 Setup your environment secrets before running the pipeline:

-`nextflow secrets set WEBIN_ACCOUNT "Webin-XXX"`
+`nextflow secrets set ENA_WEBIN "Webin-XXX"`

-`nextflow secrets set WEBIN_PASSWORD "XXX"`
+`nextflow secrets set ENA_WEBIN_PASSWORD "XXX"`

 Make sure you update commands above with your authorised credentials.
@@ -55,43 +55,52 @@ The input must follow `assets/schema_input_genome.json`.
 Required columns:

 - `sample`
-- `fasta` (must end with `.fa.gz` or `.fasta.gz`)
+- `fasta` (must end with `.fa.gz`, `.fasta.gz`, or `.fna.gz`)
 - `accession`
 - `assembly_software`
 - `binning_software`
 - `binning_parameters`
-- `stats_generation_software`
 - `metagenome`
 - `environmental_medium`
 - `broad_environment`
 - `local_environment`
 - `co-assembly`

-Columns that required for now, but will be optional in the nearest future:
+At least one of the following must be provided per row:

+- reads (`fastq_1`, optional `fastq_2` for paired-end)
+- `genome_coverage`
+
+Additional supported columns:
+
+- `stats_generation_software`
 - `completeness`
 - `contamination`
-- `genome_coverage`
 - `RNA_presence`
 - `NCBI_lineage`

-Those fields are metadata required for [genome_uploader](https://github.com/EBI-Metagenomics/genome_uploader) package.
+If `genome_coverage`, `stats_generation_software`, `completeness`, `contamination`, `RNA_presence`, or `NCBI_lineage` are missing, the workflow can calculate or infer them when the required inputs are available.
+
+Those fields are metadata required for the [genome_uploader](https://github.com/EBI-Metagenomics/genome_uploader) package.
 > **Samplesheet column requirements**: All columns shown in the example above must be present in your samplesheet, even if some values are empty. Columns must be in exactly the same order as shown.
+
 ## Usage

 > [!NOTE]
@@ -122,6 +134,10 @@ All data submitted through this pipeline must be associated with an ENA study (p
 See the [usage documentation](docs/usage.md#submission-study) for more details.

+### Database setup (`CheckM2` and `CAT_pack`)
+
+The `mags`/`bins` workflow requires databases for completeness/contamination estimation and taxonomy assignment. See [Usage documentation](usage.md) for details.
+
 ### Required parameters:

 | Parameter | Description |
@@ -137,7 +153,7 @@ See the [usage documentation](docs/usage.md#submission-study) for more details.
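The two samplesheet rules changed in the README diff above (the accepted `fasta` suffixes, and the requirement that each row carries either reads or `genome_coverage`) can be illustrated with a small pre-flight check. This is only a sketch in Python, not the pipeline's own validation, which the README says follows `assets/schema_input_genome.json`. The names `FASTA_SUFFIXES` and `check_row` are hypothetical, and treating an empty string as "not provided" is an assumption.

```python
# Hypothetical pre-flight check for one samplesheet row (a dict of column -> value).
# Not part of the pipeline; it mirrors only the two rules stated in the README diff.

FASTA_SUFFIXES = (".fa.gz", ".fasta.gz", ".fna.gz")


def check_row(row):
    """Return a list of problems found in one row (empty list if none)."""
    problems = []

    # Rule 1: `fasta` must end with one of the accepted compressed suffixes.
    fasta = (row.get("fasta") or "").strip()
    if not fasta.endswith(FASTA_SUFFIXES):
        problems.append("fasta must end with .fa.gz, .fasta.gz, or .fna.gz")

    # Rule 2: at least one of reads (`fastq_1`) or `genome_coverage` per row.
    # Assumption: an empty or whitespace-only value counts as missing.
    has_reads = bool((row.get("fastq_1") or "").strip())
    has_coverage = bool((row.get("genome_coverage") or "").strip())
    if not (has_reads or has_coverage):
        problems.append("provide reads (fastq_1) or genome_coverage")

    return problems
```

A row such as `{"fasta": "bin1.fna.gz", "genome_coverage": "12.5"}` passes both checks, while a row with only `fasta` set to `bin1.fa.gz` is flagged for the missing reads or coverage.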