Add MAGs-visualization Galaxy wrappers #7922

Open
alexandrah1704 wants to merge 6 commits into galaxyproject:main from alexandrah1704:add-mags-visualization-subcommands

@alexandrah1704

This PR adds Galaxy wrappers for MAGs-visualization, a toolkit for generating plots and dashboards for MAG evaluation results (CheckM, CheckM2, dRep, GTDB, QUAST, Bakta, CoverM, metadata, KEGG).

This PR supersedes my previous submission (#7583).

The previous single-wrapper implementation has been refactored into separate, modular wrappers for each subcommand.

The following tools are included:

  • comp-conta
  • sample-heatmap
  • drep-cluster-annot
  • drep-cluster-func
  • pathway-module-heatmap

The toolkit supports multiple visualization types (quality metrics, taxonomy, clustering, abundance, functional annotation).
Outputs are returned as dataset collections.
Links to example outputs are included in the help sections.

The wrappers use the mags-visualization CLI distributed via Bioconda:
https://github.com/bioconda/bioconda-recipes/tree/master/recipes/mags-visualization

Source code:
https://github.com/usegalaxy-eu/MAGs-visualization

Tested locally with planemo test.
All local builds and tests pass.

Example outputs can be found in the MAGs-visualization use cases:
https://github.com/usegalaxy-eu/MAGs-visualization/tree/main/use-cases

FOR CONTRIBUTOR:

  • I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
  • License permits unrestricted use (educational + commercial)
  • This PR adds a new tool or tool collection
  • This PR updates an existing tool or tool collection
  • This PR does something else (explain below)

@alexandrah1704 alexandrah1704 mentioned this pull request Apr 26, 2026
5 tasks
@alexandrah1704
Author

Adjusted pathway test dataset size to ensure meaningful heatmap visualization while staying within IUC file size limits.

Contributor

@SaimMomin12 SaimMomin12 left a comment

Thanks @alexandrah1704

Some preliminary comments inline

Comment thread tools/mags_visualization/comp_conta.xml Outdated
@@ -0,0 +1,167 @@
<tool id="mags_visualization_comp_conta" name="MAGs-visualization comp-conta" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="22.05">
Contributor

Suggested change
- <tool id="mags_visualization_comp_conta" name="MAGs-visualization comp-conta" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="22.05">
+ <tool id="mags_visualization_comp_conta" name="MAGs-visualization comp-conta" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@">
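
For reference, a minimal sketch of how the tokens used in this suggestion could be defined in a shared macros.xml. The token values below are placeholders chosen for illustration and are not taken from this PR:

```xml
<!-- macros.xml (sketch): all three token values are hypothetical placeholders -->
<macros>
    <token name="@TOOL_VERSION@">0.0.9</token>
    <token name="@VERSION_SUFFIX@">0</token>
    <token name="@PROFILE@">24.0</token>
</macros>
```

Each tool XML would then pull the file in with `<macros><import>macros.xml</import></macros>` so that the profile only has to be changed in one place.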

Comment thread tools/mags_visualization/comp_conta.xml Outdated
Comment on lines +38 to +39
<param name="checkm" type="data" format="tabular,tsv" label="CheckM results" />
<param name="checkm2" type="data" format="tabular,tsv" label="CheckM2 results" />
Contributor

Can we prefer either tabular or tsv here?

Comment thread tools/mags_visualization/comp_conta.xml Outdated
</outputs>

<tests>
<test>
Contributor

Suggested change
- <test>
+ <test expect_num_outputs="1">

Please add this to all tests.

Comment thread tools/mags_visualization/comp_conta.xml Outdated
<param name="checkm" type="data" format="tabular,tsv" label="CheckM results" />
<param name="checkm2" type="data" format="tabular,tsv" label="CheckM2 results" />

<conditional name="mode">
Contributor

I would prefer a different conditional name, as it is similar to the param name below.

Comment thread tools/mags_visualization/comp_conta.xml Outdated
Comment on lines +148 to +150
<help><![CDATA[
**MAGs-visualization: comp-conta**

Contributor

A more detailed help section would be nice to have.

</param>

<param name="max_col" type="integer" value="10" min="1" label="Maximum number of taxonomy labels" />
<param name="no_log" type="boolean" checked="false" truevalue="true" falsevalue="" label="Disable log10 scaling for the top bar plot" />
Contributor

Suggested change
- <param name="no_log" type="boolean" checked="false" truevalue="true" falsevalue="" label="Disable log10 scaling for the top bar plot" />
+ <param name="no_log" type="boolean" checked="false" truevalue="--no_log" falsevalue="" label="Disable log10 scaling for the top bar plot" />

Comment on lines +33 to +35
#if $no_log
--no_log
#end if
Contributor

Suggested change
- #if $no_log
- --no_log
- #end if
+ $no_log
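
Taken together, the two suggestions on no_log could look like the sketch below. Only the no_log parameter comes from this PR; the tool id and the `example-cli` command are placeholders standing in for the real wrapper:

```xml
<!-- Sketch only: tool id and "example-cli" are hypothetical placeholders -->
<tool id="example_no_log" name="Example" version="0.1.0">
    <command><![CDATA[
example-cli
    ## Renders as "--no_log" when checked and as an empty string otherwise
    $no_log
    ]]></command>
    <inputs>
        <param name="no_log" type="boolean" checked="false" truevalue="--no_log" falsevalue="" label="Disable log10 scaling for the top bar plot"/>
    </inputs>
</tool>
```

Because the boolean carries the flag in its truevalue, the command block needs no #if guard.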

@@ -0,0 +1,80 @@
<tool id="mags_visualization_taxa_sankey" name="MAGs-visualization taxa-sankey" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="22.05">
Contributor

Suggested change
- <tool id="mags_visualization_taxa_sankey" name="MAGs-visualization taxa-sankey" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="22.05">
+ <tool id="mags_visualization_taxa_sankey" name="MAGs-visualization taxa-sankey" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@">

Comment thread tools/mags_visualization/.shed.yml Outdated
Comment thread tools/mags_visualization/.shed.yml
@alexandrah1704
Author

Thank you for the detailed review.

I have applied all comments:

  • updated XML formatting using planemo format
  • used profile tokens
  • improved help sections with clearer descriptions
  • replaced the #if block with $no_log
  • renamed the conditional mode to mode_cond
  • added expect_num_outputs
  • restricted the input format to tabular

Please let me know if anything else should be adjusted.

Contributor

@bernt-matthias bernt-matthias left a comment

Great start. Some more comments from my side. Some apply to multiple/all tools.

Comment thread tools/mags_visualization/comp_conta.xml Outdated
Comment thread tools/mags_visualization/comp_conta.xml Outdated
<param name="meta_bin_width" type="float" value="5.0" label="Bin width for numeric metadata"/>
</when>
</conditional>
<param name="top_n" type="integer" value="30" min="1" label="Top N categories"/>
Contributor

Categories of what?

Comment thread tools/mags_visualization/comp_conta.xml Outdated
</section>
</inputs>
<outputs>
<collection name="plots" type="list" label="${tool.name} outputs">
Contributor

If there is only one output you should stick to the default label. Otherwise use the prefix ${tool.name} on ${on_string}. The suffix "outputs" is definitely not needed - clearly it's outputs :)
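
A small sketch of the labelling convention for the multi-output case. The output names and the text after the colon are hypothetical and not taken from this PR:

```xml
<!-- Sketch: output names and label suffixes are hypothetical -->
<outputs>
    <!-- With a single output, the label attribute would simply be omitted -->
    <data name="plot_checkm" format="png" label="${tool.name} on ${on_string}: CheckM"/>
    <data name="plot_checkm2" format="png" label="${tool.name} on ${on_string}: CheckM2"/>
</outputs>
```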

Comment thread tools/mags_visualization/comp_conta.xml Outdated
<param name="checkm2" value="checkm2.tsv"/>
<param name="format" value="png"/>
<output_collection name="plots" type="list">
<element name="comp_conta_marginals_checkm">
Contributor

Please add ftype to the outputs.

Contributor

For images (png) we have assertions on width and height that could be used here.
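
As a sketch, the test quoted above could carry both the ftype and the image assertions. The test file name checkm.tsv and the 100-pixel thresholds are assumptions for illustration:

```xml
<!-- Sketch: checkm.tsv and the pixel thresholds are assumed values -->
<test expect_num_outputs="1">
    <param name="checkm" value="checkm.tsv"/>
    <param name="checkm2" value="checkm2.tsv"/>
    <param name="format" value="png"/>
    <output_collection name="plots" type="list">
        <element name="comp_conta_marginals_checkm" ftype="png">
            <assert_contents>
                <has_image_width min="100"/>
                <has_image_height min="100"/>
            </assert_contents>
        </element>
    </output_collection>
</test>
```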

Comment thread tools/mags_visualization/comp_conta.xml Outdated
<param name="mode_cond|mode" value="tax"/>
<param name="mode_cond|gtdb" value="gtdb.tsv"/>
<param name="tax_level" value="phylum"/>
<param name="format" value="png"/>
Contributor

Can you adapt tests 2 and 3 to test for pdf and svg output?
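
One possible shape for such tests, shown as a sketch. Image dimension assertions do not apply to these formats, so the SVG case checks for the opening tag and the PDF case checks the file size. The element name is reused from the first test as an assumption, the other test inputs are omitted, and the size threshold is a guess that would need tuning against real output:

```xml
<!-- Sketch: element names, omitted inputs and the size threshold are assumptions -->
<tests>
    <test expect_num_outputs="1">
        <param name="format" value="svg"/>
        <output_collection name="plots" type="list">
            <element name="comp_conta_marginals_checkm" ftype="svg">
                <assert_contents>
                    <has_text text="&lt;svg"/>
                </assert_contents>
            </element>
        </output_collection>
    </test>
    <test expect_num_outputs="1">
        <param name="format" value="pdf"/>
        <output_collection name="plots" type="list">
            <element name="comp_conta_marginals_checkm" ftype="pdf">
                <assert_contents>
                    <has_size min="1000"/>
                </assert_contents>
            </element>
        </output_collection>
    </test>
</tests>
```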

Comment thread tools/mags_visualization/comp_conta.xml Outdated
</when>
<when value="meta">
<param name="metadata" type="data" format="tabular" label="Metadata table"/>
<param name="meta_col" type="text" label="Metadata column name"/>
Contributor

Maybe a data_column https://docs.galaxyproject.org/en/master/dev/schema.html#data-column parameter would be an option?
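
A sketch of what the data_column variant could look like for this branch of the conditional. The attribute choices are assumptions, not the final wrapper:

```xml
<!-- Sketch: data_ref points at the sibling "metadata" param -->
<when value="meta">
    <param name="metadata" type="data" format="tabular" label="Metadata table"/>
    <param name="meta_col" type="data_column" data_ref="metadata" use_header_names="true" label="Metadata column"/>
</when>
```

One caveat worth checking: as far as I know, a data_column parameter passes the 1-based column index to the command rather than the header name, so the CLI call may need adapting if it expects a column name.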

Comment thread tools/mags_visualization/comp_conta.xml Outdated
<when value="meta">
<param name="metadata" type="data" format="tabular" label="Metadata table"/>
<param name="meta_col" type="text" label="Metadata column name"/>
<param name="meta_bin_width" type="float" value="5.0" label="Bin width for numeric metadata"/>
Contributor

Add min and/or max to numeric parameters if possible.

Comment on lines +46 to +56
<repeat name="tax_levels" title="Taxonomy levels" min="2">
<param name="level" type="select" label="Taxonomic level">
<option value="domain">domain</option>
<option value="phylum" selected="true">phylum</option>
<option value="class">class</option>
<option value="order">order</option>
<option value="family">family</option>
<option value="genus" selected="true">genus</option>
<option value="species">species</option>
</param>
</repeat>
Contributor

Replace by a select with multiple="true"?
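
A sketch of the multi-select variant together with a matching command block. The tool id and `example-cli` are placeholders, and it is assumed that the CLI takes space-separated values after --tax_levels:

```xml
<!-- Sketch only: tool id and "example-cli" are hypothetical placeholders -->
<tool id="example_tax_levels" name="Example" version="0.1.0">
    <command><![CDATA[
example-cli
    --tax_levels
    ## A multiple select renders as a comma-separated string
    #for $level in str($tax_levels).split(',')
        '$level'
    #end for
    ]]></command>
    <inputs>
        <!-- optional="false" requires at least one selected level -->
        <param name="tax_levels" type="select" multiple="true" optional="false" label="Taxonomy levels">
            <option value="domain">domain</option>
            <option value="phylum" selected="true">phylum</option>
            <option value="class">class</option>
            <option value="order">order</option>
            <option value="family">family</option>
            <option value="genus" selected="true">genus</option>
            <option value="species">species</option>
        </param>
    </inputs>
</tool>
```

Unlike the repeat with min="2", a select cannot enforce a minimum of two levels on its own, so that constraint would need a validator or a check in the tool itself.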

</outputs>
<tests>
<test expect_num_outputs="1">
<param name="drep" value="drep.csv"/>
Contributor

Can you also test with a tabular input?

Comment on lines +46 to +48
<repeat name="meta_cols" title="Metadata columns" min="0">
<param name="col" type="text" label="Metadata column name"/>
</repeat>
Contributor

data_column with multiple="true"?

@bgruening
Member

There is a new version available

@alexandrah1704
Author

Thank you for the review.
I have now implemented the requested changes.
Please let me know if any further adjustments are needed.

@bgruening
Member

Hi @alexandrah1704!

Thanks a lot, a few comments:

  • The command block has a copy-paste error: the #if $mode_cond.mode == "tax" conditional appears twice. The second block re-emits --gtdb without --tax_level, which will pass the flag twice and may cause CLI errors.
#if $mode_cond.mode == "tax"
    --gtdb '$mode_cond.gtdb'
    --tax_level '$mode_cond.tax_level'
#end if

#if $mode_cond.mode == "tax"   <!-- DUPLICATE — remove this block? -->
    --gtdb '$mode_cond.gtdb'
#end if
  • The --tax_levels loop is emitted after the -o outputs argument, which produces a malformed command. If tax_levels is an empty list this also creates a dangling --tax_levels with no values.

  • Using len() on a Cheetah repeat variable to guard meta_cols is unreliable in Galaxy templates. The current pattern:

#if $metadata_cond.metadata and len($metadata_cond.meta_cols) > 0

Fix: Use the standard idiom:

#if $metadata_cond.metadata
    --metadata '$metadata_cond.metadata'
    #if $metadata_cond.meta_cols
        --meta_cols
        #for $mc in $metadata_cond.meta_cols
            '$mc.col'
        #end for
    #end if
#end if
  • The output collection pattern matches html|png|pdf|svg, but there is no --format flag. Is this working, or is the tool using a standard format?

  • Consider adding an argument= attribute to <param> elements.

  • If a DOI or Zenodo record exists, prefer type="doi" over a raw BibTeX entry

  • Most tests only assert a minimum size. Please use stronger assertions and better tests.

<assert_contents>
    <has_size min="1"/>
</assert_contents>

e.g.

<has_image_width min="100"/>
<has_image_height min="100"/>

For SVG outputs, has_text on expected axis labels would be appropriate. For PDFs, assert a realistic minimum file size.

  • The first test case specifies tax_levels as two separate <param> elements:
<param name="tax_levels" value="phylum"/>
<param name="tax_levels" value="genus"/>

You can have multiple selections, correct? In that case, use a single comma-separated value:

<param name="tax_levels" value="phylum,genus"/>
  • Only one test case exists (rank=phylum). At least one additional rank (e.g., genus) should be tested.

  • Free-text meta_col / meta_cols parameters in comp_conta.xml and sample_heatmap.xml are passed directly to the CLI:

<param name="meta_col" type="text" label="Metadata column name"/>

Galaxy's single-quoting prevents shell injection, but column names containing commas or semicolons could affect parsing inside the Python tool. Please add a <sanitizer> block, or switch to data_column type to restrict input to valid column names from the dataset.
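
If the free-text parameter is kept, a sanitizer along these lines would be one option. The allowed character set is an assumption and would need to match the column names that actually occur in the metadata tables:

```xml
<!-- Sketch: the set of allowed characters is an assumption -->
<param name="meta_col" type="text" label="Metadata column name">
    <!-- invalid_char="" drops disallowed characters instead of replacing them -->
    <sanitizer invalid_char="">
        <valid initial="string.ascii_letters,string.digits">
            <add value="_"/>
            <add value="-"/>
            <add value="."/>
        </valid>
    </sanitizer>
</param>
```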

Contributor

@paulzierep paulzierep left a comment

need to continue later

Comment thread tools/mags_visualization/comp_conta.xml Outdated
</section>
</inputs>
<outputs>
<collection name="plots" type="list">
Contributor

You do not need a collection output here, since each tool produces only one output afaik. Since the file names are variable, you could do

cp outputs/${dynamic_name}.png outputs/final_plot.png

on the command line and then

<data name="plot" format="png" from_work_dir="outputs/final_plot.png"/>

Please adapt for all tools.
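
A sketch of how this could extend to a selectable output format. The tool id and `example-cli` are placeholders, and the glob assumes that exactly one file with the chosen extension ends up in outputs/:

```xml
<!-- Sketch only: tool id and "example-cli" are hypothetical placeholders -->
<tool id="example_single_output" name="Example" version="0.1.0">
    <command detect_errors="exit_code"><![CDATA[
example-cli -o outputs
## Assumes exactly one file with the chosen extension is written to outputs/
&& cp outputs/*.'$format' outputs/final_plot
    ]]></command>
    <inputs>
        <param name="format" type="select" label="Output format">
            <option value="png" selected="true">png</option>
            <option value="pdf">pdf</option>
            <option value="svg">svg</option>
        </param>
    </inputs>
    <outputs>
        <data name="plot" format="png" from_work_dir="outputs/final_plot">
            <!-- Switch the datatype to follow the selected format -->
            <change_format>
                <when input="format" value="pdf" format="pdf"/>
                <when input="format" value="svg" format="svg"/>
            </change_format>
        </data>
    </outputs>
</tool>
```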

<option value="variance">variance (differential modules)</option>
<option value="both">both</option>
</param>
<param argument="--format" type="select" label="Output format">
Contributor

You have format and fig size for all tools. You could move them to the macros and then add the unit to the help. Also add min and max, please.
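
A sketch of such a shared macro. The macro name, the fig_width and fig_height parameter names, the defaults, the bounds and the unit (inches) are all assumptions for illustration:

```xml
<!-- macros.xml (sketch): names, defaults, bounds and the unit are hypothetical -->
<macros>
    <xml name="figure_params">
        <param argument="--format" type="select" label="Output format">
            <option value="png" selected="true">png</option>
            <option value="pdf">pdf</option>
            <option value="svg">svg</option>
        </param>
        <param name="fig_width" type="float" value="10.0" min="1.0" max="50.0" label="Figure width" help="Width in inches"/>
        <param name="fig_height" type="float" value="8.0" min="1.0" max="50.0" label="Figure height" help="Height in inches"/>
    </xml>
</macros>
```

Each tool would then include the shared parameters with `<expand macro="figure_params"/>` inside its inputs.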

Comment thread tools/mags_visualization/taxa_sankey.xml Outdated
Comment thread tools/mags_visualization/comp_conta.xml Outdated
Comment thread tools/mags_visualization/comp_conta.xml Outdated
label="Metadata column"
use_header_names="true"
help="Select the column to color by"/>
<param name="meta_bin_width" type="float" value="5.0" min="0.1" label="Bin width for numeric metadata"/>
Contributor

needs more explanation


This tool visualizes genome clusters generated by dRep and annotates them using GTDB taxonomy.

Clusters are grouped by taxonomic levels (e.g. phylum, genus) based on representative genomes.
Contributor

Are they grouped?

</param>
<section name="representatives" title="Representative MAG selection (optional)" expanded="false">
<param name="drep" type="data" format="csv,tabular" optional="true" label="dRep cluster table"/>
<param name="gtdb" type="data" format="tabular" optional="true" label="GTDB annotation table"/>
Contributor

Why is gtdb needed for selection?

Author

@alexandrah1704 alexandrah1704 May 20, 2026

gtdb is needed to select the same representative genomes as in the drep-cluster-func plot. Without gtdb, the row order differs and different MAGs are shown. With gtdb, both plots show identical MAGs in the same order, which makes them directly comparable and avoids confusion.

Contributor

You could add a bit of help, e.g. via the help attribute, explaining what these inputs are used for and in which cases the user should supply them.
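
For example, a sketch based on the explanation given above. The help wording is a suggestion only, and the description of the dRep input in particular is an assumption about how the tool uses it:

```xml
<!-- Sketch: help texts are suggested wording; the drep description is an assumption -->
<section name="representatives" title="Representative MAG selection (optional)" expanded="false">
    <param name="drep" type="data" format="csv,tabular" optional="true" label="dRep cluster table" help="Supply this to restrict the plot to representative MAGs of the dRep clusters."/>
    <param name="gtdb" type="data" format="tabular" optional="true" label="GTDB annotation table" help="Supply this to show the same representative MAGs, in the same row order, as the drep-cluster-func plot."/>
</section>
```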

@alexandrah1704
Author

I've adjusted the following:

  • Removed duplicate
  • used the standard idiom for metadata_cond.metadata
  • there is now a format flag in taxa_sankey, which will work for all outputs (png, pdf, svg) with the new version v0.0.9, once it's merged in Bioconda.
  • added argument in
  • adjusted tests
  • data_column is now used for metadata
  • removed collection
  • moved fig_size and format to macros
  • checkm, checkm2 can now be plotted separately
