Add proseg tool v3.1.1 for high-speed cell segmentation#7969
Add proseg tool v3.1.1 for high-speed cell segmentation#7969Biochem-50 wants to merge 8 commits into
Conversation
SaimMomin12
left a comment
There was a problem hiding this comment.
IUC Review: tools/proseg/proseg.xml
Thanks for contributing this tool! A few changes are needed before this is ready to merge.
Blocking
Missing .shed.yml
Every tool directory requires a .shed.yml for ToolShed submission. This file is absent from tools/proseg/.
README.md should not be committed into the tool directory
Development notes belong in the PR description or commit messages, not as a tracked file in the tool directory.
Required Changes
Hardcoded --cell-id-column and --cell-id-unassigned
These CLI arguments are silently fixed for all users. If a dataset uses a different column name or sentinel value for unassigned transcripts, the tool will silently use wrong values. Both should be exposed as optional input parameters with their current values as defaults.
Help text is too sparse and contains a hardcoded version number
Embedding 3.1.1 in the help text means it goes stale on every update. Help text should use RST format with a proper description of inputs, outputs, and a link to the upstream documentation.
Missing bio.tools xref
A <xref type="bio.tools">proseg</xref> entry should be added for EDAM/OpenEBench integration, which is now standard for IUC tools.
Profile version is outdated
profile="21.05" should be updated to at least profile="25.0".
Minor
Version string is duplicated
The version 3.1.1 appears in both the version attribute and the <requirement> tag. Using inline @TOOL_VERSION@ and @VERSION_SUFFIX@ tokens (defined in a <macros> block) keeps these in sync and makes future updates easier.
Text parameters lack validators
The column name parameters (gene_col, x_col, etc.) accept arbitrary text and pass it directly to the CLI. Regex validators should be added to restrict input to valid column name characters.
Test is missing expect_num_outputs
The <test> element should declare expect_num_outputs="1" to make output count assertions explicit.
|
I have addressed all the feedback: Created .shed.yml and removed README.md. Updated profile to 25.0 and added bio.tools xref. Implemented macros for versioning and added regex validators to text parameters. Exposed the previously hardcoded cell ID parameters. Added expect_num_outputs="1" to the test. Registered the tool in .tt_index to fix the linting error. Ready for re-review! Thanks again for the helpful guidance. |
I think some files were missed while committing. |
nilchia
left a comment
There was a problem hiding this comment.
Thanks a lot @Biochem-50.
I haven't checked the tool itself completely, but I think there are still some parameters that could be included in the wrapper.
For example, the following options are already mentioned in the Proseg repo but are currently not included in the wrapper:
Some simplifications were made to the model and sampling procedure. Now the sampling schedule is controlled with these four arguments: --burnin-samples, --samples giving the number of iterations, and --burnin-voxel-size and --voxel-size giving the x/y size of the voxels in microns. The burn in voxel size must be an integer multiple of the final voxel size.
The --nbglayers arguments has been removed. There is now just one --voxel-layers argument controlling how many voxels are stacked on the z-axis.
The voxel morphology prior has been changed. Instead of --perimeter-bound and --perimeter-eta, there is one --cell-compactness argument, where smaller numbers lead to more compact (equivalently, more circular) cells.
Please let me know if you need more help.
Best.
| @@ -0,0 +1,12 @@ | |||
| name: proseg | |||
| owner: iuc | |||
| description: High-speed cell segmentation for spatial transcriptomics | |||
There was a problem hiding this comment.
Is it really high-speed? Is it mentioned in the Proseg repo? I don't find it
| @@ -0,0 +1,54 @@ | |||
| <tool id="proseg" name="Proseg" version="3.1.1+galaxy0" profile="21.05"> | |||
There was a problem hiding this comment.
| <tool id="proseg" name="Proseg" version="3.1.1+galaxy0" profile="21.05"> | |
| <tool id="proseg" name="Proseg" version="3.1.1+galaxy0" profile="25.1"> |
There was a problem hiding this comment.
It would be nice to use a more updated version of Galaxy.
| echo "Segmentation complete. Results are in the Zarr directory." > '$segmentation_results' | ||
| ]]></command> | ||
| <inputs> | ||
| <param name="transcripts" type="data" format="csv" label="Transcript coordinates"/> |
There was a problem hiding this comment.
So the input is only a transcript file?
What about images?
This is written in the Proseg repo:
Proseg relies on prior (usually image-based) segmentation to determine the number and approximate location of cells. It doesn't introduce new cells, so if the prior segmentation missed many cells, Proseg is not able to correct for that error.
So at least it should get cell segmentation as input, or?
| <param name="x_col" type="text" value="x" label="X coordinate column" help="e.g., x"/> | ||
| <param name="y_col" type="text" value="y" label="Y coordinate column" help="e.g., y"/> | ||
| <param name="z_col" type="text" value="z" label="Z coordinate column" help="e.g., z"/> |
There was a problem hiding this comment.
instead of a text param, please use a data_column
| <param name="z_col" type="text" value="z" label="Z coordinate column" help="e.g., z"/> | ||
| </inputs> | ||
| <outputs> | ||
| <data name="segmentation_results" format="txt" label="${tool.name} on ${on_string}: Summary" /> |
There was a problem hiding this comment.
This is just a report. The actual output is a spatialdata zarr file.
| <test> | ||
| <param name="transcripts" value="test_transcripts.csv" ftype="csv"/> | ||
| <param name="gene_col" value="gene"/> | ||
| <param name="x_col" value="x"/> | ||
| <param name="y_col" value="y"/> | ||
| <param name="z_col" value="z"/> | ||
| <output name="segmentation_results"> | ||
| <assert_contents> | ||
| <has_text text="Segmentation complete" /> | ||
| </assert_contents> | ||
| </output> | ||
| </test> |
There was a problem hiding this comment.
Please provide a more intense test. IMHO it is not enough to test a tool.
| @@ -0,0 +1,2 @@ | |||
| tools/proseg | |||
|
Hi @nilchia, thank you for the thorough and specific review. I have addressed all seven points:
I have also added the missing parameters you mentioned: --burnin-samples, Please let me know if anything further is needed. |
| <data name="output_polygons_file" | ||
| format="parquet" | ||
| label="${tool.name} on ${on_string}: Cell boundary polygons"> | ||
| <data name="output_spatialdata" format="directory" label="${tool.name} on ${on_string}: Segmentation (SpatialData zarr)"/> |
There was a problem hiding this comment.
Galaxy does not support plain directories as outputs. There is also no format as directory. Please check out the vpt Galaxy tool for how to output a spatialdata object https://github.com/bgruening/galaxytools/blob/master/tools/vpt/vpt_segment.xml
There was a problem hiding this comment.
There is actually a directory datatype, but maybe even more appropriate would be zarr?
|
|
||
| **Citation** | ||
|
|
||
| Moses et al. (2023) doi:10.1101/2023.11.20.567950 |
There was a problem hiding this comment.
Did you use some AI agent to generate the tool? It seems the DOI is made up. I couldn't find this. The method published in 2025, not 2023.
Please carefully review the entire code and documentation, if you used some AI agent to generate the tool.
| ]]></help> | ||
|
|
||
| <citations> | ||
| <citation type="doi">10.1101/2023.11.20.567950</citation> |
| remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/master/tools/proseg | ||
| type: unrestricted | ||
| categories: | ||
| - Transcriptomics |
There was a problem hiding this comment.
"Spatial Omics" and "Single Cell" should also fit here. Please check the available categoriers here: https://github.com/galaxyproject/planemo/blob/master/planemo/shed/__init__.py
| @@ -0,0 +1,204 @@ | |||
| <tool id="proseg" name="Proseg" version="3.1.1+galaxy1" profile="25.1"> | |||
There was a problem hiding this comment.
please define and use @TOOL_VERSION@ and @VERSION_SUFFIX@ tokens here.
| <tool id="proseg" name="Proseg" version="3.1.1+galaxy1" profile="25.1"> | ||
| <description>probabilistic cell segmentation for spatial transcriptomics</description> | ||
| <requirements> | ||
| <requirement type="package" version="3.1.1">proseg</requirement> |
There was a problem hiding this comment.
please use @TOOL_VERSION@ token
|
Hi @pavanvidem and @nilchia, Thank you for the detailed review. I want to address the main issues directly. On the citation error, I used an AI assistant to help generate On the other review points:
I am running a full local planemo lint and test validation before Thank you for the guidance on the vpt tool reference — that was helpful. |
|
Hi @Biochem-50, you're welcome! Let us know if you have questions. Why did you change the package name from |
|
Hi @nilchia and @pavanvidem, Thank you for catching this. I mixed up the command name with the Conda package name. I will change it to rust-proseg to ensure it matches the Bioconda , while keeping the tool ID as proseg. |
FOR CONTRIBUTOR:
There are two labels that allow to ignore specific (false positive) tool linter errors:
skip-version-check: Use it if only a subset of the tools has been updated in a suite.skip-url-check: Use it if github CI sees 403 errors, but the URLs work.Description:
This PR introduces a new Galaxy wrapper for ProSeg (v3.1.1), a high-performance cell segmentation tool for spatial transcriptomics.
Key Features:
Integrates rust-proseg via Bioconda.
Optimized for the EISTA (European Infrastructure for Spatial Transcriptomics Analysis) workflow.
Supports multi-dimensional transcript data (X, Y, Z) and outputs results in Zarr format compatible with downstream SpatialData tools.
Testing:
planemo lint passed with no warnings.
planemo test passed successfully using provided sample data.
CC: @nilchia (Amirhossein)