Add Stanza NLP tool and data manager by ksuderman · Pull Request #8004 · galaxyproject/tools-iuc

ksuderman · 2026-05-20T01:01:01Z

Summary

Adds Stanford Stanza NLP tool supporting 80+ languages
Includes data manager for downloading Stanza language models from HuggingFace
Provides tokenization, POS tagging, lemmatization, dependency parsing, NER
Supports sentiment analysis and constituency parsing for select languages
Multiple output formats: JSON, CoNLL-U, tabular, text

Stanza Tool Features

Memory-optimized nocharlm models for better performance
Comprehensive language support with standardized Universal Dependencies
Docker containerization for consistent execution environment
Integration with Galaxy data tables for model selection

Data Manager Features

Downloads models directly from HuggingFace using default_fast package
Multi-select installation interface for language packages
Automatic registration in Galaxy data tables
Duplicate prevention when re-run

Test plan

Tool passes planemo lint validation
Data manager passes planemo lint validation
Comprehensive test data included
README documentation provided for both components
.shed.yml configured for IUC submission

🤖 Generated with Claude Code

- Stanza neural NLP toolkit supporting 80+ languages
- State-of-the-art accuracy with Universal Dependencies v2.12
- Complete annotation pipeline: tokenization, POS, NER, parsing, sentiment, constituency
- CPU-optimized PyTorch models with default_fast configuration
- Docker containerization for consistent execution
- Data manager with direct HuggingFace downloads (no stanza dependency)
- Memory efficient nocharlm models for container deployment
- Comprehensive language coverage including major world languages
- Comprehensive tests and documentation

Tool: stanza_nlp (v1.11.1+galaxy4)
Data Manager: data_manager_stanza_models (v1.11.1.3)
Categories: Text Manipulation, Natural Language Processing

## Stanza NLP Tool
- Stanford Stanza NLP annotation tool supporting 80+ languages
- Provides tokenization, POS tagging, lemmatization, dependency parsing, NER
- Supports sentiment analysis and constituency parsing for select languages
- Multiple output formats: JSON, CoNLL-U, tabular, text

## Data Manager
- Downloads and installs Stanza language models from HuggingFace
- Uses nocharlm models optimized for memory efficiency
- Supports multi-select installation of language packages
- Integrates with Galaxy data tables for model selection

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

RZ9082

This one could also use a cleanup. Please remove all duplicate files.

- Remove nested galaxy_tools_stanza/ directory from tools/stanza/
- Remove data_manager_stanza/ subdirectory from data manager
- Clean up generated test output files

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

bgruening · 2026-05-20T22:05:00Z

+<tool id="stanza_nlp" name="Stanza NLP Annotators" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="24.1">
+    <macros>
+        <token name="@TOOL_VERSION@">1.11.1</token>
+        <token name="@VERSION_SUFFIX@">4</token>


Your agent has bumped the version suffix to 4, but the tool has not been released yet, so this should stay at 0 :)
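
As a minimal sketch of what this comment asks for, the macro tokens for a not-yet-released tool would start the Galaxy suffix at 0. The token names and the tool version are taken from the diff above; only the suffix value changes.

```xml
<macros>
    <!-- Upstream Stanza version wrapped by the tool (from the diff above) -->
    <token name="@TOOL_VERSION@">1.11.1</token>
    <!-- No wrapper release exists yet for this version, so the suffix starts at 0 -->
    <token name="@VERSION_SUFFIX@">0</token>
</macros>
```

With these tokens, the `version` attribute `@TOOL_VERSION@+galaxy@VERSION_SUFFIX@` would expand to `1.11.1+galaxy0`.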

bgruening · 2026-05-20T22:06:22Z

+python -c "import stanza; print(stanza.__version__)"
+    ]]></version_command>
+    <command detect_errors="exit_code"><![CDATA[
+    export HOME=\${TMPDIR:-/tmp} &&


Why is this needed? Galaxy provides a HOME directory for every job (if you specify a profile).

bgruening · 2026-05-20T22:07:44Z

+  year={2020},
+  url={https://stanfordnlp.github.io/stanza/}
+}
+        </citation>


Suggested change

</citation>

<citation type="doi">10.18653/v1/2020.acl-demos.14

This paper has a DOI.
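
A minimal sketch of a citations block built around the DOI, assuming the BibTeX entry is replaced rather than kept alongside it (the review does not say which). The DOI is the one given in the suggestion; the closing tag is added here so the fragment is well-formed.

```xml
<citations>
    <!-- DOI taken from the review suggestion above -->
    <citation type="doi">10.18653/v1/2020.acl-demos.14</citation>
</citations>
```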

bgruening · 2026-05-20T22:08:27Z

+            <param name="language_model" value="en"/>
+            <param name="annotators" value="tokenize"/>
+            <param name="format" value="json"/>
+            <output name="outputFile">


Suggested change

<output name="outputFile">

<output name="outputFile" ftype="json">

And then use the stricter JSON asserts, please.
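
One way this could look, as a sketch only: Galaxy's test assertions include `has_json_property_with_text`, which checks that a JSON output contains a given key with a given text value. The property name `text` and the value `Hello` are hypothetical, since neither the test input nor the tool's JSON layout appears in this thread.

```xml
<output name="outputFile" ftype="json">
    <assert_contents>
        <!-- Assumption: the output JSON has a "text" key holding the token "Hello" -->
        <has_json_property_with_text property="text" text="Hello"/>
    </assert_contents>
</output>
```

The real property and value would need to match what the tool emits for the test input.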

bgruening · 2026-05-20T22:08:50Z

+https://stanfordnlp.github.io/stanza/available_models.html
+    ]]></help>
+    <citations>
+        <citation type="bibtex">


Use the DOI.

bgruening · 2026-05-20T22:09:35Z

This file is not needed, or something is off here, because of the hardcoded path.

bgruening · 2026-05-20T22:13:51Z

+        <token name="@VERSION_SUFFIX@">4</token>
+    </macros>
+    <requirements>
+        <container type="docker">ksuderman/stanza-nlp:@TOOL_VERSION@</container>


There is this conda package available: https://anaconda.org/channels/conda-forge/packages/stanza/files

Can you try if:

Suggested change

<container type="docker">ksuderman/stanza-nlp:@TOOL_VERSION@</container>

<requirement type="package" version="@TOOL_VERSION@">stanza</requirement>

works? And there is a 1.12.0 version available.
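
For context, a sketch of the requirements block with the suggestion applied. Whether the conda-forge package resolves and runs correctly for this tool is exactly what the reviewer is asking to have tested, so this is untested here.

```xml
<requirements>
    <!-- Conda package from conda-forge, pinned through the existing version token -->
    <requirement type="package" version="@TOOL_VERSION@">stanza</requirement>
</requirements>
```

Because the version is pinned through `@TOOL_VERSION@`, moving to 1.12.0 would only require changing that token.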

ksuderman and others added 3 commits May 19, 2026 19:14

RZ9082 reviewed May 20, 2026

ksuderman and others added 3 commits May 20, 2026 12:49

Remove duplicate directories and test outputs

8758ca4

Addressed review comments

cc8a919

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

Fixed macro inlining for Stanza tool

d92a4ee

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

bgruening reviewed May 20, 2026

ksuderman wants to merge 6 commits into galaxyproject:main from ksuderman:stanza-nlp

3 participants
