From f156078799afb562a8cc5d43195bcd883901ac12 Mon Sep 17 00:00:00 2001
From: Keith Suderman <suderman@jhu.edu>
Date: Tue, 19 May 2026 19:14:14 -0400
Subject: [PATCH 1/6] Add Stanford Stanza NLP tool with data manager

- Stanza neural NLP toolkit supporting 80+ languages
- State-of-the-art accuracy with Universal Dependencies v2.12
- Complete annotation pipeline: tokenization, POS, NER, parsing, sentiment, constituency
- CPU-optimized PyTorch models with default_fast configuration
- Docker containerization for consistent execution
- Data manager with direct HuggingFace downloads (no stanza dependency)
- Memory-efficient nocharlm models for container deployment
- Curated selection of major world languages in the data manager
- Tests and documentation for both the tool and the data manager

Tool: stanza_nlp (v1.11.1+galaxy4)
Data Manager: data_manager_stanza_models (v1.11.1.3)
Categories: Text Manipulation, Natural Language Processing
---
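For orientation, a minimal sketch of how the data manager appears to resolve a
model download URL, assuming the URL template and resources version hard-coded
in data_manager_stanza_models.py. The helper names `pick_package` and
`build_zip_url` are hypothetical and do not appear in the patch, and the toy
resource entries stand in for the real resources JSON.

```python
# Sketch only: mirrors the URL construction in data_manager_stanza_models.py.
# pick_package and build_zip_url are hypothetical names, not part of the patch.

URL_TEMPLATE = (
    "https://huggingface.co/stanfordnlp/stanza-{lang}"
    "/resolve/v{resources_version}/models/{filename}"
)
RESOURCES_VERSION = "1.11.0"  # value of STANZA_VERSION in the patch


def pick_package(lang_entry):
    """Prefer default_fast (nocharlm); fall back to default."""
    packages = lang_entry.get("packages", {})
    return "default_fast" if "default_fast" in packages else "default"


def build_zip_url(lang, lang_entry):
    """Fill the template for one language's model package."""
    return URL_TEMPLATE.format(
        lang=lang,
        resources_version=RESOURCES_VERSION,
        filename=f"{pick_package(lang_entry)}.zip",
    )


if __name__ == "__main__":
    # Toy resource entries; the real ones come from the resources JSON.
    print(build_zip_url("en", {"packages": {"default": {}, "default_fast": {}}}))
    print(build_zip_url("xx", {"packages": {"default": {}}}))
```

The fallback matters for languages whose resources entry lists only a `default`
package; in that case the same template is filled with `default.zip`.
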
 .../data_manager_stanza_models/.shed.yml      |  15 ++
 .../data_manager_stanza_models/README.md      | 150 +++++++++++
 .../data_manager_stanza_models.py             | 243 ++++++++++++++++++
 .../data_manager_stanza_models.xml            | 121 +++++++++
 tools/stanza/.shed.yml                        |  14 +
 tools/stanza/README.md                        | 145 +++++++++++
 tools/stanza/macros.xml                       |   4 +
 tools/stanza/stanza_nlp.xml                   | 192 ++++++++++++++
 tools/stanza/stanza_process.py                | 230 +++++++++++++++++
 tools/stanza/test-data/input.txt              |   2 +
 tools/stanza/test-data/stanza_models.loc      |   1 +
 .../stanza/tool-data/stanza_models.loc.sample |  10 +
 tools/stanza/tool_data_table_conf.xml.sample  |   6 +
 13 files changed, 1133 insertions(+)
 create mode 100644 data_managers/data_manager_stanza_models/.shed.yml
 create mode 100644 data_managers/data_manager_stanza_models/README.md
 create mode 100644 data_managers/data_manager_stanza_models/data_manager_stanza_models.py
 create mode 100644 data_managers/data_manager_stanza_models/data_manager_stanza_models.xml
 create mode 100644 tools/stanza/.shed.yml
 create mode 100644 tools/stanza/README.md
 create mode 100644 tools/stanza/macros.xml
 create mode 100644 tools/stanza/stanza_nlp.xml
 create mode 100644 tools/stanza/stanza_process.py
 create mode 100644 tools/stanza/test-data/input.txt
 create mode 100644 tools/stanza/test-data/stanza_models.loc
 create mode 100644 tools/stanza/tool-data/stanza_models.loc.sample
 create mode 100644 tools/stanza/tool_data_table_conf.xml.sample

diff --git a/data_managers/data_manager_stanza_models/.shed.yml b/data_managers/data_manager_stanza_models/.shed.yml
new file mode 100644
index 00000000000..b465da5eb11
--- /dev/null
+++ b/data_managers/data_manager_stanza_models/.shed.yml
@@ -0,0 +1,15 @@
+name: data_manager_stanza_models
+owner: iuc
+description: Data manager for downloading and installing Stanza language models
+long_description: |
+  This data manager allows Galaxy administrators to download and install Stanza
+  language models for use with the Stanza NLP annotation tool. It supports 80+
+  languages with models for tokenization, POS tagging, lemmatization, dependency
+  parsing, NER, sentiment analysis, and constituency parsing.
+homepage_url: https://stanfordnlp.github.io/stanza/
+remote_repository_url: https://github.com/ksuderman/galaxy_tools_stanza
+type: unrestricted
+categories:
+  - Data Managers
+  - Text Manipulation
+  - Natural Language Processing
diff --git a/data_managers/data_manager_stanza_models/README.md b/data_managers/data_manager_stanza_models/README.md
new file mode 100644
index 00000000000..e50e667758a
--- /dev/null
+++ b/data_managers/data_manager_stanza_models/README.md
@@ -0,0 +1,150 @@
+# Galaxy Data Manager for Stanza Language Models
+
+This Galaxy data manager downloads and installs Stanza language models for use with the Stanza NLP annotation tool, supporting 80+ languages with neural models trained on Universal Dependencies.
+
+## Features
+
+- **80+ languages**: Comprehensive language support for multilingual NLP
+- **Direct HuggingFace download**: Downloads models directly from HuggingFace without requiring stanza installation
+- **Multiple language installation**: Select and install multiple languages simultaneously
+- **Progress reporting**: Logs download and extraction status for each language model
+- **Duplicate prevention**: Checks existing installations to avoid redundant downloads
+- **Data table integration**: Automatically registers models in Galaxy's data table system
+
+## How It Works
+
+This data manager:
+1. **Connects to HuggingFace**: Downloads `default_fast` model packages (falling back to `default`) directly from Stanford's HuggingFace repository
+2. **No dependencies**: Uses only the Python standard library (`urllib.request`, `zipfile`, `json`); no stanza installation required
+3. **Extracts models**: Unzips model packages to Galaxy's managed storage
+4. **Registers models**: Updates the `stanza_models.loc` data table for tool access
+5. **Version control**: Downloads models compatible with Stanza 1.11.1
+
+## Supported Languages
+
+Stanza provides models for **80+ languages**. This data manager offers a curated selection of 49, including:
+
+### European Languages
+- **Western**: English, German, French, Spanish, Italian, Portuguese, Dutch
+- **Nordic**: Swedish, Danish, Norwegian (Bokmål/Nynorsk), Finnish
+- **Slavic**: Russian, Ukrainian, Polish, Czech, Slovak, Slovenian, Croatian, Serbian, Bulgarian
+- **Other**: Greek, Hungarian, Romanian, Estonian, Latvian, Lithuanian
+
+### Asian Languages
+- **East Asian**: Chinese (Simplified/Traditional), Japanese, Korean
+- **South Asian**: Hindi, Tamil, Telugu, Marathi, Urdu
+- **Southeast Asian**: Vietnamese, Thai, Indonesian
+- **Middle Eastern**: Arabic, Persian, Hebrew, Turkish
+
+### Other Languages
+- **African**: Afrikaans
+- **Minority**: Basque, Galician, Catalan, Armenian, Georgian
+
+See [Stanza's complete model list](https://stanfordnlp.github.io/stanza/available_models.html) for detailed language coverage.
+
+## Model Details
+
+### Model Type
+- **default_fast**: Memory-efficient models that omit the character language model (nocharlm); the `default` package is used when `default_fast` is unavailable
+- **Neural networks**: Pretrained on Universal Dependencies v2.12 treebanks
+- **Multi-task**: Single package includes tokenization, POS, lemma, parsing, and NER models (where available)
+
+### Model Sizes
+- **Typical size**: 50-200MB per language
+- **Variation**: Depends on language complexity and available training data
+- **Storage**: Models persist in Galaxy's `tool-data/stanza_models/` directory
+
+### Model Components
+Each language package may include:
+- **Tokenization**: Sentence and token segmentation
+- **POS tagging**: Universal POS tags and morphological features
+- **Lemmatization**: Base form reduction
+- **Dependency parsing**: Universal Dependencies syntax
+- **NER**: Named entity recognition (available for a subset of languages)
+
+## Installation Process
+
+### Admin Setup
+1. **Install this data manager**: `data_manager_stanza_models`
+2. **Install the Stanza tool**: `stanza_nlp`
+3. **Navigate to Admin → Local Data**
+4. **Select "Stanza Language Models"**
+
+### Model Installation
+1. **Choose languages**: Select checkboxes for desired languages
+2. **Run installation**: Data manager will download and extract models
+3. **Monitor progress**: Download status shown for each language
+4. **Verify installation**: Models appear in the Stanza tool's language dropdown
+
+### Post-Installation
+- Models are immediately available to the Stanza NLP tool
+- No restart required
+- Models persist across Galaxy restarts
+- Multiple installations of the same language are prevented
+
+## Data Table Format
+
+Models are registered in `stanza_models.loc` with this format, where `<models_path>` is the shared model root that contains one subdirectory per language:
+```
+<lang_code>    <display_name>    <lang_code>    <models_path>
+```
+
+Example:
+```
+en    English    en    /galaxy/tool-data/stanza_models
+de    German     de    /galaxy/tool-data/stanza_models
+```
+```
+
+## Technical Details
+
+### Download Source
+- **Repository**: https://huggingface.co/stanfordnlp/
+- **Model naming**: `stanza-{lang}` (e.g., `stanza-en`, `stanza-de`)
+- **Version**: Models tagged with `v{resources_version}` from Stanford's resources.json
+
+### Storage Structure
+```
+tool-data/
+├── stanza_models.loc
+└── stanza_models/
+    ├── en/
+    │   └── [English model files]
+    └── de/
+        └── [German model files]
+```
+
+### Dependencies
+- **Python 3.12**: Standard library only
+- **No stanza package**: Downloads directly from HuggingFace
+- **urllib.request**: For HTTP downloads
+- **zipfile**: For model extraction
+
+## Troubleshooting
+
+### Common Issues
+- **Network connectivity**: Ensure access to huggingface.co
+- **Disk space**: Large language sets require substantial storage
+- **Permissions**: Galaxy must have write access to tool-data directory
+
+### Model Verification
+- Check `stanza_models.loc` for registered models
+- Verify model files exist in expected directories
+- Test with Stanza NLP tool after installation
+
+## Citation
+
+This data manager installs models created by the Stanford NLP Group. Please cite:
+
+```
+Qi, Peng, Yuhao Zhang, Yuhui Zhang, Jason Bolton, and Christopher D. Manning.
+"Stanza: A Python Natural Language Processing Toolkit for Many Human Languages."
+In Proceedings of the 58th Annual Meeting of the Association for Computational
+Linguistics: System Demonstrations, 2020.
+```
+
+## Version History
+
+- **1.11.1.3**: Enhanced duplicate prevention and error handling
+- **1.11.1.2**: Improved download progress reporting
+- **1.11.1.1**: Direct HuggingFace download implementation
+- **1.11.1.0**: Initial release
\ No newline at end of file
diff --git a/data_managers/data_manager_stanza_models/data_manager_stanza_models.py b/data_managers/data_manager_stanza_models/data_manager_stanza_models.py
new file mode 100644
index 00000000000..6ef5801a77b
--- /dev/null
+++ b/data_managers/data_manager_stanza_models/data_manager_stanza_models.py
@@ -0,0 +1,243 @@
+#!/usr/bin/env python
+"""
+Data Manager for Stanza Language Models
+
+Downloads Stanza language models from HuggingFace and registers them in
+Galaxy's data table. Does NOT require stanza to be installed; the
+default_fast (or default) model package is fetched as a zip directly via HTTP.
+"""
+
+import argparse
+import json
+import os
+import sys
+import urllib.request
+import zipfile
+from pathlib import Path
+
+
+# Stanza resources version (tracked separately from the Stanza package version)
+STANZA_VERSION = "1.11.0"
+RESOURCES_URL = f"https://raw.githubusercontent.com/stanfordnlp/stanza-resources/main/resources_{STANZA_VERSION}.json"
+# URL template: filled with lang and resources_version from the resources JSON
+DEFAULT_URL_TEMPLATE = "https://huggingface.co/stanfordnlp/stanza-{lang}/resolve/v{resources_version}/models/{filename}"
+
+
+# Language display names
+STANZA_LANGUAGES = {
+    "en": "English",
+    "zh-hans": "Chinese (Simplified)",
+    "zh-hant": "Chinese (Traditional)",
+    "ar": "Arabic",
+    "fr": "French",
+    "de": "German",
+    "es": "Spanish",
+    "it": "Italian",
+    "pt": "Portuguese",
+    "nl": "Dutch",
+    "ru": "Russian",
+    "uk": "Ukrainian",
+    "pl": "Polish",
+    "ja": "Japanese",
+    "ko": "Korean",
+    "hi": "Hindi",
+    "tr": "Turkish",
+    "el": "Greek",
+    "hu": "Hungarian",
+    "sv": "Swedish",
+    "da": "Danish",
+    "nb": "Norwegian Bokmål",
+    "nn": "Norwegian Nynorsk",
+    "fi": "Finnish",
+    "ro": "Romanian",
+    "ca": "Catalan",
+    "cs": "Czech",
+    "sk": "Slovak",
+    "sl": "Slovenian",
+    "hr": "Croatian",
+    "sr": "Serbian",
+    "bg": "Bulgarian",
+    "lv": "Latvian",
+    "lt": "Lithuanian",
+    "et": "Estonian",
+    "he": "Hebrew",
+    "fa": "Persian",
+    "vi": "Vietnamese",
+    "th": "Thai",
+    "id": "Indonesian",
+    "af": "Afrikaans",
+    "eu": "Basque",
+    "gl": "Galician",
+    "hy": "Armenian",
+    "ka": "Georgian",
+    "ta": "Tamil",
+    "te": "Telugu",
+    "mr": "Marathi",
+    "ur": "Urdu",
+}
+
+
+def load_existing_models(data_table_path):
+    """Load existing model entries from the data table to avoid duplicates."""
+    existing = set()
+    if data_table_path and Path(data_table_path).exists():
+        with open(data_table_path) as f:
+            for line in f:
+                line = line.strip()
+                if line and not line.startswith('#'):
+                    parts = line.split('\t')
+                    if parts:
+                        existing.add(parts[0])
+    return existing
+
+
+def fetch_resources():
+    """Fetch the Stanza resources JSON to get download URLs and checksums."""
+    print(f"Fetching Stanza resources from {RESOURCES_URL}")
+    with urllib.request.urlopen(RESOURCES_URL) as response:
+        return json.loads(response.read())
+
+
+def download_model(lang, model_dir, resources):
+    """Download a Stanza language model package from HuggingFace.
+
+    Downloads the default_fast.zip package for the language (or default.zip
+    when default_fast is unavailable) and extracts it into model_dir/<lang>/.
+    Also merges the language's entry into model_dir/resources.json for Stanza.
+    """
+    # Get the URL template from the resources JSON
+    url_template = resources.get("url", DEFAULT_URL_TEMPLATE)
+
+    # Check if the language exists in resources
+    if lang not in resources:
+        print(f"Error: Language '{lang}' not found in Stanza resources", file=sys.stderr)
+        return False
+
+    # Download the default_fast.zip package (nocharlm models — much lower memory usage)
+    # Fall back to default.zip if default_fast is not available for this language
+    packages = resources.get(lang, {}).get("packages", {})
+    package_name = "default_fast" if "default_fast" in packages else "default"
+    zip_url = url_template.format(
+        lang=lang,
+        resources_version=STANZA_VERSION,
+        filename=f"{package_name}.zip"
+    )
+    print(f"Using package: {package_name}")
+
+    lang_dir = Path(model_dir) / lang
+    lang_dir.mkdir(parents=True, exist_ok=True)
+    zip_path = lang_dir / f"{package_name}.zip"
+
+    print(f"Downloading {zip_url}")
+    try:
+        urllib.request.urlretrieve(zip_url, str(zip_path))
+    except Exception as e:
+        print(f"Error downloading {lang} model: {e}", file=sys.stderr)
+        return False
+
+    # Extract the zip
+    print(f"Extracting to {lang_dir}")
+    try:
+        with zipfile.ZipFile(str(zip_path), 'r') as zf:
+            zf.extractall(str(lang_dir))
+    except Exception as e:
+        print(f"Error extracting {lang} model: {e}", file=sys.stderr)
+        return False
+
+    # Create or update resources.json with this language (needed by stanza.Pipeline)
+    resources_path = Path(model_dir) / "resources.json"
+    if resources_path.exists():
+        with open(resources_path) as f:
+            existing_resources = json.load(f)
+    else:
+        existing_resources = {}
+
+    # Add/update this language's resource entry
+    existing_resources[lang] = resources[lang]
+    # Also include the URL key
+    existing_resources["url"] = url_template
+
+    with open(resources_path, 'w') as f:
+        json.dump(existing_resources, f, indent=2)
+
+    # Clean up the zip file
+    zip_path.unlink()
+
+    print(f"Successfully downloaded and extracted {lang} model")
+    return True
+
+
+def main():
+    parser = argparse.ArgumentParser(description="Download and register Stanza language models")
+    parser.add_argument("--model", action="append", required=True,
+                        help="Language code(s) to download (can be specified multiple times)")
+    parser.add_argument("--target-directory", required=True,
+                        help="Persistent directory to store downloaded models")
+    parser.add_argument("--output", required=True,
+                        help="JSON output file for Galaxy data manager")
+    parser.add_argument("--data-table", required=False,
+                        help="Path to existing data table file to check for duplicates")
+
+    args = parser.parse_args()
+
+    # Load existing models to avoid duplicates
+    existing_models = load_existing_models(args.data_table)
+
+    # Fetch resources JSON
+    try:
+        resources = fetch_resources()
+    except Exception as e:
+        print(f"Error fetching Stanza resources: {e}", file=sys.stderr)
+        sys.exit(1)
+
+    # Use the persistent target directory for models
+    model_dir = Path(args.target_directory)
+    model_dir.mkdir(parents=True, exist_ok=True)
+
+    data_table_entries = []
+
+    for lang in args.model:
+        if lang in existing_models:
+            print(f"\n{'=' * 60}")
+            print(f"Skipping {lang} - already in data table")
+            print(f"{'=' * 60}")
+            continue
+
+        print(f"\n{'=' * 60}")
+        print(f"Processing {lang}...")
+        print(f"{'=' * 60}")
+
+        display_name = STANZA_LANGUAGES.get(lang, lang)
+
+        if not download_model(lang, model_dir, resources):
+            print(f"WARNING: Failed to download {lang}", file=sys.stderr)
+            continue
+
+        data_table_entries.append({
+            "value": lang,
+            "name": display_name,
+            "lang": lang,
+            "models_path": str(model_dir),
+        })
+
+        print(f"Successfully registered {display_name}")
+        print(f"  Language code: {lang}")
+        print(f"  Models path: {model_dir}")
+
+    # Create data manager JSON output
+    data_manager_output = {
+        "data_tables": {
+            "stanza_models": data_table_entries
+        }
+    }
+
+    with open(args.output, "w") as f:
+        json.dump(data_manager_output, f, indent=2)
+
+    print(f"\n{'=' * 60}")
+    print(f"Summary: Successfully registered {len(data_table_entries)} model(s)")
+    print(f"{'=' * 60}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/data_managers/data_manager_stanza_models/data_manager_stanza_models.xml b/data_managers/data_manager_stanza_models/data_manager_stanza_models.xml
new file mode 100644
index 00000000000..74a1705c654
--- /dev/null
+++ b/data_managers/data_manager_stanza_models/data_manager_stanza_models.xml
@@ -0,0 +1,121 @@
+<tool id="data_manager_stanza_models" name="Stanza Language Models" version="1.11.1.3" tool_type="manage_data" profile="21.05">
+    <description>Download and install Stanza language models</description>
+    <requirements>
+        <requirement type="package" version="3.12">python</requirement>
+    </requirements>
+    <command detect_errors="exit_code"><![CDATA[
+        python '$__tool_directory__/data_manager_stanza_models.py'
+        #for $m in $models
+            --model '$m'
+        #end for
+        --target-directory '${__tool_data_path__}/stanza_models'
+        --output '$out_file'
+        --data-table '${__tool_data_path__}/stanza_models.loc'
+    ]]></command>
+    <inputs>
+        <param name="models" type="select" label="Stanza Language Models" multiple="true" display="checkboxes">
+            <option value="en" selected="true">English</option>
+            <option value="zh-hans">Chinese (Simplified)</option>
+            <option value="zh-hant">Chinese (Traditional)</option>
+            <option value="ar">Arabic</option>
+            <option value="fr">French</option>
+            <option value="de">German</option>
+            <option value="es">Spanish</option>
+            <option value="it">Italian</option>
+            <option value="pt">Portuguese</option>
+            <option value="nl">Dutch</option>
+            <option value="ru">Russian</option>
+            <option value="uk">Ukrainian</option>
+            <option value="pl">Polish</option>
+            <option value="ja">Japanese</option>
+            <option value="ko">Korean</option>
+            <option value="hi">Hindi</option>
+            <option value="tr">Turkish</option>
+            <option value="el">Greek</option>
+            <option value="hu">Hungarian</option>
+            <option value="sv">Swedish</option>
+            <option value="da">Danish</option>
+            <option value="nb">Norwegian Bokmål</option>
+            <option value="nn">Norwegian Nynorsk</option>
+            <option value="fi">Finnish</option>
+            <option value="ro">Romanian</option>
+            <option value="ca">Catalan</option>
+            <option value="cs">Czech</option>
+            <option value="sk">Slovak</option>
+            <option value="sl">Slovenian</option>
+            <option value="hr">Croatian</option>
+            <option value="sr">Serbian</option>
+            <option value="bg">Bulgarian</option>
+            <option value="lv">Latvian</option>
+            <option value="lt">Lithuanian</option>
+            <option value="et">Estonian</option>
+            <option value="he">Hebrew</option>
+            <option value="fa">Persian</option>
+            <option value="vi">Vietnamese</option>
+            <option value="th">Thai</option>
+            <option value="id">Indonesian</option>
+            <option value="af">Afrikaans</option>
+            <option value="eu">Basque</option>
+            <option value="gl">Galician</option>
+            <option value="hy">Armenian</option>
+            <option value="ka">Georgian</option>
+            <option value="ta">Tamil</option>
+            <option value="te">Telugu</option>
+            <option value="mr">Marathi</option>
+            <option value="ur">Urdu</option>
+        </param>
+    </inputs>
+    <outputs>
+        <data name="out_file" format="data_manager_json" />
+    </outputs>
+    <help><![CDATA[
+Stanza Language Models Data Manager
+====================================
+
+This data manager downloads and installs Stanza language models for use with
+the Stanza NLP annotation tool.
+
+How It Works
+------------
+
+Models are downloaded directly from HuggingFace over HTTP (the ``stanza``
+package is not required) and stored in a Galaxy-managed directory. Each language
+is fetched as a single package bundling its available processors (tokenization, POS, lemma, depparse, NER, etc.).
+
+Available Languages
+-------------------
+
+Stanza provides pretrained neural models for 80+ languages. This data manager
+offers a curated selection of the most commonly used languages. Models are
+trained on Universal Dependencies v2.12 treebanks.
+
+Usage
+-----
+
+1. Select one or more languages you want to install (checkboxes)
+2. Run the data manager
+3. Models will be downloaded and registered in the data table
+
+Model downloads vary in size. Most languages are 50-200MB. Downloading
+multiple languages will take longer accordingly.
+
+Version
+-------
+
+This data manager installs models compatible with Stanza version 1.11.1.
+
+For more information about Stanza models, see:
+https://stanfordnlp.github.io/stanza/available_models.html
+    ]]></help>
+    <citations>
+        <citation type="bibtex">
+@inproceedings{qi2020stanza,
+  title={Stanza: A {P}ython Natural Language Processing Toolkit for Many Human Languages},
+  author={Qi, Peng and Zhang, Yuhao and Zhang, Yuhui and Bolton, Jason and Manning, Christopher D.},
+  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations},
+  year={2020},
+  url={https://stanfordnlp.github.io/stanza/}
+}
+        </citation>
+    </citations>
+</tool>
diff --git a/tools/stanza/.shed.yml b/tools/stanza/.shed.yml
new file mode 100644
index 00000000000..797bdfc7507
--- /dev/null
+++ b/tools/stanza/.shed.yml
@@ -0,0 +1,14 @@
+name: stanza_nlp
+owner: iuc
+description: Stanza NLP Annotators
+long_description: |
+  This tool provides Stanford Stanza natural language processing annotation capabilities
+  for Galaxy. It supports 80+ languages with various annotation types including tokenization,
+  POS tagging, lemmatization, dependency parsing, named entity recognition, sentiment analysis,
+  and constituency parsing.
+homepage_url: https://stanfordnlp.github.io/stanza/
+remote_repository_url: https://github.com/ksuderman/galaxy_tools_stanza
+type: unrestricted
+categories:
+  - Text Manipulation
+  - Natural Language Processing
diff --git a/tools/stanza/README.md b/tools/stanza/README.md
new file mode 100644
index 00000000000..104be8e78df
--- /dev/null
+++ b/tools/stanza/README.md
@@ -0,0 +1,145 @@
+# Galaxy Wrapper for Stanford Stanza NLP
+
+This Galaxy tool provides access to Stanza, Stanford's neural natural language processing toolkit, supporting 80+ human languages with state-of-the-art accuracy for multilingual text analysis.
+
+## Features
+
+- **80+ languages**: Comprehensive multilingual support for diverse text corpora
+- **Neural models**: State-of-the-art accuracy with pretrained neural networks
+- **Multiple annotators**: Tokenization, POS tagging, NER, parsing, sentiment, and constituency parsing
+- **Universal Dependencies**: Standardized annotations following Universal Dependencies v2.12
+- **Multiple output formats**: JSON, CoNLL, CoNLL-U, and human-readable text
+- **Dockerized execution**: CPU-optimized PyTorch models in container environment
+- **Data manager integration**: Language models downloaded and managed separately
+
+## Requirements
+
+- **Data Manager**: Language models must be installed via the Stanza Language Models data manager
+- **Docker**: Uses the `ksuderman/stanza-nlp:1.11.1` Docker image with CPU-optimized PyTorch
+- **Memory**: Uses default_fast models for efficient memory usage in containers
+
+## Annotation Types
+
+| Annotator | Description | Output |
+|---|---|---|
+| **Tokenization** | Sentence segmentation and tokenization | Tokens with character offsets |
+| **Part of speech** | POS tags, lemmas, and morphological features | Universal POS (UPOS), treebank POS (XPOS), lemmas |
+| **Named entity recognition** | Person, organization, location, date entities | Entity spans with types (PERSON, ORG, GPE, etc.) |
+| **Dependency parsing** | Syntactic dependencies following Universal Dependencies | Head-child relationships with dependency labels |
+| **Sentiment analysis** | Per-sentence sentiment scoring | Sentiment scores (0=negative, 1=neutral, 2=positive) |
+| **Constituency parsing** | Phrase structure parse trees | Hierarchical syntactic structure |
+
+## Language Coverage
+
+Stanza supports **80+ languages** including:
+
+### Major Languages
+- **European**: English, Spanish, German, French, Italian, Portuguese, Dutch, Swedish, Danish, Norwegian, Greek, Polish, Russian, Ukrainian
+- **Asian**: Chinese, Japanese, Korean, Arabic, Hindi, Turkish
+- **Others**: Many more languages with Universal Dependencies treebanks
+
+### NER Support
+Named entity recognition is available for a subset of languages including:
+- English, Chinese, Spanish, German, French, Dutch, Russian, Ukrainian
+
+See [Stanza's model documentation](https://stanfordnlp.github.io/stanza/available_models.html) for the complete supported language list.
+
+## Input Format
+
+- **Text files**: Plain text input in any supported language
+- **Encoding**: UTF-8 text encoding
+
+## Output Formats
+
+### JSON (Recommended)
+Comprehensive structured output with all annotations:
+```json
+{
+  "sentences": [
+    {
+      "tokens": [
+        {
+          "id": 1,
+          "text": "John",
+          "lemma": "John", 
+          "upos": "PROPN",
+          "head": 2,
+          "deprel": "nsubj"
+        }
+      ],
+      "entities": [
+        {
+          "text": "John Smith",
+          "type": "PERSON",
+          "start_char": 0,
+          "end_char": 10
+        }
+      ]
+    }
+  ]
+}
+```
+
+### CoNLL-U
+Universal Dependencies format with morphological features:
+```
+1	John	John	PROPN	_	_	2	nsubj	_	_
+2	works	work	VERB	_	_	0	root	_	_
+```
+
+### CoNLL
+Tab-separated format suitable for dependency parsing analysis.
+
+### Text
+Human-readable output with statistics and formatted annotations.
+
+## Model Architecture
+
+- **Neural networks**: Pretrained neural models for each language and task
+- **Universal Dependencies**: Consistent annotation standards across languages
+- **Default-fast models**: Memory-efficient nocharlm models optimized for containers
+- **CPU-optimized**: PyTorch models configured for CPU-only execution
+
+## Example Use Cases
+
+- **Multilingual corpus analysis**: Process text in 80+ languages with consistent annotations
+- **Cross-lingual studies**: Compare linguistic phenomena across different languages
+- **Historical linguistics**: Analyze texts in various languages and time periods
+- **Digital humanities**: Multi-language support for international document collections
+- **Dependency syntax**: Universal Dependencies parsing for computational linguistics
+
+## Installation
+
+1. Install the data manager: `data_manager_stanza_models`
+2. Install this tool: `stanza_nlp`
+3. Use the data manager to download language models:
+   - Go to **Admin → Local Data**
+   - Select "Stanza Language Models"
+   - Choose language(s) to install
+   - Models download directly from HuggingFace
+
+## Performance Notes
+
+- **Memory efficient**: Uses default_fast models, which omit the character language model (nocharlm)
+- **CPU-optimized**: PyTorch configured for CPU-only execution
+- **Container isolation**: Runs in Docker for consistent environment
+- **Model caching**: Downloaded models persist across runs
+
+## Citation
+
+If you use this tool, please cite:
+
+```
+Qi, Peng, Yuhao Zhang, Yuhui Zhang, Jason Bolton, and Christopher D. Manning.
+"Stanza: A Python Natural Language Processing Toolkit for Many Human Languages."
+In Proceedings of the 58th Annual Meeting of the Association for Computational
+Linguistics: System Demonstrations, 2020.
+```
+
+## Version History
+
+- **1.11.1+galaxy4**: Latest release with enhanced output formatting and CPU optimization
+- **1.11.1+galaxy3**: Previous stable release
+- **1.11.1+galaxy2**: Early release
+- **1.11.1+galaxy1**: Beta release
+- **1.11.1+galaxy0**: Initial release
\ No newline at end of file
diff --git a/tools/stanza/macros.xml b/tools/stanza/macros.xml
new file mode 100644
index 00000000000..f58769bae21
--- /dev/null
+++ b/tools/stanza/macros.xml
@@ -0,0 +1,4 @@
+<macros>
+    <token name="@TOOL_VERSION@">1.11.1</token>
+    <token name="@VERSION_SUFFIX@">4</token>
+</macros>
diff --git a/tools/stanza/stanza_nlp.xml b/tools/stanza/stanza_nlp.xml
new file mode 100644
index 00000000000..6b9f84c18ad
--- /dev/null
+++ b/tools/stanza/stanza_nlp.xml
@@ -0,0 +1,192 @@
+<tool id="stanza_nlp" name="Stanza NLP Annotators" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" python_template_version="3.5" profile="21.05">
+    <macros>
+        <import>macros.xml</import>
+    </macros>
+    <requirements>
+        <container type="docker">ksuderman/stanza-nlp:@TOOL_VERSION@</container>
+    </requirements>
+    <version_command><![CDATA[
+python -c "import stanza; print(stanza.__version__)"
+    ]]></version_command>
+    <command detect_errors="exit_code"><![CDATA[
+    export HOME=\${TMPDIR:-/tmp} &&
+    python '$__tool_directory__/stanza_process.py'
+    --input '$input'
+    --output '$outputFile'
+    --lang '${language_model.fields.lang}'
+    --model-dir '${language_model.fields.models_path}'
+    --format '$format'
+    --annotators '$annotators'
+    ]]></command>
+    <inputs>
+        <param name="input" type="data" format="txt" label="Text"/>
+        <param name="language_model" type="select" label="Language Model">
+            <options from_data_table="stanza_models">
+                <column name="value" index="0"/>
+                <column name="name" index="1"/>
+                <column name="lang" index="2"/>
+                <column name="models_path" index="3"/>
+                <filter type="sort_by" column="1"/>
+            </options>
+        </param>
+        <param name="annotators" type="select" label="Annotation types">
+            <option value="tokenize" selected="true">Tokenization and sentence segmentation</option>
+            <option value="pos">Part of speech, lemmas, and morphological features</option>
+            <option value="ner">Named entity recognition</option>
+            <option value="parse">Dependency parsing</option>
+            <option value="sentiment">Sentiment analysis</option>
+            <option value="constituency">Constituency parsing</option>
+        </param>
+        <param name="format" type="select" label="Output format">
+            <option value="json" selected="true">JSON</option>
+            <option value="conll">CoNLL</option>
+            <option value="conllu">CoNLL-U</option>
+            <option value="text">Text</option>
+        </param>
+    </inputs>
+    <outputs>
+        <data name="outputFile" format="txt" label="${tool.name} (${annotators}) on ${on_string}">
+            <change_format>
+                <when input="format" value="json" format="json"/>
+                <when input="format" value="conllu" format="tabular"/>
+                <when input="format" value="conll" format="tabular"/>
+            </change_format>
+        </data>
+    </outputs>
+    <tests>
+        <test>
+            <param name="input" value="input.txt"/>
+            <param name="language_model" value="en"/>
+            <param name="annotators" value="tokenize"/>
+            <param name="format" value="json"/>
+            <output name="outputFile">
+                <assert_contents>
+                    <has_text text="&quot;text&quot;"/>
+                    <has_text text="&quot;tokens&quot;"/>
+                    <has_text text="John"/>
+                </assert_contents>
+            </output>
+        </test>
+        <test>
+            <param name="input" value="input.txt"/>
+            <param name="language_model" value="en"/>
+            <param name="annotators" value="ner"/>
+            <param name="format" value="json"/>
+            <output name="outputFile">
+                <assert_contents>
+                    <has_text text="&quot;entities&quot;"/>
+                    <has_text text="&quot;type&quot;"/>
+                </assert_contents>
+            </output>
+        </test>
+        <test>
+            <param name="input" value="input.txt"/>
+            <param name="language_model" value="en"/>
+            <param name="annotators" value="parse"/>
+            <param name="format" value="conllu"/>
+            <output name="outputFile">
+                <assert_contents>
+                    <has_text text="John"/>
+                    <has_text text="Smith"/>
+                </assert_contents>
+            </output>
+        </test>
+        <test>
+            <param name="input" value="input.txt"/>
+            <param name="language_model" value="en"/>
+            <param name="annotators" value="sentiment"/>
+            <param name="format" value="json"/>
+            <output name="outputFile">
+                <assert_contents>
+                    <has_text text="&quot;sentiment&quot;"/>
+                </assert_contents>
+            </output>
+        </test>
+    </tests>
+    <help><![CDATA[
+
+Stanza NLP
+==========
+
+Galaxy wrapper for the `Stanza <https://stanfordnlp.github.io/stanza/>`_ natural language
+processing toolkit from the Stanford NLP Group. Stanza provides pretrained neural models
+supporting 80+ human languages.
+
+Annotation Types
+----------------
+
+Tokenization and sentence segmentation
+    Splits text into tokens and identifies sentence boundaries. Handles multi-word
+    token expansion for applicable languages.
+
+Part of speech, lemmas, and morphological features
+    Includes tokenization plus universal POS tagging (UPOS), treebank-specific POS
+    tagging (XPOS), lemmatization, and morphological feature analysis.
+
+Named entity recognition (NER)
+    Identifies named entities such as PERSON, ORG, GPE, and DATE. Available for
+    a subset of supported languages (8+ languages including English, Chinese, Spanish,
+    German, French, Dutch, Russian, and Ukrainian).
+
+Dependency parsing
+    Syntactic dependency parsing following Universal Dependencies annotation. Identifies
+    grammatical relationships (head and dependency relation) for each token.
+
+Sentiment analysis
+    Per-sentence sentiment scoring (0=negative, 1=neutral, 2=positive). Available for
+    languages with sentiment models.
+
+Constituency parsing
+    Phrase structure parse trees. Available for languages with constituency models.
+
+Output Formats
+--------------
+
+**JSON**
+    Structured output with the annotations produced by the selected annotation type. Best for programmatic access.
+
+**CoNLL**
+    Tab-separated format with seven columns: ID, FORM, LEMMA, XPOS, NER, HEAD, and DEPREL.
+
+**CoNLL-U**
+    Universal Dependencies format with morphological features.
+
+**Text**
+    Human-readable text output with statistics and annotations.
+
+Language Models
+---------------
+
+Stanza uses pretrained neural models organized by language. Models are downloaded and
+managed by the Stanza data manager. Each language may include models for different
+tasks (tokenization, POS, NER, etc.); the syntactic models are trained on Universal Dependencies v2.12.
+
+Install models using the data manager (Admin > Local Data > Stanza Language Models).
+
+Supported Languages
+-------------------
+
+Stanza supports 80+ languages including:
+
+- English, Spanish, German, French, Italian, Portuguese
+- Chinese, Japanese, Korean
+- Arabic, Hindi, Turkish
+- Russian, Ukrainian, Polish
+- Dutch, Greek, Swedish, Danish, Norwegian
+- And many more...
+
+See https://stanfordnlp.github.io/stanza/available_models.html for the complete list.
+
+    ]]></help>
+    <citations>
+        <citation type="bibtex">
+@inproceedings{qi2020stanza,
+  title={Stanza: A {P}ython Natural Language Processing Toolkit for Many Human Languages},
+  author={Qi, Peng and Zhang, Yuhao and Zhang, Yuhui and Bolton, Jason and Manning, Christopher D.},
+  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations},
+  year={2020},
+  url={https://stanfordnlp.github.io/stanza/}
+}
+        </citation>
+    </citations>
+</tool>
diff --git a/tools/stanza/stanza_process.py b/tools/stanza/stanza_process.py
new file mode 100644
index 00000000000..738a79694b2
--- /dev/null
+++ b/tools/stanza/stanza_process.py
@@ -0,0 +1,230 @@
+#!/usr/bin/env python
+"""
+Stanza NLP Processing Script for Galaxy
+
+Processes text files with Stanza and outputs results in various formats.
+"""
+
+import argparse
+import json
+import sys
+
+try:
+    import stanza
+except ImportError:
+    print("Error: Stanza is not installed. Please install stanza.", file=sys.stderr)
+    sys.exit(1)
+
+
+# Map annotator selections to Stanza processor strings
+PROCESSOR_MAP = {
+    "tokenize": "tokenize",
+    "pos": "tokenize,mwt,pos,lemma",
+    "ner": "tokenize,mwt,ner",
+    "parse": "tokenize,mwt,pos,lemma,depparse",
+    "sentiment": "tokenize,mwt,sentiment",
+    "constituency": "tokenize,mwt,pos,constituency",
+}
+
+
+def process_text(doc, output_format, annotator):
+    """Process a Stanza Document and format output."""
+    if output_format == "json":
+        return format_json(doc, annotator)
+    elif output_format == "conll":
+        return format_conll(doc)
+    elif output_format == "conllu":
+        return format_conllu(doc)
+    elif output_format == "text":
+        return format_text(doc, annotator)
+    else:
+        return format_json(doc, annotator)
+
+
+def format_json(doc, annotator):
+    """Format document as JSON."""
+    output = {"text": doc.text, "sentences": []}
+
+    for sent in doc.sentences:
+        sent_data = {"text": sent.text, "tokens": []}
+
+        for word in sent.words:
+            token_data = {
+                "text": word.text,
+                "start_char": word.start_char,
+                "end_char": word.end_char,
+            }
+
+            if annotator in ("pos", "parse", "constituency"):
+                token_data["upos"] = word.upos
+                token_data["xpos"] = word.xpos
+                token_data["lemma"] = word.lemma
+                if word.feats:
+                    token_data["feats"] = word.feats
+
+            if annotator == "parse":
+                token_data["deprel"] = word.deprel
+                token_data["head"] = word.head
+
+            sent_data["tokens"].append(token_data)
+
+        if annotator == "ner" and sent.ents:
+            sent_data["entities"] = [
+                {
+                    "text": ent.text,
+                    "type": ent.type,
+                    "start_char": ent.start_char,
+                    "end_char": ent.end_char,
+                }
+                for ent in sent.ents
+            ]
+
+        if annotator == "sentiment" and sent.sentiment is not None:
+            sent_data["sentiment"] = sent.sentiment
+
+        if annotator == "constituency" and sent.constituency is not None:
+            sent_data["constituency"] = str(sent.constituency)
+
+        output["sentences"].append(sent_data)
+
+    return json.dumps(output, indent=2, ensure_ascii=False)
+
+
+def format_conll(doc):
+    """Format document as tab-separated CoNLL: ID, FORM, LEMMA, XPOS, NER, HEAD, DEPREL."""
+    lines = []
+    for sent in doc.sentences:
+        for word in sent.words:
+            ner_tag = "O"
+            if hasattr(word, 'parent') and word.parent and hasattr(word.parent, 'ner'):
+                ner_tag = word.parent.ner if word.parent.ner else "O"
+            head = word.head if word.head is not None else 0
+            deprel = word.deprel if word.deprel else "_"
+            lemma = word.lemma if word.lemma else "_"
+            xpos = word.xpos if word.xpos else "_"
+
+            line = f"{word.id}\t{word.text}\t{lemma}\t{xpos}\t{ner_tag}\t{head}\t{deprel}"
+            lines.append(line)
+        lines.append("")
+    return "\n".join(lines)
+
+
+def format_conllu(doc):
+    """Format document as CoNLL-U (Universal Dependencies format)."""
+    lines = []
+    for sent in doc.sentences:
+        for word in sent.words:
+            upos = word.upos if word.upos else "_"
+            xpos = word.xpos if word.xpos else "_"
+            lemma = word.lemma if word.lemma else "_"
+            feats = word.feats if word.feats else "_"
+            head = word.head if word.head is not None else "_"
+            deprel = word.deprel if word.deprel else "_"
+
+            line = f"{word.id}\t{word.text}\t{lemma}\t{upos}\t{xpos}\t{feats}\t{head}\t{deprel}\t_\t_"
+            lines.append(line)
+        lines.append("")
+    return "\n".join(lines)
+
+
+def format_text(doc, annotator):
+    """Format document as human-readable text."""
+    lines = []
+
+    num_tokens = sum(len(sent.words) for sent in doc.sentences)
+    num_sents = len(doc.sentences)
+    lines.append(f"Document Statistics: {num_sents} sentences, {num_tokens} tokens\n")
+
+    for i, sent in enumerate(doc.sentences, 1):
+        lines.append(f"\nSentence #{i} ({len(sent.words)} tokens):")
+        lines.append(sent.text)
+        lines.append("")
+
+        if annotator in ("pos", "parse", "constituency"):
+            for word in sent.words:
+                parts = [f"  {word.text}"]
+                parts.append(f"lemma={word.lemma}")
+                parts.append(f"upos={word.upos}")
+                if word.xpos:
+                    parts.append(f"xpos={word.xpos}")
+                if annotator == "parse" and word.deprel:
+                    parts.append(f"deprel={word.deprel}")
+                    parts.append(f"head={word.head}")
+                lines.append(" | ".join(parts))
+            lines.append("")
+
+        if annotator == "ner" and sent.ents:
+            lines.append("  Named Entities:")
+            for ent in sent.ents:
+                lines.append(f"    {ent.text} ({ent.type})")
+            lines.append("")
+
+        if annotator == "sentiment" and sent.sentiment is not None:
+            labels = {0: "negative", 1: "neutral", 2: "positive"}
+            lines.append(f"  Sentiment: {labels.get(sent.sentiment, sent.sentiment)}")
+            lines.append("")
+
+        if annotator == "constituency" and sent.constituency is not None:
+            lines.append(f"  Constituency: {sent.constituency}")
+            lines.append("")
+
+    return "\n".join(lines)
+
+
+def main():
+    parser = argparse.ArgumentParser(description="Process text with Stanza NLP")
+    parser.add_argument("--input", required=True, help="Input text file")
+    parser.add_argument("--output", required=True, help="Output file")
+    parser.add_argument("--lang", required=True, help="Language code")
+    parser.add_argument("--model-dir", required=True, help="Path to stanza_resources directory")
+    parser.add_argument("--format", choices=["json", "conll", "conllu", "text"],
+                        default="json", help="Output format")
+    parser.add_argument("--annotators", required=True, help="Annotation type")
+
+    args = parser.parse_args()
+
+    processors = PROCESSOR_MAP.get(args.annotators, "tokenize")
+
+    # Load Stanza pipeline using default_fast package (nocharlm) for lower memory usage
+    try:
+        nlp = stanza.Pipeline(
+            lang=args.lang,
+            dir=args.model_dir,
+            processors=processors,
+            package="default_fast",
+            download_method=None,
+            use_gpu=False,
+        )
+    except Exception as e:
+        print(f"Error loading Stanza pipeline: {e}", file=sys.stderr)
+        sys.exit(1)
+
+    # Read input text
+    try:
+        with open(args.input, 'r', encoding='utf-8') as f:
+            text = f.read()
+    except Exception as e:
+        print(f"Error reading input file: {e}", file=sys.stderr)
+        sys.exit(1)
+
+    # Process text
+    try:
+        doc = nlp(text)
+    except Exception as e:
+        print(f"Error processing text: {e}", file=sys.stderr)
+        sys.exit(1)
+
+    # Format and write output
+    try:
+        output = process_text(doc, args.format, args.annotators)
+        with open(args.output, 'w', encoding='utf-8') as f:
+            f.write(output)
+    except Exception as e:
+        print(f"Error writing output: {e}", file=sys.stderr)
+        sys.exit(1)
+
+    print(f"Successfully processed {len(text)} characters")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tools/stanza/test-data/input.txt b/tools/stanza/test-data/input.txt
new file mode 100644
index 00000000000..7cea21fac4e
--- /dev/null
+++ b/tools/stanza/test-data/input.txt
@@ -0,0 +1,2 @@
+John Smith went to Walmart on January 1, 1970 to buy IBM stock, then he went to the theater.
+
diff --git a/tools/stanza/test-data/stanza_models.loc b/tools/stanza/test-data/stanza_models.loc
new file mode 100644
index 00000000000..215f6241d01
--- /dev/null
+++ b/tools/stanza/test-data/stanza_models.loc
@@ -0,0 +1 @@
+en	English	en	/Users/suderman/Library/Caches/stanza/1.11.0/resources
diff --git a/tools/stanza/tool-data/stanza_models.loc.sample b/tools/stanza/tool-data/stanza_models.loc.sample
new file mode 100644
index 00000000000..2a70fbefe88
--- /dev/null
+++ b/tools/stanza/tool-data/stanza_models.loc.sample
@@ -0,0 +1,10 @@
+# Stanza language models
+# This file is maintained by the data_manager_stanza_models data manager.
+#
+# Columns:
+# <value>	<name>	<lang>	<models_path>
+#
+# value: unique identifier for this model entry (language code)
+# name: display name shown in the tool UI
+# lang: Stanza language code (usually ISO 639-1, e.g. en)
+# models_path: path to the stanza_resources directory containing the model
diff --git a/tools/stanza/tool_data_table_conf.xml.sample b/tools/stanza/tool_data_table_conf.xml.sample
new file mode 100644
index 00000000000..c9c90863118
--- /dev/null
+++ b/tools/stanza/tool_data_table_conf.xml.sample
@@ -0,0 +1,6 @@
+<tables>
+    <table name="stanza_models" comment_char="#">
+        <columns>value, name, lang, models_path</columns>
+        <file path="tool-data/stanza_models.loc" />
+    </table>
+</tables>

From 8674a16b13a9a0df9b9b0c1e20409b8232f166b7 Mon Sep 17 00:00:00 2001
From: Keith Suderman <suderman@jhu.edu>
Date: Tue, 19 May 2026 20:57:56 -0400
Subject: [PATCH 2/6] Add Stanza NLP tool and data manager

## Stanza NLP Tool
- Stanford Stanza NLP annotation tool supporting 80+ languages
- Provides tokenization, POS tagging, lemmatization, dependency parsing, NER
- Supports sentiment analysis and constituency parsing for select languages
- Multiple output formats: JSON, CoNLL, CoNLL-U, text

## Data Manager
- Downloads and installs Stanza language models from HuggingFace
- Uses nocharlm models optimized for memory efficiency
- Supports multi-select installation of language packages
- Integrates with Galaxy data tables for model selection

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
---
 .../data_manager_stanza                       |   1 +
 tools/stanza/galaxy_tools_stanza/.shed.yml    |  14 ++
 tools/stanza/galaxy_tools_stanza/README.md    | 145 +++++++++++
 tools/stanza/galaxy_tools_stanza/macros.xml   |   4 +
 .../stanza/galaxy_tools_stanza/stanza_nlp.xml | 192 +++++++++++++++
 .../galaxy_tools_stanza/stanza_process.py     | 230 ++++++++++++++++++
 .../galaxy_tools_stanza/test-data/input.txt   |   2 +
 .../test-data/stanza_models.loc               |   1 +
 .../tool-data/stanza_models.loc.sample        |  10 +
 .../tool_data_table_conf.xml.sample           |   6 +
 .../tool_data_table_conf.xml.test             |   6 +
 11 files changed, 611 insertions(+)
 create mode 160000 data_managers/data_manager_stanza_models/data_manager_stanza
 create mode 100644 tools/stanza/galaxy_tools_stanza/.shed.yml
 create mode 100644 tools/stanza/galaxy_tools_stanza/README.md
 create mode 100644 tools/stanza/galaxy_tools_stanza/macros.xml
 create mode 100644 tools/stanza/galaxy_tools_stanza/stanza_nlp.xml
 create mode 100644 tools/stanza/galaxy_tools_stanza/stanza_process.py
 create mode 100644 tools/stanza/galaxy_tools_stanza/test-data/input.txt
 create mode 100644 tools/stanza/galaxy_tools_stanza/test-data/stanza_models.loc
 create mode 100644 tools/stanza/galaxy_tools_stanza/tool-data/stanza_models.loc.sample
 create mode 100644 tools/stanza/galaxy_tools_stanza/tool_data_table_conf.xml.sample
 create mode 100644 tools/stanza/galaxy_tools_stanza/tool_data_table_conf.xml.test

diff --git a/data_managers/data_manager_stanza_models/data_manager_stanza b/data_managers/data_manager_stanza_models/data_manager_stanza
new file mode 160000
index 00000000000..de06488b2a4
--- /dev/null
+++ b/data_managers/data_manager_stanza_models/data_manager_stanza
@@ -0,0 +1 @@
+Subproject commit de06488b2a4c2fe5caefb14e8aa1408159de6163
diff --git a/tools/stanza/galaxy_tools_stanza/.shed.yml b/tools/stanza/galaxy_tools_stanza/.shed.yml
new file mode 100644
index 00000000000..797bdfc7507
--- /dev/null
+++ b/tools/stanza/galaxy_tools_stanza/.shed.yml
@@ -0,0 +1,14 @@
+name: stanza_nlp
+owner: iuc
+description: Stanza NLP Annotators
+long_description: |
+  This tool provides Stanford Stanza natural language processing annotation capabilities
+  for Galaxy. It supports 80+ languages with various annotation types including tokenization,
+  POS tagging, lemmatization, dependency parsing, named entity recognition, sentiment analysis,
+  and constituency parsing.
+homepage_url: https://stanfordnlp.github.io/stanza/
+remote_repository_url: https://github.com/ksuderman/galaxy_tools_stanza
+type: unrestricted
+categories:
+  - Text Manipulation
+  - Natural Language Processing
diff --git a/tools/stanza/galaxy_tools_stanza/README.md b/tools/stanza/galaxy_tools_stanza/README.md
new file mode 100644
index 00000000000..104be8e78df
--- /dev/null
+++ b/tools/stanza/galaxy_tools_stanza/README.md
@@ -0,0 +1,145 @@
+# Galaxy Wrapper for Stanford Stanza NLP
+
+This Galaxy tool provides access to Stanza, Stanford's neural natural language processing toolkit, supporting 80+ human languages with state-of-the-art accuracy for multilingual text analysis.
+
+## Features
+
+- **80+ languages**: Comprehensive multilingual support for diverse text corpora
+- **Neural models**: State-of-the-art accuracy with pretrained neural networks
+- **Multiple annotators**: Tokenization, POS tagging, NER, parsing, sentiment, and constituency parsing
+- **Universal Dependencies**: Standardized annotations following Universal Dependencies v2.12
+- **Multiple output formats**: JSON, CoNLL, CoNLL-U, and human-readable text
+- **Dockerized execution**: CPU-optimized PyTorch models in container environment
+- **Data manager integration**: Language models downloaded and managed separately
+
+## Requirements
+
+- **Data Manager**: Language models must be installed via the Stanza Language Models data manager
+- **Docker**: Uses the `ksuderman/stanza-nlp:1.11.1` Docker image with CPU-optimized PyTorch
+- **Memory**: Uses default_fast models for efficient memory usage in containers
+
+## Annotation Types
+
+| Annotator | Description | Output |
+|---|---|---|
+| **Tokenization** | Sentence segmentation and tokenization | Tokens with character offsets |
+| **Part of speech** | POS tags, lemmas, and morphological features | Universal POS (UPOS), treebank POS (XPOS), lemmas |
+| **Named entity recognition** | Person, organization, location, date entities | Entity spans with types (PERSON, ORG, GPE, etc.) |
+| **Dependency parsing** | Syntactic dependencies following Universal Dependencies | Head-dependent relationships with dependency labels |
+| **Sentiment analysis** | Per-sentence sentiment scoring | Sentiment scores (0=negative, 1=neutral, 2=positive) |
+| **Constituency parsing** | Phrase structure parse trees | Hierarchical syntactic structure |
+
+## Language Coverage
+
+Stanza supports **80+ languages** including:
+
+### Major Languages
+- **European**: English, Spanish, German, French, Italian, Portuguese, Dutch, Swedish, Danish, Norwegian, Greek, Polish, Russian, Ukrainian
+- **Asian**: Chinese, Japanese, Korean, Arabic, Hindi, Turkish
+- **Others**: Many more languages with Universal Dependencies treebanks
+
+### NER Support
+Named entity recognition is available for a subset of languages including:
+- English, Chinese, Spanish, German, French, Dutch, Russian, Ukrainian
+
+See [Stanza's model documentation](https://stanfordnlp.github.io/stanza/available_models.html) for the complete supported language list.
+
+## Input Format
+
+- **Text files**: Plain text input in any supported language
+- **Encoding**: UTF-8 text encoding
+
+## Output Formats
+
+### JSON (Recommended)
+Structured output whose fields depend on the selected annotator (abridged, illustrative example):
+```json
+{
+  "sentences": [
+    {
+      "tokens": [
+        {
+          "text": "John",
+          "start_char": 0,
+          "end_char": 4,
+          "upos": "PROPN",
+          "lemma": "John",
+          "deprel": "nsubj", "head": 2
+        }
+      ],
+      "entities": [
+        {
+          "text": "John Smith",
+          "type": "PERSON",
+          "start_char": 0,
+          "end_char": 10
+        }
+      ]
+    }
+  ]
+}
+```
+
+### CoNLL-U
+Universal Dependencies format with morphological features:
+```
+1	John	John	PROPN	_	_	2	nsubj	_	_
+2	works	work	VERB	_	_	0	root	_	_
+```
+
+### CoNLL
+Tab-separated format with seven columns: ID, FORM, LEMMA, XPOS, NER, HEAD, and DEPREL.
+
+### Text
+Human-readable output with statistics and formatted annotations.
+
+## Model Architecture
+
+- **Neural networks**: Pretrained neural models for each language and task
+- **Universal Dependencies**: Consistent annotation standards across languages
+- **Default-fast models**: Memory-efficient nocharlm models optimized for containers
+- **CPU-optimized**: PyTorch models configured for CPU-only execution
+
+## Example Use Cases
+
+- **Multilingual corpus analysis**: Process text in 80+ languages with consistent annotations
+- **Cross-lingual studies**: Compare linguistic phenomena across different languages
+- **Historical linguistics**: Analyze texts in various languages and time periods
+- **Digital humanities**: Multi-language support for international document collections
+- **Dependency syntax**: Universal Dependencies parsing for computational linguistics
+
+## Installation
+
+1. Install the data manager: `data_manager_stanza_models`
+2. Install this tool: `stanza_nlp`
+3. Use the data manager to download language models:
+   - Go to **Admin → Local Data**
+   - Select "Stanza Language Models"
+   - Choose language(s) to install
+   - Models download directly from HuggingFace
+
+## Performance Notes
+
+- **Memory efficient**: Uses default_fast models without character-level modeling
+- **CPU-optimized**: PyTorch configured for CPU-only execution
+- **Container isolation**: Runs in Docker for consistent environment
+- **Model caching**: Downloaded models persist across runs
+
+## Citation
+
+If you use this tool, please cite:
+
+```
+Qi, Peng, Yuhao Zhang, Yuhui Zhang, Jason Bolton, and Christopher D. Manning.
+"Stanza: A Python Natural Language Processing Toolkit for Many Human Languages."
+In Proceedings of the 58th Annual Meeting of the Association for Computational 
+Linguistics: System Demonstrations, 2020.
+```
+
+## Version History
+
+- **1.11.1+galaxy4**: Latest release with enhanced output formatting and CPU optimization
+- **1.11.1+galaxy3**: Previous stable release
+- **1.11.1+galaxy2**: Early release
+- **1.11.1+galaxy1**: Beta release
+- **1.11.1+galaxy0**: Initial release
\ No newline at end of file
diff --git a/tools/stanza/galaxy_tools_stanza/macros.xml b/tools/stanza/galaxy_tools_stanza/macros.xml
new file mode 100644
index 00000000000..f58769bae21
--- /dev/null
+++ b/tools/stanza/galaxy_tools_stanza/macros.xml
@@ -0,0 +1,4 @@
+<macros>
+    <token name="@TOOL_VERSION@">1.11.1</token>
+    <token name="@VERSION_SUFFIX@">4</token>
+</macros>
diff --git a/tools/stanza/galaxy_tools_stanza/stanza_nlp.xml b/tools/stanza/galaxy_tools_stanza/stanza_nlp.xml
new file mode 100644
index 00000000000..6b9f84c18ad
--- /dev/null
+++ b/tools/stanza/galaxy_tools_stanza/stanza_nlp.xml
@@ -0,0 +1,192 @@
+<tool id="stanza_nlp" name="Stanza NLP Annotators" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" python_template_version="3.5" profile="21.05">
+    <macros>
+        <import>macros.xml</import>
+    </macros>
+    <requirements>
+        <container type="docker">ksuderman/stanza-nlp:@TOOL_VERSION@</container>
+    </requirements>
+    <version_command><![CDATA[
+python -c "import stanza; print(stanza.__version__)"
+    ]]></version_command>
+    <command detect_errors="exit_code"><![CDATA[
+    export HOME=\${TMPDIR:-/tmp} &&
+    python '$__tool_directory__/stanza_process.py'
+    --input '$input'
+    --output '$outputFile'
+    --lang '${language_model.fields.lang}'
+    --model-dir '${language_model.fields.models_path}'
+    --format '$format'
+    --annotators '$annotators'
+    ]]></command>
+    <inputs>
+        <param name="input" type="data" format="txt" label="Text"/>
+        <param name="language_model" type="select" label="Language Model">
+            <options from_data_table="stanza_models">
+                <column name="value" index="0"/>
+                <column name="name" index="1"/>
+                <column name="lang" index="2"/>
+                <column name="models_path" index="3"/>
+                <filter type="sort_by" column="1"/>
+            </options>
+        </param>
+        <param name="annotators" type="select" label="Annotation types">
+            <option value="tokenize" selected="true">Tokenization and sentence segmentation</option>
+            <option value="pos">Part of speech, lemmas, and morphological features</option>
+            <option value="ner">Named entity recognition</option>
+            <option value="parse">Dependency parsing</option>
+            <option value="sentiment">Sentiment analysis</option>
+            <option value="constituency">Constituency parsing</option>
+        </param>
+        <param name="format" type="select" label="Output format">
+            <option value="json" selected="true">JSON</option>
+            <option value="conll">CoNLL</option>
+            <option value="conllu">CoNLL-U</option>
+            <option value="text">Text</option>
+        </param>
+    </inputs>
+    <outputs>
+        <data name="outputFile" format="txt" label="${tool.name} (${annotators}) on ${on_string}">
+            <change_format>
+                <when input="format" value="json" format="json"/>
+                <when input="format" value="conllu" format="tabular"/>
+                <when input="format" value="conll" format="tabular"/>
+            </change_format>
+        </data>
+    </outputs>
+    <tests>
+        <test>
+            <param name="input" value="input.txt"/>
+            <param name="language_model" value="en"/>
+            <param name="annotators" value="tokenize"/>
+            <param name="format" value="json"/>
+            <output name="outputFile">
+                <assert_contents>
+                    <has_text text="&quot;text&quot;"/>
+                    <has_text text="&quot;tokens&quot;"/>
+                    <has_text text="John"/>
+                </assert_contents>
+            </output>
+        </test>
+        <test>
+            <param name="input" value="input.txt"/>
+            <param name="language_model" value="en"/>
+            <param name="annotators" value="ner"/>
+            <param name="format" value="json"/>
+            <output name="outputFile">
+                <assert_contents>
+                    <has_text text="&quot;entities&quot;"/>
+                    <has_text text="&quot;type&quot;"/>
+                </assert_contents>
+            </output>
+        </test>
+        <test>
+            <param name="input" value="input.txt"/>
+            <param name="language_model" value="en"/>
+            <param name="annotators" value="parse"/>
+            <param name="format" value="conllu"/>
+            <output name="outputFile">
+                <assert_contents>
+                    <has_text text="John"/>
+                    <has_text text="Smith"/>
+                </assert_contents>
+            </output>
+        </test>
+        <test>
+            <param name="input" value="input.txt"/>
+            <param name="language_model" value="en"/>
+            <param name="annotators" value="sentiment"/>
+            <param name="format" value="json"/>
+            <output name="outputFile">
+                <assert_contents>
+                    <has_text text="&quot;sentiment&quot;"/>
+                </assert_contents>
+            </output>
+        </test>
+    </tests>
+    <help><![CDATA[
+
+Stanza NLP
+==========
+
+Galaxy wrapper for the `Stanza <https://stanfordnlp.github.io/stanza/>`_ natural language
+processing toolkit from the Stanford NLP Group. Stanza provides pretrained neural models
+supporting 80+ human languages.
+
+Annotation Types
+----------------
+
+Tokenization and sentence segmentation
+    Splits text into tokens and identifies sentence boundaries. Handles multi-word
+    token expansion for applicable languages.
+
+Part of speech, lemmas, and morphological features
+    Includes tokenization plus universal POS tagging (UPOS), treebank-specific POS
+    tagging (XPOS), lemmatization, and morphological feature analysis.
+
+Named entity recognition (NER)
+    Identifies named entities such as PERSON, ORG, GPE, and DATE. Available for
+    a subset of supported languages (8+ languages including English, Chinese, Spanish,
+    German, French, Dutch, Russian, and Ukrainian).
+
+Dependency parsing
+    Syntactic dependency parsing following Universal Dependencies annotation. Identifies
+    grammatical relationships (head and dependency relation) for each token.
+
+Sentiment analysis
+    Per-sentence sentiment scoring (0=negative, 1=neutral, 2=positive). Available for
+    languages with sentiment models.
+
+Constituency parsing
+    Phrase structure parse trees. Available for languages with constituency models.
+
+Output Formats
+--------------
+
+**JSON**
+    Structured output with the annotations produced by the selected annotation type. Best for programmatic access.
+
+**CoNLL**
+    Tab-separated format with seven columns: ID, FORM, LEMMA, XPOS, NER, HEAD, and DEPREL.
+
+**CoNLL-U**
+    Universal Dependencies format with morphological features.
+
+**Text**
+    Human-readable text output with statistics and annotations.
+
+Language Models
+---------------
+
+Stanza uses pretrained neural models organized by language. Models are downloaded and
+managed by the Stanza data manager. Each language may include models for different
+tasks (tokenization, POS, NER, etc.) trained on Universal Dependencies v2.12.
+
+Install models using the data manager (Admin > Local Data > Stanza Language Models).
+
+Supported Languages
+-------------------
+
+Stanza supports 80+ languages including:
+
+- English, Spanish, German, French, Italian, Portuguese
+- Chinese, Japanese, Korean
+- Arabic, Hindi, Turkish
+- Russian, Ukrainian, Polish
+- Dutch, Greek, Swedish, Danish, Norwegian
+- And many more...
+
+See https://stanfordnlp.github.io/stanza/available_models.html for the complete list.
+
+    ]]></help>
+    <citations>
+        <citation type="bibtex">
+@inproceedings{qi2020stanza,
+  title={Stanza: A {P}ython Natural Language Processing Toolkit for Many Human Languages},
+  author={Qi, Peng and Zhang, Yuhao and Zhang, Yuhui and Bolton, Jason and Manning, Christopher D.},
+  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations},
+  year={2020},
+  url={https://stanfordnlp.github.io/stanza/}
+}
+        </citation>
+    </citations>
+</tool>
diff --git a/tools/stanza/galaxy_tools_stanza/stanza_process.py b/tools/stanza/galaxy_tools_stanza/stanza_process.py
new file mode 100644
index 00000000000..738a79694b2
--- /dev/null
+++ b/tools/stanza/galaxy_tools_stanza/stanza_process.py
@@ -0,0 +1,230 @@
+#!/usr/bin/env python
+"""
+Stanza NLP Processing Script for Galaxy
+
+Processes text files with Stanza and outputs results in various formats.
+"""
+
+import argparse
+import json
+import sys
+
+try:
+    import stanza
+except ImportError:
+    print("Error: Stanza is not installed. Please install stanza.", file=sys.stderr)
+    sys.exit(1)
+
+
+# Map annotator selections to Stanza processor strings
+PROCESSOR_MAP = {
+    "tokenize": "tokenize",
+    "pos": "tokenize,mwt,pos,lemma",
+    "ner": "tokenize,mwt,ner",
+    "parse": "tokenize,mwt,pos,lemma,depparse",
+    "sentiment": "tokenize,mwt,sentiment",
+    "constituency": "tokenize,mwt,pos,constituency",
+}
+
+
+def process_text(doc, output_format, annotator):
+    """Process a Stanza Document and format output."""
+    if output_format == "json":
+        return format_json(doc, annotator)
+    elif output_format == "conll":
+        return format_conll(doc)
+    elif output_format == "conllu":
+        return format_conllu(doc)
+    elif output_format == "text":
+        return format_text(doc, annotator)
+    else:
+        return format_json(doc, annotator)
+
+
+def format_json(doc, annotator):
+    """Format document as JSON."""
+    output = {"text": doc.text, "sentences": []}
+
+    for sent in doc.sentences:
+        sent_data = {"text": sent.text, "tokens": []}
+
+        for word in sent.words:
+            token_data = {
+                "text": word.text,
+                "start_char": word.start_char,
+                "end_char": word.end_char,
+            }
+
+            if annotator in ("pos", "parse", "constituency"):
+                token_data["upos"] = word.upos
+                token_data["xpos"] = word.xpos
+                token_data["lemma"] = word.lemma
+                if word.feats:
+                    token_data["feats"] = word.feats
+
+            if annotator == "parse":
+                token_data["deprel"] = word.deprel
+                token_data["head"] = word.head
+
+            sent_data["tokens"].append(token_data)
+
+        if annotator == "ner" and sent.ents:
+            sent_data["entities"] = [
+                {
+                    "text": ent.text,
+                    "type": ent.type,
+                    "start_char": ent.start_char,
+                    "end_char": ent.end_char,
+                }
+                for ent in sent.ents
+            ]
+
+        if annotator == "sentiment" and sent.sentiment is not None:
+            sent_data["sentiment"] = sent.sentiment
+
+        if annotator == "constituency" and sent.constituency is not None:
+            sent_data["constituency"] = str(sent.constituency)
+
+        output["sentences"].append(sent_data)
+
+    return json.dumps(output, indent=2, ensure_ascii=False)
+
+
+def format_conll(doc):
+    """Format document as CoNLL (tab-separated)."""
+    lines = []
+    for sent in doc.sentences:
+        for word in sent.words:
+            ner_tag = "O"
+            if hasattr(word, 'parent') and word.parent and hasattr(word.parent, 'ner'):
+                ner_tag = word.parent.ner if word.parent.ner else "O"
+            head = word.head if word.head is not None else 0
+            deprel = word.deprel if word.deprel else "_"
+            lemma = word.lemma if word.lemma else "_"
+            xpos = word.xpos if word.xpos else "_"
+
+            line = f"{word.id}\t{word.text}\t{lemma}\t{xpos}\t{ner_tag}\t{head}\t{deprel}"
+            lines.append(line)
+        lines.append("")
+    return "\n".join(lines)
+
+
+def format_conllu(doc):
+    """Format document as CoNLL-U (Universal Dependencies format)."""
+    lines = []
+    for sent in doc.sentences:
+        for word in sent.words:
+            upos = word.upos if word.upos else "_"
+            xpos = word.xpos if word.xpos else "_"
+            lemma = word.lemma if word.lemma else "_"
+            feats = word.feats if word.feats else "_"
+            head = word.head if word.head is not None else 0
+            deprel = word.deprel if word.deprel else "_"
+
+            line = f"{word.id}\t{word.text}\t{lemma}\t{upos}\t{xpos}\t{feats}\t{head}\t{deprel}\t_\t_"
+            lines.append(line)
+        lines.append("")
+    return "\n".join(lines)
+
+
+def format_text(doc, annotator):
+    """Format document as human-readable text."""
+    lines = []
+
+    num_tokens = sum(len(sent.words) for sent in doc.sentences)
+    num_sents = len(doc.sentences)
+    lines.append(f"Document Statistics: {num_sents} sentences, {num_tokens} tokens\n")
+
+    for i, sent in enumerate(doc.sentences, 1):
+        lines.append(f"\nSentence #{i} ({len(sent.words)} tokens):")
+        lines.append(sent.text)
+        lines.append("")
+
+        if annotator in ("pos", "parse", "constituency"):
+            for word in sent.words:
+                parts = [f"  {word.text}"]
+                parts.append(f"lemma={word.lemma}")
+                parts.append(f"upos={word.upos}")
+                if word.xpos:
+                    parts.append(f"xpos={word.xpos}")
+                if annotator == "parse" and word.deprel:
+                    parts.append(f"deprel={word.deprel}")
+                    parts.append(f"head={word.head}")
+                lines.append(" | ".join(parts))
+            lines.append("")
+
+        if annotator == "ner" and sent.ents:
+            lines.append("  Named Entities:")
+            for ent in sent.ents:
+                lines.append(f"    {ent.text} ({ent.type})")
+            lines.append("")
+
+        if annotator == "sentiment" and sent.sentiment is not None:
+            labels = {0: "negative", 1: "neutral", 2: "positive"}
+            lines.append(f"  Sentiment: {labels.get(sent.sentiment, sent.sentiment)}")
+            lines.append("")
+
+        if annotator == "constituency" and sent.constituency is not None:
+            lines.append(f"  Constituency: {sent.constituency}")
+            lines.append("")
+
+    return "\n".join(lines)
+
+
+def main():
+    parser = argparse.ArgumentParser(description="Process text with Stanza NLP")
+    parser.add_argument("--input", required=True, help="Input text file")
+    parser.add_argument("--output", required=True, help="Output file")
+    parser.add_argument("--lang", required=True, help="Language code")
+    parser.add_argument("--model-dir", required=True, help="Path to stanza_resources directory")
+    parser.add_argument("--format", choices=["json", "conll", "conllu", "text"],
+                        default="json", help="Output format")
+    parser.add_argument("--annotators", required=True, help="Annotation type")
+
+    args = parser.parse_args()
+
+    processors = PROCESSOR_MAP.get(args.annotators, "tokenize")
+
+    # Load Stanza pipeline using default_fast package (nocharlm) for lower memory usage
+    try:
+        nlp = stanza.Pipeline(
+            lang=args.lang,
+            dir=args.model_dir,
+            processors=processors,
+            package="default_fast",
+            download_method=None,
+            use_gpu=False,
+        )
+    except Exception as e:
+        print(f"Error loading Stanza pipeline: {e}", file=sys.stderr)
+        sys.exit(1)
+
+    # Read input text
+    try:
+        with open(args.input, 'r', encoding='utf-8') as f:
+            text = f.read()
+    except Exception as e:
+        print(f"Error reading input file: {e}", file=sys.stderr)
+        sys.exit(1)
+
+    # Process text
+    try:
+        doc = nlp(text)
+    except Exception as e:
+        print(f"Error processing text: {e}", file=sys.stderr)
+        sys.exit(1)
+
+    # Format and write output
+    try:
+        output = process_text(doc, args.format, args.annotators)
+        with open(args.output, 'w', encoding='utf-8') as f:
+            f.write(output)
+    except Exception as e:
+        print(f"Error writing output: {e}", file=sys.stderr)
+        sys.exit(1)
+
+    print(f"Successfully processed {len(text)} characters")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tools/stanza/galaxy_tools_stanza/test-data/input.txt b/tools/stanza/galaxy_tools_stanza/test-data/input.txt
new file mode 100644
index 00000000000..7cea21fac4e
--- /dev/null
+++ b/tools/stanza/galaxy_tools_stanza/test-data/input.txt
@@ -0,0 +1,2 @@
+John Smith went to Walmart on January 1, 1970 to buy IBM stock, then he went to the theater.
+
diff --git a/tools/stanza/galaxy_tools_stanza/test-data/stanza_models.loc b/tools/stanza/galaxy_tools_stanza/test-data/stanza_models.loc
new file mode 100644
index 00000000000..215f6241d01
--- /dev/null
+++ b/tools/stanza/galaxy_tools_stanza/test-data/stanza_models.loc
@@ -0,0 +1 @@
+en	English	en	/Users/suderman/Library/Caches/stanza/1.11.0/resources
diff --git a/tools/stanza/galaxy_tools_stanza/tool-data/stanza_models.loc.sample b/tools/stanza/galaxy_tools_stanza/tool-data/stanza_models.loc.sample
new file mode 100644
index 00000000000..2a70fbefe88
--- /dev/null
+++ b/tools/stanza/galaxy_tools_stanza/tool-data/stanza_models.loc.sample
@@ -0,0 +1,10 @@
+# Stanza language models
+# This file is maintained by the stanza_models data manager.
+#
+# Columns:
+# <value>	<name>	<lang>	<models_path>
+#
+# value: unique identifier for this model entry (language code)
+# name: display name shown in the tool UI
+# lang: ISO 639-1 language code
+# models_path: path to the stanza_resources directory containing the model
diff --git a/tools/stanza/galaxy_tools_stanza/tool_data_table_conf.xml.sample b/tools/stanza/galaxy_tools_stanza/tool_data_table_conf.xml.sample
new file mode 100644
index 00000000000..c9c90863118
--- /dev/null
+++ b/tools/stanza/galaxy_tools_stanza/tool_data_table_conf.xml.sample
@@ -0,0 +1,6 @@
+<tables>
+    <table name="stanza_models" comment_char="#">
+        <columns>value, name, lang, models_path</columns>
+        <file path="tool-data/stanza_models.loc" />
+    </table>
+</tables>
diff --git a/tools/stanza/galaxy_tools_stanza/tool_data_table_conf.xml.test b/tools/stanza/galaxy_tools_stanza/tool_data_table_conf.xml.test
new file mode 100644
index 00000000000..72e4b02a577
--- /dev/null
+++ b/tools/stanza/galaxy_tools_stanza/tool_data_table_conf.xml.test
@@ -0,0 +1,6 @@
+<tables>
+    <table name="stanza_models" comment_char="#">
+        <columns>value, name, lang, models_path</columns>
+        <file path="${__HERE__}/test-data/stanza_models.loc" />
+    </table>
+</tables>

From 8103208398d9d55a94acaedcc46092eb891315cb Mon Sep 17 00:00:00 2001
From: Keith Suderman <suderman@jhu.edu>
Date: Tue, 19 May 2026 20:59:05 -0400
Subject: [PATCH 3/6] Add Stanza NLP tool and data manager

## Stanza NLP Tool
- Stanford Stanza NLP annotation tool supporting 80+ languages
- Provides tokenization, POS tagging, lemmatization, dependency parsing, NER
- Supports sentiment analysis and constituency parsing for select languages
- Multiple output formats: JSON, CoNLL, CoNLL-U, text
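
As a rough illustration of the CoNLL-U output, the sketch below renders one
word as a ten-column, tab-separated row. It is not the code in
stanza_process.py: the plain-dict word and the names `CONLLU_FIELDS` and
`conllu_line` are assumptions made for this example only.

```python
# Column order of the CoNLL-U format:
# ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
# The dict keys below are hypothetical names chosen for this sketch.
CONLLU_FIELDS = (
    "id", "text", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
)


def conllu_line(word):
    """Render one word (a plain dict stand-in) as a CoNLL-U row."""
    cells = []
    for field in CONLLU_FIELDS:
        value = word.get(field)
        # Missing or empty values become "_"; a HEAD of 0 (root) is kept.
        cells.append("_" if value is None or value == "" else str(value))
    return "\t".join(cells)


row = conllu_line(
    {"id": 1, "text": "John", "lemma": "John", "upos": "PROPN",
     "head": 2, "deprel": "nsubj"}
)
```

Fields that a given annotator does not produce are left as "_", so the row
keeps ten columns whichever processors ran.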

## Data Manager
- Downloads and installs Stanza language models from HuggingFace
- Uses nocharlm models optimized for memory efficiency
- Supports multi-select installation of language packages
- Integrates with Galaxy data tables for model selection
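
The data table is a tab-separated .loc file with the columns value, name,
lang and models_path. A minimal sketch of reading such a file follows; the
function `parse_loc` and the example path are hypothetical and are not part
of Galaxy or of this patch.

```python
# Column order assumed from tool_data_table_conf.xml.sample in this series.
COLUMNS = ("value", "name", "lang", "models_path")


def parse_loc(text):
    """Parse .loc content into a list of dicts, skipping comments and blanks."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != len(COLUMNS):
            raise ValueError(
                f"expected {len(COLUMNS)} columns, got {len(fields)}: {line!r}"
            )
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


# Hypothetical entry; the path is a placeholder, not a real install location.
example = "# Stanza language models\nen\tEnglish\ten\t/data/stanza_resources\n"
entries = parse_loc(example)
```

The tool wrapper reads the lang and models_path fields of the selected row
and passes them to stanza_process.py as --lang and --model-dir.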

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
---
 data_managers/data_manager_stanza_models/.shed.yml     |  2 +-
 .../tool-data/stanza_models.loc.sample                 | 10 ++++++++++
 .../tool_data_table_conf.xml.sample                    |  6 ++++++
 tools/stanza/tool_data_table_conf.xml.test             |  6 ++++++
 4 files changed, 23 insertions(+), 1 deletion(-)
 create mode 100644 data_managers/data_manager_stanza_models/tool-data/stanza_models.loc.sample
 create mode 100644 data_managers/data_manager_stanza_models/tool_data_table_conf.xml.sample
 create mode 100644 tools/stanza/tool_data_table_conf.xml.test

diff --git a/data_managers/data_manager_stanza_models/.shed.yml b/data_managers/data_manager_stanza_models/.shed.yml
index b465da5eb11..6fbd72a1a47 100644
--- a/data_managers/data_manager_stanza_models/.shed.yml
+++ b/data_managers/data_manager_stanza_models/.shed.yml
@@ -7,7 +7,7 @@ long_description: |
   languages with models for tokenization, POS tagging, lemmatization, dependency
   parsing, NER, sentiment analysis, and constituency parsing.
 homepage_url: https://stanfordnlp.github.io/stanza/
-remote_repository_url: https://github.com/ksuderman/galaxy_tools_stanza
+remote_repository_url: https://github.com/ksuderman/data_manager_stanza
 type: unrestricted
 categories:
   - Data Managers
diff --git a/data_managers/data_manager_stanza_models/tool-data/stanza_models.loc.sample b/data_managers/data_manager_stanza_models/tool-data/stanza_models.loc.sample
new file mode 100644
index 00000000000..2a70fbefe88
--- /dev/null
+++ b/data_managers/data_manager_stanza_models/tool-data/stanza_models.loc.sample
@@ -0,0 +1,10 @@
+# Stanza language models
+# This file is maintained by the stanza_models data manager.
+#
+# Columns:
+# <value>	<name>	<lang>	<models_path>
+#
+# value: unique identifier for this model entry (language code)
+# name: display name shown in the tool UI
+# lang: ISO 639-1 language code
+# models_path: path to the stanza_resources directory containing the model
diff --git a/data_managers/data_manager_stanza_models/tool_data_table_conf.xml.sample b/data_managers/data_manager_stanza_models/tool_data_table_conf.xml.sample
new file mode 100644
index 00000000000..c9c90863118
--- /dev/null
+++ b/data_managers/data_manager_stanza_models/tool_data_table_conf.xml.sample
@@ -0,0 +1,6 @@
+<tables>
+    <table name="stanza_models" comment_char="#">
+        <columns>value, name, lang, models_path</columns>
+        <file path="tool-data/stanza_models.loc" />
+    </table>
+</tables>
diff --git a/tools/stanza/tool_data_table_conf.xml.test b/tools/stanza/tool_data_table_conf.xml.test
new file mode 100644
index 00000000000..72e4b02a577
--- /dev/null
+++ b/tools/stanza/tool_data_table_conf.xml.test
@@ -0,0 +1,6 @@
+<tables>
+    <table name="stanza_models" comment_char="#">
+        <columns>value, name, lang, models_path</columns>
+        <file path="${__HERE__}/test-data/stanza_models.loc" />
+    </table>
+</tables>

From 8758ca45da50f519508d690b35ae186e457aa748 Mon Sep 17 00:00:00 2001
From: Keith Suderman <suderman@jhu.edu>
Date: Wed, 20 May 2026 12:49:46 -0400
Subject: [PATCH 4/6] Remove duplicate directories and test outputs

- Remove nested galaxy_tools_stanza/ directory from tools/stanza/
- Remove data_manager_stanza/ subdirectory from data manager
- Clean up generated test output files
---
 .../data_manager_stanza                       |   1 -
 tools/stanza/galaxy_tools_stanza/.shed.yml    |  14 --
 tools/stanza/galaxy_tools_stanza/README.md    | 145 -----------
 tools/stanza/galaxy_tools_stanza/macros.xml   |   4 -
 .../stanza/galaxy_tools_stanza/stanza_nlp.xml | 192 ---------------
 .../galaxy_tools_stanza/stanza_process.py     | 230 ------------------
 .../galaxy_tools_stanza/test-data/input.txt   |   2 -
 .../test-data/stanza_models.loc               |   1 -
 .../tool-data/stanza_models.loc.sample        |  10 -
 .../tool_data_table_conf.xml.sample           |   6 -
 .../tool_data_table_conf.xml.test             |   6 -
 11 files changed, 611 deletions(-)
 delete mode 160000 data_managers/data_manager_stanza_models/data_manager_stanza
 delete mode 100644 tools/stanza/galaxy_tools_stanza/.shed.yml
 delete mode 100644 tools/stanza/galaxy_tools_stanza/README.md
 delete mode 100644 tools/stanza/galaxy_tools_stanza/macros.xml
 delete mode 100644 tools/stanza/galaxy_tools_stanza/stanza_nlp.xml
 delete mode 100644 tools/stanza/galaxy_tools_stanza/stanza_process.py
 delete mode 100644 tools/stanza/galaxy_tools_stanza/test-data/input.txt
 delete mode 100644 tools/stanza/galaxy_tools_stanza/test-data/stanza_models.loc
 delete mode 100644 tools/stanza/galaxy_tools_stanza/tool-data/stanza_models.loc.sample
 delete mode 100644 tools/stanza/galaxy_tools_stanza/tool_data_table_conf.xml.sample
 delete mode 100644 tools/stanza/galaxy_tools_stanza/tool_data_table_conf.xml.test

diff --git a/data_managers/data_manager_stanza_models/data_manager_stanza b/data_managers/data_manager_stanza_models/data_manager_stanza
deleted file mode 160000
index de06488b2a4..00000000000
--- a/data_managers/data_manager_stanza_models/data_manager_stanza
+++ /dev/null
@@ -1 +0,0 @@
-Subproject commit de06488b2a4c2fe5caefb14e8aa1408159de6163
diff --git a/tools/stanza/galaxy_tools_stanza/.shed.yml b/tools/stanza/galaxy_tools_stanza/.shed.yml
deleted file mode 100644
index 797bdfc7507..00000000000
--- a/tools/stanza/galaxy_tools_stanza/.shed.yml
+++ /dev/null
@@ -1,14 +0,0 @@
-name: stanza_nlp
-owner: iuc
-description: Stanza NLP Annotators
-long_description: |
-  This tool provides Stanford Stanza natural language processing annotation capabilities
-  for Galaxy. It supports 80+ languages with various annotation types including tokenization,
-  POS tagging, lemmatization, dependency parsing, named entity recognition, sentiment analysis,
-  and constituency parsing.
-homepage_url: https://stanfordnlp.github.io/stanza/
-remote_repository_url: https://github.com/ksuderman/galaxy_tools_stanza
-type: unrestricted
-categories:
-  - Text Manipulation
-  - Natural Language Processing
diff --git a/tools/stanza/galaxy_tools_stanza/README.md b/tools/stanza/galaxy_tools_stanza/README.md
deleted file mode 100644
index 104be8e78df..00000000000
--- a/tools/stanza/galaxy_tools_stanza/README.md
+++ /dev/null
@@ -1,145 +0,0 @@
-# Galaxy Wrapper for Stanford Stanza NLP
-
-This Galaxy tool provides access to Stanza, Stanford's neural natural language processing toolkit, supporting 80+ human languages with state-of-the-art accuracy for multilingual text analysis.
-
-## Features
-
-- **80+ languages**: Comprehensive multilingual support for diverse text corpora
-- **Neural models**: State-of-the-art accuracy with pretrained neural networks
-- **Multiple annotators**: Tokenization, POS tagging, NER, parsing, sentiment, and constituency parsing
-- **Universal Dependencies**: Standardized annotations following Universal Dependencies v2.12
-- **Multiple output formats**: JSON, CoNLL, CoNLL-U, and human-readable text
-- **Dockerized execution**: CPU-optimized PyTorch models in container environment
-- **Data manager integration**: Language models downloaded and managed separately
-
-## Requirements
-
-- **Data Manager**: Language models must be installed via the Stanza Language Models data manager
-- **Docker**: Uses the `ksuderman/stanza-nlp:1.11.1` Docker image with CPU-optimized PyTorch
-- **Memory**: Uses default_fast models for efficient memory usage in containers
-
-## Annotation Types
-
-| Annotator | Description | Output |
-|---|---|---|
-| **Tokenization** | Sentence segmentation and tokenization | Tokens with character offsets |
-| **Part of speech** | POS tags, lemmas, and morphological features | Universal POS (UPOS), treebank POS (XPOS), lemmas |
-| **Named entity recognition** | Person, organization, location, date entities | Entity spans with types (PERSON, ORG, GPE, etc.) |
-| **Dependency parsing** | Syntactic dependencies following Universal Dependencies | Head-child relationships with dependency labels |
-| **Sentiment analysis** | Per-sentence sentiment scoring | Sentiment scores (0=negative, 1=neutral, 2=positive) |
-| **Constituency parsing** | Phrase structure parse trees | Hierarchical syntactic structure |
-
-## Language Coverage
-
-Stanza supports **80+ languages** including:
-
-### Major Languages
-- **European**: English, Spanish, German, French, Italian, Portuguese, Dutch, Swedish, Danish, Norwegian, Greek, Polish, Russian, Ukrainian
-- **Asian**: Chinese, Japanese, Korean, Arabic, Hindi, Turkish  
-- **Others**: And many more languages with Universal Dependencies treebanks
-
-### NER Support
-Named entity recognition is available for a subset of languages including:
-- English, Chinese, Spanish, German, French, Dutch, Russian, Ukrainian
-
-See [Stanza's model documentation](https://stanfordnlp.github.io/stanza/available_models.html) for the complete supported language list.
-
-## Input Format
-
-- **Text files**: Plain text input in any supported language
-- **Encoding**: UTF-8 text encoding
-
-## Output Formats
-
-### JSON (Recommended)
-Comprehensive structured output with all annotations:
-```json
-{
-  "sentences": [
-    {
-      "tokens": [
-        {
-          "id": 1,
-          "text": "John",
-          "lemma": "John", 
-          "upos": "PROPN",
-          "head": 2,
-          "deprel": "nsubj"
-        }
-      ],
-      "entities": [
-        {
-          "text": "John Smith",
-          "type": "PERSON",
-          "start_char": 0,
-          "end_char": 10
-        }
-      ]
-    }
-  ]
-}
-```
-
-### CoNLL-U
-Universal Dependencies format with morphological features:
-```
-1	John	John	PROPN	_	_	2	nsubj	_	_
-2	works	work	VERB	_	_	0	root	_	_
-```
-
-### CoNLL
-Tab-separated format suitable for dependency parsing analysis.
-
-### Text
-Human-readable output with statistics and formatted annotations.
-
-## Model Architecture
-
-- **Neural networks**: Pretrained neural models for each language and task
-- **Universal Dependencies**: Consistent annotation standards across languages
-- **Default-fast models**: Memory-efficient nocharlm models optimized for containers
-- **CPU-optimized**: PyTorch models configured for CPU-only execution
-
-## Example Use Cases
-
-- **Multilingual corpus analysis**: Process text in 80+ languages with consistent annotations
-- **Cross-lingual studies**: Compare linguistic phenomena across different languages
-- **Historical linguistics**: Analyze texts in various languages and time periods
-- **Digital humanities**: Multi-language support for international document collections
-- **Dependency syntax**: Universal Dependencies parsing for computational linguistics
-
-## Installation
-
-1. Install the data manager: `data_manager_stanza_models`
-2. Install this tool: `stanza_nlp`
-3. Use the data manager to download language models:
-   - Go to **Admin → Local Data**
-   - Select "Stanza Language Models"
-   - Choose language(s) to install
-   - Models download directly from HuggingFace
-
-## Performance Notes
-
-- **Memory efficient**: Uses default_fast models without character-level modeling
-- **CPU-optimized**: PyTorch configured for CPU-only execution
-- **Container isolation**: Runs in Docker for consistent environment
-- **Model caching**: Downloaded models persist across runs
-
-## Citation
-
-If you use this tool, please cite:
-
-```
-Qi, Peng, Yuhao Zhang, Yuhui Zhang, Jason Bolton, and Christopher D. Manning. 
-"Stanza: A Python Natural Language Processing Toolkit for Many Human Languages." 
-In Proceedings of the 58th Annual Meeting of the Association for Computational 
-Linguistics: System Demonstrations, 2020.
-```
-
-## Version History
-
-- **1.11.1+galaxy4**: Latest release with enhanced output formatting and CPU optimization
-- **1.11.1+galaxy3**: Previous stable release
-- **1.11.1+galaxy2**: Early release
-- **1.11.1+galaxy1**: Beta release
-- **1.11.1+galaxy0**: Initial release
\ No newline at end of file
diff --git a/tools/stanza/galaxy_tools_stanza/macros.xml b/tools/stanza/galaxy_tools_stanza/macros.xml
deleted file mode 100644
index f58769bae21..00000000000
--- a/tools/stanza/galaxy_tools_stanza/macros.xml
+++ /dev/null
@@ -1,4 +0,0 @@
-<macros>
-    <token name="@TOOL_VERSION@">1.11.1</token>
-    <token name="@VERSION_SUFFIX@">4</token>
-</macros>
diff --git a/tools/stanza/galaxy_tools_stanza/stanza_nlp.xml b/tools/stanza/galaxy_tools_stanza/stanza_nlp.xml
deleted file mode 100644
index 6b9f84c18ad..00000000000
--- a/tools/stanza/galaxy_tools_stanza/stanza_nlp.xml
+++ /dev/null
@@ -1,192 +0,0 @@
-<tool id="stanza_nlp" name="Stanza NLP Annotators" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" python_template_version="3.5" profile="21.05">
-    <macros>
-        <import>macros.xml</import>
-    </macros>
-    <requirements>
-        <container type="docker">ksuderman/stanza-nlp:@TOOL_VERSION@</container>
-    </requirements>
-    <version_command><![CDATA[
-python -c "import stanza; print(stanza.__version__)"
-    ]]></version_command>
-    <command detect_errors="exit_code"><![CDATA[
-    export HOME=\${TMPDIR:-/tmp} &&
-    python '$__tool_directory__/stanza_process.py'
-    --input '$input'
-    --output '$outputFile'
-    --lang '${language_model.fields.lang}'
-    --model-dir '${language_model.fields.models_path}'
-    --format '$format'
-    --annotators '$annotators'
-    ]]></command>
-    <inputs>
-        <param name="input" type="data" format="txt" label="Text"/>
-        <param name="language_model" type="select" label="Language Model">
-            <options from_data_table="stanza_models">
-                <column name="value" index="0"/>
-                <column name="name" index="1"/>
-                <column name="lang" index="2"/>
-                <column name="models_path" index="3"/>
-                <filter type="sort_by" column="1"/>
-            </options>
-        </param>
-        <param name="annotators" type="select" label="Annotation types">
-            <option value="tokenize" selected="true">Tokenization and sentence segmentation</option>
-            <option value="pos">Part of speech, lemmas, and morphological features</option>
-            <option value="ner">Named entity recognition</option>
-            <option value="parse">Dependency parsing</option>
-            <option value="sentiment">Sentiment analysis</option>
-            <option value="constituency">Constituency parsing</option>
-        </param>
-        <param name="format" type="select" label="Output format">
-            <option value="json" selected="true">JSON</option>
-            <option value="conll">CoNLL</option>
-            <option value="conllu">CoNLL-U</option>
-            <option value="text">Text</option>
-        </param>
-    </inputs>
-    <outputs>
-        <data name="outputFile" format="txt" label="${tool.name} (${annotators}) on ${on_string}">
-            <change_format>
-                <when input="format" value="json" format="json"/>
-                <when input="format" value="conllu" format="tabular"/>
-                <when input="format" value="conll" format="tabular"/>
-            </change_format>
-        </data>
-    </outputs>
-    <tests>
-        <test>
-            <param name="input" value="input.txt"/>
-            <param name="language_model" value="en"/>
-            <param name="annotators" value="tokenize"/>
-            <param name="format" value="json"/>
-            <output name="outputFile">
-                <assert_contents>
-                    <has_text text="&quot;text&quot;"/>
-                    <has_text text="&quot;tokens&quot;"/>
-                    <has_text text="John"/>
-                </assert_contents>
-            </output>
-        </test>
-        <test>
-            <param name="input" value="input.txt"/>
-            <param name="language_model" value="en"/>
-            <param name="annotators" value="ner"/>
-            <param name="format" value="json"/>
-            <output name="outputFile">
-                <assert_contents>
-                    <has_text text="&quot;entities&quot;"/>
-                    <has_text text="&quot;type&quot;"/>
-                </assert_contents>
-            </output>
-        </test>
-        <test>
-            <param name="input" value="input.txt"/>
-            <param name="language_model" value="en"/>
-            <param name="annotators" value="parse"/>
-            <param name="format" value="conllu"/>
-            <output name="outputFile">
-                <assert_contents>
-                    <has_text text="John"/>
-                    <has_text text="Smith"/>
-                </assert_contents>
-            </output>
-        </test>
-        <test>
-            <param name="input" value="input.txt"/>
-            <param name="language_model" value="en"/>
-            <param name="annotators" value="sentiment"/>
-            <param name="format" value="json"/>
-            <output name="outputFile">
-                <assert_contents>
-                    <has_text text="&quot;sentiment&quot;"/>
-                </assert_contents>
-            </output>
-        </test>
-    </tests>
-    <help><![CDATA[
-
-Stanza NLP
-==========
-
-Galaxy wrapper for the `Stanza <https://stanfordnlp.github.io/stanza/>`_ natural language
-processing toolkit from Stanford NLP Group. Stanza provides pretrained neural models
-supporting 80+ human languages.
-
-Annotation Types
-----------------
-
-Tokenization and sentence segmentation
-    Splits text into tokens and identifies sentence boundaries. Handles multi-word
-    token expansion for applicable languages.
-
-Part of speech, lemmas, and morphological features
-    Includes tokenization plus universal POS tagging (UPOS), treebank-specific POS
-    tagging (XPOS), lemmatization, and morphological feature analysis.
-
-Named entity recognition (NER)
-    Identifies named entities such as PERSON, ORG, GPE, DATE, etc. Available for
-    a subset of supported languages (8+ languages including English, Chinese, Spanish,
-    German, French, Dutch, Russian, and Ukrainian).
-
-Dependency parsing
-    Syntactic dependency parsing following Universal Dependencies annotation. Identifies
-    grammatical relationships (head and dependency relation) for each token.
-
-Sentiment analysis
-    Per-sentence sentiment scoring (0=negative, 1=neutral, 2=positive). Available for
-    languages with sentiment models.
-
-Constituency parsing
-    Phrase structure parse trees. Available for languages with constituency models.
-
-Output Formats
---------------
-
-**JSON**
-    Comprehensive structured output with all annotations. Best for programmatic access.
-
-**CoNLL**
-    Tab-separated format suitable for dependency parsing tasks.
-
-**CoNLL-U**
-    Universal Dependencies format with morphological features.
-
-**Text**
-    Human-readable text output with statistics and annotations.
-
-Language Models
----------------
-
-Stanza uses pretrained neural models organized by language. Models are downloaded and
-managed by the Stanza data manager. Each language may include models for different
-tasks (tokenization, POS, NER, etc.) trained on Universal Dependencies v2.12.
-
-Install models using the data manager (Admin > Local Data > Stanza Language Models).
-
-Supported Languages
--------------------
-
-Stanza supports 80+ languages including:
-
-- English, Spanish, German, French, Italian, Portuguese
-- Chinese, Japanese, Korean
-- Arabic, Hindi, Turkish
-- Russian, Ukrainian, Polish
-- Dutch, Greek, Swedish, Danish, Norwegian
-- And many more...
-
-See https://stanfordnlp.github.io/stanza/available_models.html for the complete list.
-
-    ]]></help>
-    <citations>
-        <citation type="bibtex">
-@inproceedings{qi2020stanza,
-  title={Stanza: A {P}ython Natural Language Processing Toolkit for Many Human Languages},
-  author={Qi, Peng and Zhang, Yuhao and Zhang, Yuhui and Bolton, Jason and Manning, Christopher D.},
-  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations},
-  year={2020},
-  url={https://stanfordnlp.github.io/stanza/}
-}
-        </citation>
-    </citations>
-</tool>
diff --git a/tools/stanza/galaxy_tools_stanza/stanza_process.py b/tools/stanza/galaxy_tools_stanza/stanza_process.py
deleted file mode 100644
index 738a79694b2..00000000000
--- a/tools/stanza/galaxy_tools_stanza/stanza_process.py
+++ /dev/null
@@ -1,230 +0,0 @@
-#!/usr/bin/env python
-"""
-Stanza NLP Processing Script for Galaxy
-
-Processes text files with Stanza and outputs results in various formats.
-"""
-
-import argparse
-import json
-import sys
-
-try:
-    import stanza
-except ImportError:
-    print("Error: Stanza is not installed. Please install stanza.", file=sys.stderr)
-    sys.exit(1)
-
-
-# Map annotator selections to Stanza processor strings
-PROCESSOR_MAP = {
-    "tokenize": "tokenize",
-    "pos": "tokenize,mwt,pos,lemma",
-    "ner": "tokenize,mwt,ner",
-    "parse": "tokenize,mwt,pos,lemma,depparse",
-    "sentiment": "tokenize,mwt,sentiment",
-    "constituency": "tokenize,mwt,pos,constituency",
-}
-
-
-def process_text(doc, output_format, annotator):
-    """Process a Stanza Document and format output."""
-    if output_format == "json":
-        return format_json(doc, annotator)
-    elif output_format == "conll":
-        return format_conll(doc)
-    elif output_format == "conllu":
-        return format_conllu(doc)
-    elif output_format == "text":
-        return format_text(doc, annotator)
-    else:
-        return format_json(doc, annotator)
-
-
-def format_json(doc, annotator):
-    """Format document as JSON."""
-    output = {"text": doc.text, "sentences": []}
-
-    for sent in doc.sentences:
-        sent_data = {"text": sent.text, "tokens": []}
-
-        for word in sent.words:
-            token_data = {
-                "text": word.text,
-                "start_char": word.start_char,
-                "end_char": word.end_char,
-            }
-
-            if annotator in ("pos", "parse", "constituency"):
-                token_data["upos"] = word.upos
-                token_data["xpos"] = word.xpos
-                token_data["lemma"] = word.lemma
-                if word.feats:
-                    token_data["feats"] = word.feats
-
-            if annotator == "parse":
-                token_data["deprel"] = word.deprel
-                token_data["head"] = word.head
-
-            sent_data["tokens"].append(token_data)
-
-        if annotator == "ner" and sent.ents:
-            sent_data["entities"] = [
-                {
-                    "text": ent.text,
-                    "type": ent.type,
-                    "start_char": ent.start_char,
-                    "end_char": ent.end_char,
-                }
-                for ent in sent.ents
-            ]
-
-        if annotator == "sentiment" and sent.sentiment is not None:
-            sent_data["sentiment"] = sent.sentiment
-
-        if annotator == "constituency" and sent.constituency is not None:
-            sent_data["constituency"] = str(sent.constituency)
-
-        output["sentences"].append(sent_data)
-
-    return json.dumps(output, indent=2, ensure_ascii=False)
-
-
-def format_conll(doc):
-    """Format document as CoNLL (tab-separated)."""
-    lines = []
-    for sent in doc.sentences:
-        for word in sent.words:
-            ner_tag = "O"
-            if hasattr(word, 'parent') and word.parent and hasattr(word.parent, 'ner'):
-                ner_tag = word.parent.ner if word.parent.ner else "O"
-            head = word.head if word.head is not None else 0
-            deprel = word.deprel if word.deprel else "_"
-            lemma = word.lemma if word.lemma else "_"
-            xpos = word.xpos if word.xpos else "_"
-
-            line = f"{word.id}\t{word.text}\t{lemma}\t{xpos}\t{ner_tag}\t{head}\t{deprel}"
-            lines.append(line)
-        lines.append("")
-    return "\n".join(lines)
-
-
-def format_conllu(doc):
-    """Format document as CoNLL-U (Universal Dependencies format)."""
-    lines = []
-    for sent in doc.sentences:
-        for word in sent.words:
-            upos = word.upos if word.upos else "_"
-            xpos = word.xpos if word.xpos else "_"
-            lemma = word.lemma if word.lemma else "_"
-            feats = word.feats if word.feats else "_"
-            head = word.head if word.head is not None else 0
-            deprel = word.deprel if word.deprel else "_"
-
-            line = f"{word.id}\t{word.text}\t{lemma}\t{upos}\t{xpos}\t{feats}\t{head}\t{deprel}\t_\t_"
-            lines.append(line)
-        lines.append("")
-    return "\n".join(lines)
-
-
-def format_text(doc, annotator):
-    """Format document as human-readable text."""
-    lines = []
-
-    num_tokens = sum(len(sent.words) for sent in doc.sentences)
-    num_sents = len(doc.sentences)
-    lines.append(f"Document Statistics: {num_sents} sentences, {num_tokens} tokens\n")
-
-    for i, sent in enumerate(doc.sentences, 1):
-        lines.append(f"\nSentence #{i} ({len(sent.words)} tokens):")
-        lines.append(sent.text)
-        lines.append("")
-
-        if annotator in ("pos", "parse", "constituency"):
-            for word in sent.words:
-                parts = [f"  {word.text}"]
-                parts.append(f"lemma={word.lemma}")
-                parts.append(f"upos={word.upos}")
-                if word.xpos:
-                    parts.append(f"xpos={word.xpos}")
-                if annotator == "parse" and word.deprel:
-                    parts.append(f"deprel={word.deprel}")
-                    parts.append(f"head={word.head}")
-                lines.append(" | ".join(parts))
-            lines.append("")
-
-        if annotator == "ner" and sent.ents:
-            lines.append("  Named Entities:")
-            for ent in sent.ents:
-                lines.append(f"    {ent.text} ({ent.type})")
-            lines.append("")
-
-        if annotator == "sentiment" and sent.sentiment is not None:
-            labels = {0: "negative", 1: "neutral", 2: "positive"}
-            lines.append(f"  Sentiment: {labels.get(sent.sentiment, sent.sentiment)}")
-            lines.append("")
-
-        if annotator == "constituency" and sent.constituency is not None:
-            lines.append(f"  Constituency: {sent.constituency}")
-            lines.append("")
-
-    return "\n".join(lines)
-
-
-def main():
-    parser = argparse.ArgumentParser(description="Process text with Stanza NLP")
-    parser.add_argument("--input", required=True, help="Input text file")
-    parser.add_argument("--output", required=True, help="Output file")
-    parser.add_argument("--lang", required=True, help="Language code")
-    parser.add_argument("--model-dir", required=True, help="Path to stanza_resources directory")
-    parser.add_argument("--format", choices=["json", "conll", "conllu", "text"],
-                        default="json", help="Output format")
-    parser.add_argument("--annotators", required=True, help="Annotation type")
-
-    args = parser.parse_args()
-
-    processors = PROCESSOR_MAP.get(args.annotators, "tokenize")
-
-    # Load Stanza pipeline using default_fast package (nocharlm) for lower memory usage
-    try:
-        nlp = stanza.Pipeline(
-            lang=args.lang,
-            dir=args.model_dir,
-            processors=processors,
-            package="default_fast",
-            download_method=None,
-            use_gpu=False,
-        )
-    except Exception as e:
-        print(f"Error loading Stanza pipeline: {e}", file=sys.stderr)
-        sys.exit(1)
-
-    # Read input text
-    try:
-        with open(args.input, 'r', encoding='utf-8') as f:
-            text = f.read()
-    except Exception as e:
-        print(f"Error reading input file: {e}", file=sys.stderr)
-        sys.exit(1)
-
-    # Process text
-    try:
-        doc = nlp(text)
-    except Exception as e:
-        print(f"Error processing text: {e}", file=sys.stderr)
-        sys.exit(1)
-
-    # Format and write output
-    try:
-        output = process_text(doc, args.format, args.annotators)
-        with open(args.output, 'w', encoding='utf-8') as f:
-            f.write(output)
-    except Exception as e:
-        print(f"Error writing output: {e}", file=sys.stderr)
-        sys.exit(1)
-
-    print(f"Successfully processed {len(text)} characters")
-
-
-if __name__ == "__main__":
-    main()
diff --git a/tools/stanza/galaxy_tools_stanza/test-data/input.txt b/tools/stanza/galaxy_tools_stanza/test-data/input.txt
deleted file mode 100644
index 7cea21fac4e..00000000000
--- a/tools/stanza/galaxy_tools_stanza/test-data/input.txt
+++ /dev/null
@@ -1,2 +0,0 @@
-John Smith went to Walmart on January 1, 1970 to buy IBM stock, then he went to the theater.
-
diff --git a/tools/stanza/galaxy_tools_stanza/test-data/stanza_models.loc b/tools/stanza/galaxy_tools_stanza/test-data/stanza_models.loc
deleted file mode 100644
index 215f6241d01..00000000000
--- a/tools/stanza/galaxy_tools_stanza/test-data/stanza_models.loc
+++ /dev/null
@@ -1 +0,0 @@
-en	English	en	/Users/suderman/Library/Caches/stanza/1.11.0/resources
diff --git a/tools/stanza/galaxy_tools_stanza/tool-data/stanza_models.loc.sample b/tools/stanza/galaxy_tools_stanza/tool-data/stanza_models.loc.sample
deleted file mode 100644
index 2a70fbefe88..00000000000
--- a/tools/stanza/galaxy_tools_stanza/tool-data/stanza_models.loc.sample
+++ /dev/null
@@ -1,10 +0,0 @@
-# Stanza language models
-# This file is maintained by the stanza_models data manager.
-#
-# Columns:
-# <value>	<name>	<lang>	<models_path>
-#
-# value: unique identifier for this model entry (language code)
-# name: display name shown in the tool UI
-# lang: ISO 639-1 language code
-# models_path: path to the stanza_resources directory containing the model
diff --git a/tools/stanza/galaxy_tools_stanza/tool_data_table_conf.xml.sample b/tools/stanza/galaxy_tools_stanza/tool_data_table_conf.xml.sample
deleted file mode 100644
index c9c90863118..00000000000
--- a/tools/stanza/galaxy_tools_stanza/tool_data_table_conf.xml.sample
+++ /dev/null
@@ -1,6 +0,0 @@
-<tables>
-    <table name="stanza_models" comment_char="#">
-        <columns>value, name, lang, models_path</columns>
-        <file path="tool-data/stanza_models.loc" />
-    </table>
-</tables>
diff --git a/tools/stanza/galaxy_tools_stanza/tool_data_table_conf.xml.test b/tools/stanza/galaxy_tools_stanza/tool_data_table_conf.xml.test
deleted file mode 100644
index 72e4b02a577..00000000000
--- a/tools/stanza/galaxy_tools_stanza/tool_data_table_conf.xml.test
+++ /dev/null
@@ -1,6 +0,0 @@
-<tables>
-    <table name="stanza_models" comment_char="#">
-        <columns>value, name, lang, models_path</columns>
-        <file path="${__HERE__}/test-data/stanza_models.loc" />
-    </table>
-</tables>

From cc8a9191c32eb25df62d69a961f396133a059a77 Mon Sep 17 00:00:00 2001
From: Keith Suderman <suderman@jhu.edu>
Date: Wed, 20 May 2026 13:03:46 -0400
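Subject: [PATCH 5/6] Address review comments

- Point remote_repository_url at galaxyproject/tools-iuc in both .shed.yml files
- Remove tools/stanza/macros.xml and inline the version strings in stanza_nlp.xml
- Raise the tool profile from 21.05 to 24.1 and drop python_template_version
- Add copyright, author, and license lines to the stanza_process.py header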

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
---
 data_managers/data_manager_stanza_models/.shed.yml | 2 +-
 tools/stanza/.shed.yml                             | 2 +-
 tools/stanza/macros.xml                            | 4 ----
 tools/stanza/stanza_nlp.xml                        | 7 ++-----
 tools/stanza/stanza_process.py                     | 4 ++++
 5 files changed, 8 insertions(+), 11 deletions(-)
 delete mode 100644 tools/stanza/macros.xml

diff --git a/data_managers/data_manager_stanza_models/.shed.yml b/data_managers/data_manager_stanza_models/.shed.yml
index 6fbd72a1a47..b99033951f2 100644
--- a/data_managers/data_manager_stanza_models/.shed.yml
+++ b/data_managers/data_manager_stanza_models/.shed.yml
@@ -7,7 +7,7 @@ long_description: |
   languages with models for tokenization, POS tagging, lemmatization, dependency
   parsing, NER, sentiment analysis, and constituency parsing.
 homepage_url: https://stanfordnlp.github.io/stanza/
-remote_repository_url: https://github.com/ksuderman/data_manager_stanza
+remote_repository_url: https://github.com/galaxyproject/tools-iuc
 type: unrestricted
 categories:
   - Data Managers
diff --git a/tools/stanza/.shed.yml b/tools/stanza/.shed.yml
index 797bdfc7507..b977a3f4937 100644
--- a/tools/stanza/.shed.yml
+++ b/tools/stanza/.shed.yml
@@ -7,7 +7,7 @@ long_description: |
   POS tagging, lemmatization, dependency parsing, named entity recognition, sentiment analysis,
   and constituency parsing.
 homepage_url: https://stanfordnlp.github.io/stanza/
-remote_repository_url: https://github.com/ksuderman/galaxy_tools_stanza
+remote_repository_url: https://github.com/galaxyproject/tools-iuc
 type: unrestricted
 categories:
   - Text Manipulation
diff --git a/tools/stanza/macros.xml b/tools/stanza/macros.xml
deleted file mode 100644
index f58769bae21..00000000000
--- a/tools/stanza/macros.xml
+++ /dev/null
@@ -1,4 +0,0 @@
-<macros>
-    <token name="@TOOL_VERSION@">1.11.1</token>
-    <token name="@VERSION_SUFFIX@">4</token>
-</macros>
diff --git a/tools/stanza/stanza_nlp.xml b/tools/stanza/stanza_nlp.xml
index 6b9f84c18ad..b29f10205ef 100644
--- a/tools/stanza/stanza_nlp.xml
+++ b/tools/stanza/stanza_nlp.xml
@@ -1,9 +1,6 @@
-<tool id="stanza_nlp" name="Stanza NLP Annotators" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" python_template_version="3.5" profile="21.05">
-    <macros>
-        <import>macros.xml</import>
-    </macros>
+<tool id="stanza_nlp" name="Stanza NLP Annotators" version="1.11.1+galaxy4" profile="24.1">
     <requirements>
-        <container type="docker">ksuderman/stanza-nlp:@TOOL_VERSION@</container>
+        <container type="docker">ksuderman/stanza-nlp:1.11.1</container>
     </requirements>
     <version_command><![CDATA[
 python -c "import stanza; print(stanza.__version__)"
diff --git a/tools/stanza/stanza_process.py b/tools/stanza/stanza_process.py
index 738a79694b2..b9e691d9888 100644
--- a/tools/stanza/stanza_process.py
+++ b/tools/stanza/stanza_process.py
@@ -1,8 +1,12 @@
 #!/usr/bin/env python
+# Copyright 2006 The Galaxy Project. All rights reserved.
 """
 Stanza NLP Processing Script for Galaxy
 
 Processes text files with Stanza and outputs results in various formats.
+
+Author: Keith Suderman
+License: MIT
 """
 
 import argparse

From d92a4ee92510b709cc9e707c8b0de2dfa1850349 Mon Sep 17 00:00:00 2001
From: Keith Suderman <suderman@jhu.edu>
Date: Wed, 20 May 2026 13:59:33 -0400
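Subject: [PATCH 6/6] Define version tokens inline in stanza_nlp.xml

Declare @TOOL_VERSION@ and @VERSION_SUFFIX@ in a <macros> block inside
stanza_nlp.xml instead of a separate macros.xml, and use them again for
the tool version attribute and the Docker container tag.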

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
---
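Reviewer note (not part of the commit): the sketch below is one way to
sanity-check that the inline tokens resolve to the same strings that
PATCH 5/6 hardcoded. It assumes plain textual substitution of each
<token> value, which only approximates Galaxy's macro expansion; the
helper name expand_tokens and the trimmed TOOL_XML fragment are
illustrative and do not appear in the tool.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the tool header from this patch (illustrative fragment only).
TOOL_XML = """<tool id="stanza_nlp" name="Stanza NLP Annotators" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="24.1">
    <macros>
        <token name="@TOOL_VERSION@">1.11.1</token>
        <token name="@VERSION_SUFFIX@">4</token>
    </macros>
    <requirements>
        <container type="docker">ksuderman/stanza-nlp:@TOOL_VERSION@</container>
    </requirements>
</tool>"""


def expand_tokens(xml_text):
    """Naively substitute <token> values into the raw XML text.

    Assumption: simple string replacement stands in for Galaxy's macro
    loader, which is not invoked here.
    """
    root = ET.fromstring(xml_text)
    tokens = {t.get("name"): (t.text or "") for t in root.findall("./macros/token")}
    for name, value in tokens.items():
        xml_text = xml_text.replace(name, value)
    return ET.fromstring(xml_text)


expanded = expand_tokens(TOOL_XML)
version = expanded.get("version")
container = expanded.findtext("./requirements/container")
print(version)
print(container)
```

Under that assumption the two printed values should match the version
attribute and container tag that this patch removes.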
 tools/stanza/stanza_nlp.xml | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/stanza/stanza_nlp.xml b/tools/stanza/stanza_nlp.xml
index b29f10205ef..328c076b070 100644
--- a/tools/stanza/stanza_nlp.xml
+++ b/tools/stanza/stanza_nlp.xml
@@ -1,6 +1,10 @@
-<tool id="stanza_nlp" name="Stanza NLP Annotators" version="1.11.1+galaxy4" profile="24.1">
+<tool id="stanza_nlp" name="Stanza NLP Annotators" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="24.1">
+    <macros>
+        <token name="@TOOL_VERSION@">1.11.1</token>
+        <token name="@VERSION_SUFFIX@">4</token>
+    </macros>
     <requirements>
-        <container type="docker">ksuderman/stanza-nlp:1.11.1</container>
+        <container type="docker">ksuderman/stanza-nlp:@TOOL_VERSION@</container>
     </requirements>
     <version_command><![CDATA[
 python -c "import stanza; print(stanza.__version__)"