Add spaCy NLP tool and data manager #8006

Open
ksuderman wants to merge 2 commits into galaxyproject:main from ksuderman:spacy-nlp

@ksuderman

Summary

  • Adds spaCy NLP annotation tool supporting 70+ models across 25+ languages
  • Includes data manager for downloading and installing spaCy language models
  • Provides tokenization, POS tagging, lemmatization, dependency parsing, and named entity recognition (NER)
  • Multiple output formats: JSON, CoNLL, tabular, text
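
To make the annotation and output-format bullets above concrete, here is a minimal sketch of how per-token annotations could be flattened into JSON and tabular output. It is not the tool's actual code. The helper names (`token_record`, `to_json`, `to_tabular`) and the column choice are assumptions for illustration. The attribute names (`i`, `text`, `lemma_`, `pos_`, `dep_`, `head`, `is_alpha`, `is_stop`) are spaCy `Token` attributes; stand-in objects are used so the sketch runs without an installed model.

```python
import json
from types import SimpleNamespace


def token_record(token):
    """Collect per-token annotations into a plain dict (hypothetical layout)."""
    return {
        "i": token.i,
        "text": token.text,
        "lemma": token.lemma_,
        "pos": token.pos_,
        "dep": token.dep_,
        "head": token.head.i,  # index of the syntactic head
        "is_alpha": token.is_alpha,
        "is_stop": token.is_stop,
    }


def to_json(tokens):
    """Serialize all token records as a JSON array."""
    return json.dumps([token_record(t) for t in tokens], ensure_ascii=False)


def to_tabular(tokens):
    """Serialize a subset of the annotations as tab-separated rows with a header."""
    columns = ["i", "text", "lemma", "pos", "dep", "head"]
    lines = ["\t".join(columns)]
    for t in tokens:
        rec = token_record(t)
        lines.append("\t".join(str(rec[c]) for c in columns))
    return "\n".join(lines)


def _fake_token(i, text, lemma, pos, dep, head_i, is_stop=False):
    """Stand-in for a spaCy Token, exposing only the attributes used above."""
    return SimpleNamespace(
        i=i, text=text, lemma_=lemma, pos_=pos, dep_=dep,
        head=SimpleNamespace(i=head_i),
        is_alpha=text.isalpha(), is_stop=is_stop,
    )


# Hand-written annotations for illustration only, not real model output.
demo_tokens = [
    _fake_token(0, "Galaxy", "Galaxy", "PROPN", "nsubj", 1),
    _fake_token(1, "runs", "run", "VERB", "ROOT", 1),
    _fake_token(2, "tools", "tool", "NOUN", "dobj", 1),
    _fake_token(3, ".", ".", "PUNCT", "punct", 1),
]

print(to_tabular(demo_tokens))
```

With spaCy and a model installed, the same helpers should accept a real document, for example `to_json(spacy.load("en_core_web_sm")("Galaxy runs tools."))`, since iterating a `Doc` yields `Token` objects.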

Test plan

  • Tool passes planemo lint validation
  • Data manager passes planemo lint validation
  • Comprehensive test data included
  • README documentation provided for both components
  • Model installation tested across multiple languages

🤖 Generated with Claude Code

ksuderman and others added 2 commits May 19, 2026 18:50
- spaCy tool wrapper for fast, production-ready NLP
- 25+ languages with multiple model sizes (sm/md/lg/trf)
- Core annotations: tokenization, POS, NER, dependency parsing
- Enhanced JSON output with is_alpha, is_stop attributes
- Python 3.12 compatible
- Data manager for automated model downloads via pip
- 70+ models across 25+ languages
- Comprehensive tests with planemo validation
- Memory efficient and optimized for speed
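
The model sizes and pip-based downloads listed above can be sketched as follows. This is an illustration under stated assumptions, not the data manager's implementation: `parse_model_name` and `download_command` are hypothetical helpers, and the four-part name check follows spaCy's usual `lang_type_genre_size` naming (for example `en_core_web_sm`). The command built here is spaCy's own `python -m spacy download` CLI, which installs model packages through pip; the data manager in this PR may invoke pip differently.

```python
import sys

# Size suffixes named in the commit message.
SIZES = {"sm", "md", "lg", "trf"}


def parse_model_name(name):
    """Split a model name such as 'en_core_web_sm' into its four parts."""
    parts = name.split("_")
    if len(parts) != 4:
        raise ValueError(f"unexpected model name: {name!r}")
    lang, kind, genre, size = parts
    if size not in SIZES:
        raise ValueError(f"unknown model size {size!r} in {name!r}")
    return {"lang": lang, "type": kind, "genre": genre, "size": size}


def download_command(name, python=sys.executable):
    """Build (but do not run) the command that would download a model."""
    parse_model_name(name)  # validate the name before building the command
    return [python, "-m", "spacy", "download", name]


print(parse_model_name("en_core_web_sm"))
print(download_command("de_core_news_md", python="python"))
```

Returning the command as a list rather than executing it keeps the sketch side-effect free; a caller could pass it to `subprocess.run(cmd, check=True)` to perform the installation.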

Tool: spacy_nlp (v3.8.11+galaxy4)
Data Manager: data_manager_spacy_models
Categories: Text Manipulation, Natural Language Processing
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>