Add Stanford CoreNLP tool and data manager #8005

Open
ksuderman wants to merge 2 commits into galaxyproject:main from ksuderman:corenlp-nlp
@ksuderman

Summary

  • Adds Stanford CoreNLP annotation tool supporting 8 languages
  • Includes data manager for downloading CoreNLP language model JARs from Maven Central
  • Provides tokenization, POS tagging, lemmatization, dependency parsing, NER, coreference resolution, and sentiment analysis
  • Multiple output formats: JSON, CoNLL, XML, tabular, text

CoreNLP Tool Features

  • Docker containerization with Java 21 runtime environment
  • Support for Arabic, Chinese, English, French, German, Hungarian, Italian, Spanish
  • Sentiment analysis with EJML dependencies included
  • Coreference resolution with common models JAR support
  • Flexible annotator pipeline configuration
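To illustrate what a configurable annotator pipeline could look like in practice, here is a minimal Python sketch that assembles a CoreNLP command line from a user's selections. It is not taken from the wrapper in this PR. The function name `build_corenlp_command`, the `CANONICAL_ORDER` list, and the choice to sort annotators into that order are assumptions made for illustration. The `-annotators`, `-file`, `-outputFormat`, and `-props` options and the `StanfordCoreNLP-<language>.properties` naming come from the CoreNLP command-line interface.

```python
# Hypothetical helper: builds an argv list for the CoreNLP command-line pipeline.
# The ordering list below is an assumption of this sketch, not the PR's logic.
CANONICAL_ORDER = [
    "tokenize", "ssplit", "pos", "lemma", "ner",
    "parse", "depparse", "coref", "sentiment",
]


def build_corenlp_command(input_path, annotators, output_format="json",
                          language=None, classpath="*"):
    """Return the java command (as a list) for one annotation run."""
    unknown = [a for a in annotators if a not in CANONICAL_ORDER]
    if unknown:
        raise ValueError(f"Unknown annotators: {', '.join(unknown)}")

    # De-duplicate and sort so that earlier pipeline stages come first.
    ordered = sorted(set(annotators), key=CANONICAL_ORDER.index)

    cmd = ["java", "-cp", classpath, "edu.stanford.nlp.pipeline.StanfordCoreNLP"]
    if language and language != "english":
        # Non-English runs load the language-specific properties file
        # that ships inside the corresponding models JAR.
        cmd += ["-props", f"StanfordCoreNLP-{language}.properties"]
    cmd += [
        "-annotators", ",".join(ordered),
        "-file", input_path,
        "-outputFormat", output_format,
    ]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_corenlp_command(
        "input.txt", ["ner", "tokenize", "pos", "lemma"], language="french")))
```

Returning a list rather than a single string keeps the arguments safe to pass to `subprocess.run` without shell quoting.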

Data Manager Features

  • Downloads language model JARs directly from Maven Central
  • Multi-select installation interface for language packages
  • Automatic registration in Galaxy data tables
  • Common models JAR for coreference resolution support
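The download and registration steps listed above can be sketched roughly as follows. This is an illustrative Python sketch, not the data manager's actual code. It assumes the standard Maven Central repository layout for the `edu.stanford.nlp:stanford-corenlp` artifact with `models` and `models-<language>` classifiers. The data table name `corenlp_models` and its columns (`value`, `name`, `language`, `path`) are hypothetical. The top-level `data_tables` key follows the JSON structure that Galaxy data managers write on completion.

```python
import json

MAVEN_BASE = "https://repo1.maven.org/maven2"
GROUP_PATH = "edu/stanford/nlp"
ARTIFACT = "stanford-corenlp"


def model_jar_name(version, language=None):
    """File name of a models JAR; language=None means the common models JAR."""
    classifier = "models" if language is None else f"models-{language}"
    return f"{ARTIFACT}-{version}-{classifier}.jar"


def model_jar_url(version, language=None):
    """Maven Central URL for a models JAR (assumes the standard repo layout)."""
    return (f"{MAVEN_BASE}/{GROUP_PATH}/{ARTIFACT}/{version}/"
            f"{model_jar_name(version, language)}")


def data_table_entry(version, language, install_dir):
    """One row for a hypothetical 'corenlp_models' data table."""
    return {
        "value": f"{language}_{version}",
        "name": f"CoreNLP {language} models ({version})",
        "language": language,
        "path": f"{install_dir}/{model_jar_name(version, language)}",
    }


def data_manager_json(entries, table_name="corenlp_models"):
    """Serialize rows in the structure Galaxy reads from a data manager."""
    return json.dumps({"data_tables": {table_name: entries}}, indent=2)


if __name__ == "__main__":
    version = "4.5.10"  # matches the tool version named in this PR
    for lang in ("french", "german"):
        print(model_jar_url(version, lang))
    rows = [data_table_entry(version, lang, "/data/corenlp")
            for lang in ("french", "german")]
    print(data_manager_json(rows))
```

Keeping URL construction separate from the download makes the multi-select case simple: each selected language yields one URL and one table row.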

Test plan

  • Tool passes planemo lint validation
  • Data manager passes planemo lint validation
  • Comprehensive test data included
  • README documentation provided for both components
  • .shed.yml configured for IUC submission
  • Docker image updated and tested

🤖 Generated with Claude Code

ksuderman and others added 2 commits May 19, 2026 18:49
- Stanford CoreNLP tool wrapper supporting 8 languages
- Multi-language support: English, Chinese, Arabic, French, German, Spanish, Italian, Hungarian
- Multiple annotators: tokenization, POS, NER, parsing, coreference, sentiment
- Output formats: JSON, CoNLL, CoNLL-U, XML, text
- Docker containerization with Java 21
- Data manager for automated model downloads from Maven Central
- Comprehensive tests and documentation
- Fixed coreference resolution with common models JAR

Tool: stanford_corenlp (v4.5.10+galaxy4)
Data Manager: data_manager_corenlp_models
Categories: Text Manipulation, Natural Language Processing
Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>