Skip to content

Add VADER sentiment analysis tool#8001

Open
ksuderman wants to merge 5 commits into
galaxyproject:mainfrom
ksuderman:vader-sentiment
Open

Add VADER sentiment analysis tool#8001
ksuderman wants to merge 5 commits into
galaxyproject:mainfrom
ksuderman:vader-sentiment

Conversation

@ksuderman
Copy link
Copy Markdown

Summary

This PR adds the VADER (Valence Aware Dictionary and sEntiment Reasoner) sentiment analysis tool to Galaxy, providing a lexicon and rule-based approach specifically optimized for social media and informal text.

Tool Overview

VADER Sentiment Analysis (vader_sentiment)

  • Version: v3.3.2+galaxy0
  • Categories: Text Manipulation, Natural Language Processing
  • License: MIT (OSI-compatible)
  • Dependencies: None - bundled lexicon included

Key Features

  • Social media optimized: Handles slang, emoticons, punctuation emphasis
  • Rule-based scoring: Interpretable, deterministic results
  • No external dependencies: Bundled VADER lexicon (MIT licensed)
  • Fast processing: Efficient for large-scale text analysis
  • Flexible granularity: Per-sentence or whole-document analysis
  • Multiple outputs: TSV with classification + JSON format

Why VADER?

VADER excels for:

  • Political speeches and debates
  • News articles and opinion text
  • Social media content analysis
  • Cases where interpretable, rule-based scores are preferred over neural models
  • High-volume processing without model downloads

Validation

  • Planemo lint: Passes without issues
  • Tests: 2 comprehensive tests included
  • Documentation: Complete README with usage examples
  • Citations: Proper citation for Hutto & Gilbert (2014)
  • IUC compliance: Follows all IUC best practices

Tool Details

Input/Output

  • Input: Plain text files
  • Output: TSV (compound, positive, negative, neutral scores + classification) or JSON

Scoring System

  • Compound score (-1 to +1): Primary sentiment metric
  • Component scores (0 to 1): Positive, negative, neutral proportions
  • Classification: Positive (≥+0.05), Negative (≤-0.05), Neutral (-0.05 to +0.05)

Files Included

tools/vader_sentiment/
├── .shed.yml                    # IUC toolshed configuration
├── README.md                    # Comprehensive documentation  
├── vader_sentiment.xml          # Galaxy tool wrapper
├── vader_process.py            # Processing script
├── vaderSentiment.py           # VADER implementation
├── vader_lexicon.txt           # Core sentiment lexicon
├── emoji_utf8_lexicon.txt      # Emoji sentiment mapping
├── macros.xml                  # Tool version macros
└── test-data/
    └── input.txt               # Test input file

Citation

Hutto, C.J. & Gilbert, Eric. (2014). VADER: A Parsimonious Rule-based Model for 
Sentiment Analysis of Social Media Text. Proceedings of the Eighth International 
AAAI Conference on Weblogs and Social Media.

This tool provides a valuable alternative to neural sentiment approaches, especially for research requiring interpretable, reproducible sentiment scores on informal text.

🤖 Generated with Claude Code

Copy link
Copy Markdown
Member

@bgruening bgruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @ksuderman.

There seem to be two similar/identical tools in this PR, can you please double check.

Comment thread tools/vader/macros.xml Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could move this macro into the main XML if its only those two items

Comment thread tools/vader/.shed.yml Outdated
a rule-based model optimized for social media and general text. Works well on political
speeches, news articles, and opinion text. No language models required.
homepage_url: https://github.com/cjhutto/vaderSentiment
remote_repository_url: https://github.com/ksuderman/galaxy_tools_vader
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one needs adoption to IUC

Comment thread tools/vader/vader_lexicon.txt
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we put those python scripts and the dictionaries maybe into a very simple conda package?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ksuderman ping ...

I also found this conda package already: https://anaconda.org/channels/conda-forge/packages/vadersentiment/files is that useful?

Comment thread tools/vader/vader_sentiment.xml Outdated
Comment thread tools/vader/vader_sentiment.xml
Comment thread tools/vader/vader_sentiment.xml Outdated
Comment thread tools/vader/vader_sentiment.xml Outdated
Comment thread tools/vader_sentiment/.shed.yml Outdated
Comment thread tools/vader_sentiment/vader_sentiment.xml Outdated
@ksuderman
Copy link
Copy Markdown
Author

Thanks a lot @ksuderman.

There seem to be two similar/identical tools in this PR, can you please double check.

Yes, Claude started getting confused and I thought it would be easier to unwind once everything was in Git. I will be going through all of the repos and cleaning out duplicate files etc.

ksuderman and others added 2 commits May 20, 2026 11:31
- Rule-based sentiment analysis optimized for social media text
- No language models required - uses bundled lexicon
- Supports JSON and tabular output formats
- Includes comprehensive test data and documentation

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Keep only tools/vader/ to resolve duplicate tools issue
@ksuderman
Copy link
Copy Markdown
Author

Fixed the duplicate tools issue. The problem was two identical directories: tools/vader/ and tools/vader_sentiment/. I've removed the duplicate tools/vader_sentiment/ directory and cleaned up the commit history.

The tool now has a single clean directory structure under tools/vader/ with all the necessary files:

  • Galaxy tool wrapper (vader_sentiment.xml)
  • Processing script (vader_process.py)
  • Bundled VADER library (vaderSentiment.py)
  • Lexicon files (vader_lexicon.txt, emoji_utf8_lexicon.txt)

ksuderman added 2 commits May 20, 2026 11:50
- Update profile from 21.05 to 24.1
- Remove macros.xml and inline version
- Add ftype attributes to test outputs
- Add has_n_rows assertion for tabular test
- Use strict JSON assertions for JSON test
- Improve test robustness
@ksuderman
Copy link
Copy Markdown
Author

Addressed review comments

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Comment thread tools/vader/vader_sentiment.xml

]]></help>
<citations>
<citation type="bibtex">
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ksuderman is it this DOI 10.1609/icwsm.v8i1.14550 ...

Suggested change
<citation type="bibtex">
<citation type="doi">10.1609/icwsm.v8i1.14550

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ksuderman ping ...

I also found this conda package already: https://anaconda.org/channels/conda-forge/packages/vadersentiment/files is that useful?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants