Commit fe532a1

ksuderman and claude committed
Add VADER sentiment analysis tool
- Rule-based sentiment analysis optimized for social media text
- No language models required - uses bundled lexicon
- Supports JSON and tabular output formats
- Includes comprehensive test data and documentation

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
1 parent b25ebf7 commit fe532a1

18 files changed

Lines changed: 24448 additions & 0 deletions

tools/vader/.shed.yml

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
name: vader_sentiment
owner: iuc
description: VADER sentiment analysis for text
long_description: |
  Performs sentiment analysis using VADER (Valence Aware Dictionary and sEntiment Reasoner),
  a rule-based model optimized for social media and general text. Works well on political
  speeches, news articles, and opinion text. No language models required.
homepage_url: https://github.com/cjhutto/vaderSentiment
remote_repository_url: https://github.com/ksuderman/galaxy_tools_vader
type: unrestricted
categories:
  - Text Manipulation
  - Natural Language Processing

tools/vader/README.md

Lines changed: 143 additions & 0 deletions
@@ -0,0 +1,143 @@
# Galaxy Wrapper for VADER Sentiment Analysis

This Galaxy tool performs sentiment analysis using VADER (Valence Aware Dictionary and sEntiment Reasoner), a lexicon- and rule-based sentiment analysis tool designed for social media and other informal text.

## Features

- **Social media optimized**: Handles slang, emoticons, punctuation emphasis, and informal language
- **No dependencies**: Bundled lexicon with no external model downloads required
- **Fast processing**: Rule-based approach for high-speed sentiment analysis
- **Interpretable scores**: Clear numerical scores with rule-based explanations
- **Flexible granularity**: Per-sentence or whole-document analysis
- **Dual output formats**: TSV for analysis and JSON for programmatic use

## Requirements

- **Input**: Plain text files
- **No dependencies**: Pure Python implementation with bundled VADER lexicon
- **No setup required**: Ready to use immediately after installation

## When to Use VADER

VADER excels at analyzing:

| Text Type | Why VADER Works Well |
|---|---|
| **Political speeches** | Handles rhetorical language, emphasis, and persuasive tone |
| **News articles** | Captures positive/negative framing and editorial stance |
| **Social media posts** | Originally designed for Twitter, Facebook, and informal text |
| **Opinion pieces** | Effective on subjective, evaluative text |
| **Short informal text** | Handles slang, emoticons, and unconventional punctuation |

## VADER vs. Model-Based Approaches

Choose VADER over neural sentiment models (spaCy, Stanza) when:

- **Interpretability**: Need rule-based, explainable sentiment scores
- **Speed**: Processing large volumes of text quickly
- **No models**: Don't want to download large language models
- **Social media**: Analyzing informal, social media-style text
- **Lightweight**: Minimal computational requirements

Choose neural models for:
- 📚 **Formal text**: Academic papers, literature, formal documents
- 🎭 **Complex sentiment**: Subtle, contextual, or sarcastic expressions
- 🌍 **Multilingual**: Non-English text analysis

## Sentiment Scores

VADER produces four scores for each text unit:

### Compound Score (-1 to +1)
**Primary metric**: Overall sentiment intensity
- `≥ +0.05`: **Positive** sentiment
- `≤ -0.05`: **Negative** sentiment
- `> -0.05` and `< +0.05`: **Neutral** sentiment

### Component Scores (0 to 1)
- **Positive**: Proportion of positive sentiment
- **Negative**: Proportion of negative sentiment
- **Neutral**: Proportion of neutral sentiment

*Note: Positive + Negative + Neutral = 1.0, up to rounding of the individual values*

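The compound thresholds and the component-sum note above can be sketched as two small helpers. This is an illustration only, not the tool's actual code: the function names `label_from_compound` and `components_sum_to_one` and the rounding tolerance are assumptions, while the 0.05 cut-off comes from the list above.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a compound score in [-1, 1] to a label using the README thresholds."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def components_sum_to_one(positive: float, negative: float, neutral: float,
                          tolerance: float = 0.002) -> bool:
    """Check that the three proportions add up to 1.0 within a rounding tolerance.

    The tolerance value is an assumption chosen for scores rounded to three decimals.
    """
    return abs(positive + negative + neutral - 1.0) <= tolerance
```

Both boundary values are inclusive on the positive and negative side, so a compound score of exactly `0.05` is labeled positive and exactly `-0.05` is labeled negative.
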
## Analysis Granularity

### Per Sentence (Default)
- Splits text into sentences automatically
- Scores each sentence individually
- Produces table with one row per sentence
- Best for: Tracking sentiment changes throughout a document

### Whole Document
- Scores entire text as single unit
- Single sentiment score for the complete document
- Best for: Overall document sentiment classification

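The difference between the two granularities can be sketched as follows. The README does not say how the tool splits sentences, so the regular expression here is a deliberately naive assumption, and the function name `split_units` and the granularity strings `"sentence"` and `"document"` are hypothetical.

```python
import re
from typing import List


def split_units(text: str, granularity: str = "sentence") -> List[str]:
    """Return the text units that would each receive their own sentiment scores."""
    stripped = text.strip()
    if granularity == "document":
        # Whole-document mode: the full text is a single unit.
        return [stripped] if stripped else []
    if granularity != "sentence":
        raise ValueError(f"unknown granularity: {granularity!r}")
    # Naive assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", stripped)
    return [part for part in parts if part]
```

With per-sentence granularity each returned unit becomes one output row; with whole-document granularity there is exactly one row for non-empty input.
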
## Output Formats

### Tabular (TSV)
Tab-separated file with columns:
- `sentence`: Text of the analyzed unit
- `compound`: Overall sentiment score (-1 to +1)
- `positive`: Positive proportion (0 to 1)
- `negative`: Negative proportion (0 to 1)
- `neutral`: Neutral proportion (0 to 1)
- `label`: Classification (positive/negative/neutral)

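A downstream script might summarize the tabular output along these lines. This is a sketch that assumes the file starts with a header row carrying the column names listed above; the function name `summarize_tsv` is hypothetical, and the rows in `SAMPLE` are made-up values for illustration, not real VADER output.

```python
import csv
import io
from statistics import mean

# Made-up rows for illustration only; not real VADER output.
SAMPLE = (
    "sentence\tcompound\tpositive\tnegative\tneutral\tlabel\n"
    "A\t0.5\t0.5\t0.0\t0.5\tpositive\n"
    "B\t-0.5\t0.0\t0.5\t0.5\tnegative\n"
)


def summarize_tsv(handle):
    """Count labels and average the compound score over all rows."""
    rows = list(csv.DictReader(handle, delimiter="\t"))
    label_counts = {}
    for row in rows:
        label_counts[row["label"]] = label_counts.get(row["label"], 0) + 1
    mean_compound = mean(float(row["compound"]) for row in rows) if rows else 0.0
    return {"rows": len(rows), "mean_compound": mean_compound,
            "label_counts": label_counts}


summary = summarize_tsv(io.StringIO(SAMPLE))
```

For a real file, pass an open file handle instead of the `io.StringIO` wrapper.
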
### JSON
Structured JSON format:
```json
[
  {
    "text": "This movie is great!",
    "compound": 0.6249,
    "positive": 0.779,
    "negative": 0.0,
    "neutral": 0.221,
    "label": "positive"
  }
]
```

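The JSON output is a list of records, so it can be filtered with the standard library alone. The sketch below reuses the example record shown above; the function name `filter_by_label` is hypothetical.

```python
import json

# The example record from the README, embedded as a string.
EXAMPLE = """
[
  {
    "text": "This movie is great!",
    "compound": 0.6249,
    "positive": 0.779,
    "negative": 0.0,
    "neutral": 0.221,
    "label": "positive"
  }
]
"""


def filter_by_label(json_text: str, label: str) -> list:
    """Return only the records whose "label" field equals the requested label."""
    records = json.loads(json_text)
    return [record for record in records if record.get("label") == label]


positives = filter_by_label(EXAMPLE, "positive")
```
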
## Example Use Cases

- **Political analysis**: Sentiment tracking in campaign speeches and debates
- **Media monitoring**: News sentiment analysis and bias detection
- **Social media research**: Public opinion analysis on Twitter/Facebook
- **Customer feedback**: Product review sentiment classification
- **Historical text analysis**: Sentiment trends in historical documents
- **Content moderation**: Identifying negative sentiment in user-generated content

## VADER's Strengths

- **Punctuation awareness**: "Good!!!" is more positive than "Good"
- **Capitalization**: "AMAZING" is stronger than "amazing"
- **Degree modifiers**: "very good" scores higher than "slightly good"
- **Conjunction handling**: a contrastive "but" shifts weight toward the clause that follows it
- **Emoticon support**: :) :( :D are properly interpreted
- **Slang recognition**: Modern informal expressions

## Installation

Install this tool from the Galaxy Tool Shed: `vader_sentiment`

No additional setup required - the VADER lexicon is bundled with the tool.

## Citation

If you use this tool, please cite the original VADER paper:

```
Hutto, C.J. & Gilbert, Eric. (2014). VADER: A Parsimonious Rule-based Model for
Sentiment Analysis of Social Media Text. Proceedings of the Eighth International
AAAI Conference on Weblogs and Social Media.
```

## License

VADER is released under the MIT License, which permits use in research and commercial applications provided the copyright and license notices are retained.

## Version History

- **3.3.2+galaxy0**: Initial Galaxy release with bundled lexicon and dual output formats
