This is the repository for an undergraduate research project that I have done in the summer of 2014.
To acquire your own copy of this implementation follow these instructions. The following instructions are written for Ubuntu 14.04 GNU/Linux, however it should be able to run on any Unix-like system.
The source code is hosted on github.com here. Download the source code either via git or a zip file
wget https://github.com/hockeybuggy/twitter-sentiment/archive/dist.zip
unzip dist.zip
cd twitter-sentiment-dist
If python 2.7, pip, and flex are not installed, they can be installed easily via a package manger. Then the various required python packages can be installed with pip:
sudo apt-get install python2.7 pip flex
sudo pip install numpy nltk
The tokenizer is written in C for performance reasons and can be compiled with:
make
The classifier can be executed in a number of ways. There are two python programs that are executable: 'conductor.py' and 'crossval.py'. These two programs run the classifier in a simple fashion or with cross validation. There are a large number of command line arguments that can be viewed with:
python conductor -h
Since the command line arguments can be cumbersome many of the experiments can be run via the make file:
make tiny # Quick running with only 100 examples
make crossfolds-m000 # The MaxEnt baseline with cross-validation
make crossfolds-m111 # MaxEnt with cross-validation & feature selection
make crossfolds-b000 # The Naive Bayes baseline with cross-validation
make crossfolds-b111 # Naive Bayes with cross-val & feature selection
make crossfolds # Run all of the cross-validation experiments