Tweets classification - EPFL CS-433

This repo contains the code and instructions needed to classify tweets as containing ':)' or ':('. The corresponding Kaggle competition was part of the CS-433 Machine Learning course at EPFL. Our team is Martian Jaggirnauts.

Directory Tree description

data folder which should be populated as described below

slang_dict_parsing contains code that scraped the noslang website for slang words; it did not improve accuracy, so it is not used

src folder containing the main code, run.py, and the models

templates_course contains the default code provided with the project

Design decisions

How to run the project and TRAIN the models

*nix-friendly guide; for other platforms, some steps might differ.

Running time: the current model took around 12 hours to train on an 8-core CPU with 60 GB of RAM and a Tesla K80 GPU. A GPU is highly recommended.

  1. Clone this repo
$ git clone https://github.com/m-doru/tweets-sentiment-analysis.git
$ cd tweets-sentiment-analysis
  2. Install fastText v0.1.0 with the Python build. After this step, the following should work:
$ python3
>>> import fasttext
>>>
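
For reference, here is a minimal sketch of training and querying a supervised fastText classifier from Python, assuming the old fasttext Python wrapper's API (fasttext.supervised / classifier.predict) and a hypothetical training file in which each tweet is prefixed with __label__pos or __label__neg; the project's actual training pipeline lives in src/:

import fasttext

# Hypothetical file: one tweet per line, prefixed with __label__pos or __label__neg
classifier = fasttext.supervised('data/train_labeled.txt', 'fasttext_model',
                                 label_prefix='__label__')
print(classifier.predict(['so happy about the weekend', 'this is terrible']))
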
  3. Clone sent2vec into the root directory of the project and follow its Setup & Requirements section to compile it. Then download the sent2vec_twitter_bigrams v1 embeddings (23 GB, 700-dimensional, trained on English tweets) and place them in data/
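
A hedged sketch of loading the downloaded embeddings from Python, assuming the sent2vec wrapper exposes Sent2vecModel with load_model and embed_sentences (check the sent2vec README for the exact API) and that the model file name below matches what you placed in data/:

import sent2vec

model = sent2vec.Sent2vecModel()
model.load_model('data/sent2vec_twitter_bigrams.bin')  # file name assumed; use the one you downloaded
embeddings = model.embed_sentences(['i love this movie', 'worst day ever'])
print(embeddings.shape)  # expected (2, 700) for the 700-dimensional Twitter bigram model
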

  4. Download the GloVe Twitter pretrained word vectors glove.twitter.27B.zip. Unzip the file and place glove.twitter.27B.200d.txt in data/glove/
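
Loading these vectors is plain text parsing; a minimal sketch, assuming the file was placed as above:

import numpy as np

glove = {}
with open('data/glove/glove.twitter.27B.200d.txt', encoding='utf-8') as f:
    for line in f:
        parts = line.rstrip().split(' ')
        glove[parts[0]] = np.asarray(parts[1:], dtype='float32')  # 200-dimensional vector

print(len(glove), glove['happy'].shape)  # vocabulary size, (200,)
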

  5. Download the data from the Kaggle competition and place the .txt files in data/twitter-datasets/.
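
A short sketch of loading this data; the file names below are an assumption about the competition archive, so adjust them to what you actually downloaded:

def load_tweets(path):
    with open(path, encoding='utf-8') as f:
        return [line.strip() for line in f]

pos = load_tweets('data/twitter-datasets/train_pos.txt')  # tweets that contained ':)'
neg = load_tweets('data/twitter-datasets/train_neg.txt')  # tweets that contained ':('
tweets = pos + neg
labels = [1] * len(pos) + [-1] * len(neg)
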

  6. Install the following Python requirements:

  • scikit-learn
  • keras with the TensorFlow backend (a minimal check is sketched below)
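
To check that keras runs on the TensorFlow backend, here is a minimal, illustrative binary classifier on bag-of-words features; it is a toy sketch, not the project's model (that lives in src/):

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from keras.models import Sequential
from keras.layers import Dense

# Toy data standing in for tweets; the real data comes from data/twitter-datasets/
texts = ['love this so much', 'what a great day', 'this is awful', 'worst day ever']
y = np.array([1, 1, 0, 0])
X = CountVectorizer().fit_transform(texts).toarray()

model = Sequential()
model.add(Dense(16, activation='relu', input_dim=X.shape[1]))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=10, verbose=0)
print(model.predict(X))  # probabilities of the positive class
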

How to run the project with the pretrained model to get the Kaggle submission

  1. Clone this repo
$ git clone https://github.com/m-doru/tweets-sentiment-analysis.git
$ cd tweets-sentiment-analysis
  2. Download the data from the Kaggle competition and place the .txt files in data/twitter-datasets/.

  3. Install the following Python 3 requirements:

  • scikit-learn
  4. Run run_pretrained.py to generate the Kaggle submission.
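
For reference, a submission file in the usual Id,Prediction format can be written as in the sketch below; the column names and the {-1, 1} labels are assumptions about the competition format, and run_pretrained.py already takes care of this step:

import csv

def write_submission(ids, predictions, path='submission.csv'):
    # Assumed format: header Id,Prediction with labels in {-1, 1}
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['Id', 'Prediction'])
        for i, p in zip(ids, predictions):
            writer.writerow([i, p])

write_submission([1, 2, 3], [1, -1, 1])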