Hatespeech-Classification-CNN

Performing hate speech classification with a Convolutional Neural Network (CNN) model, and comparing it with logistic regression and a simple version of XGBoost.

In order to run the project with the premade vectors from GloVe, you need to do the following:

Download the file containing the pretrained tweet vectors from https://nlp.stanford.edu/projects/glove/
Unzip the file and convert the files containing the keyed vectors to the correct format using the following terminal commands:

    python -m gensim.scripts.glove2word2vec --input [filename25d] --output glove-twitter-25.txt
    python -m gensim.scripts.glove2word2vec --input [filename50d] --output glove-twitter-50.txt
    python -m gensim.scripts.glove2word2vec --input [filename100d] --output glove-twitter-100.txt
    python -m gensim.scripts.glove2word2vec --input [filename200d] --output glove-twitter-200.txt

Place the files you have retrieved in the folder "glove.twitter.27B". You should be good to go.
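
For readers curious about what the conversion step changes: the word2vec text format is the GloVe text format with one extra header line giving the number of vectors and their dimension. The sketch below reproduces that idea using only the standard library. It is an illustration, not the gensim script itself, and the function name `glove_to_word2vec` is hypothetical.

```python
def glove_to_word2vec(glove_path, output_path):
    """Copy a GloVe text file, prepending a "<count> <dim>" header line."""
    count, dim = 0, 0
    # First pass: count the vectors and read the dimension from the first line.
    with open(glove_path, encoding="utf-8") as src:
        for line in src:
            if count == 0:
                # Each line is "<token> <v1> ... <vN>", so N = fields - 1.
                dim = len(line.rstrip("\n").split(" ")) - 1
            count += 1
    # Second pass: write the header, then stream the original lines unchanged.
    with open(glove_path, encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        dst.write(f"{count} {dim}\n")
        for line in src:
            dst.write(line)
    return count, dim
```

The file is read twice so that the full set of vectors never has to be held in memory, which matters for the larger GloVe files.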

The file "data_merged.csv" contains all the data, which is preprocessed and partitioned when a model file is run.

To run a model, simply type "python [modelfilename.py]" (example: python cnn.py). Optional arguments can be added after the file name.

The following arguments exist.

For cnn.py:

- -regval: the L2 regularization value.
- -activationfunction: the activation function to use in the intermediate neurons, for example "relu" or "tanh".
- -use_pretrained_vecs: True or False. If True, the GloVe vectors are used for training. Default is False.
- -emb_dim: the embedding dimension. Default is 200. When using pretrained vectors it must be 25, 50, 100 or 200; otherwise it can be any number of dimensions.
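
The options above could be declared with `argparse` roughly as follows. This is a minimal sketch under stated assumptions, not the actual code in cnn.py: the helper names `str2bool`, `build_parser` and `parse_args` are hypothetical, and since no defaults are documented for -regval and -activationfunction, `None` is used as a placeholder.

```python
import argparse

# Dimensions for which the converted GloVe Twitter files exist.
PRETRAINED_DIMS = (25, 50, 100, 200)


def str2bool(value):
    """Parse the strings "True"/"False" (case-insensitive) into a bool."""
    lowered = value.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of the cnn.py options")
    # Assumption: undocumented defaults are left as None.
    parser.add_argument("-regval", type=float, default=None)
    parser.add_argument("-activationfunction", default=None)
    parser.add_argument("-use_pretrained_vecs", type=str2bool, default=False)
    parser.add_argument("-emb_dim", type=int, default=200)
    return parser


def parse_args(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Pretrained vectors only come in a fixed set of dimensions.
    if args.use_pretrained_vecs and args.emb_dim not in PRETRAINED_DIMS:
        parser.error(f"-emb_dim must be one of {PRETRAINED_DIMS}")
    return args
```

With a parser like this, a call such as "python cnn.py -use_pretrained_vecs True -emb_dim 100" would select the 100-dimensional GloVe vectors, while an unsupported dimension combined with pretrained vectors would be rejected before any training starts.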

For xgboost_1.py:

- -seed: the random state seed used when partitioning the data. Must be an integer. Default is 123.
- -use_pretrained_vecs: True or False. If True, the GloVe vectors are used for training. Default is False.
- -emb_dim: the embedding dimension. Default is 200. When using pretrained vectors it must be 25, 50, 100 or 200; otherwise it can be any number of dimensions.
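
To illustrate why a seed makes the data partitioning reproducible, here is a small standard-library sketch. It is not the project's own splitting code, which may well rely on a library routine: the function name `seeded_split` and the 20% test fraction are assumptions made for the example, and only the default seed of 123 comes from the list above.

```python
import random


def seeded_split(rows, test_fraction=0.2, seed=123):
    """Shuffle a copy of rows with a fixed seed and cut it into (train, test)."""
    shuffled = list(rows)
    # A dedicated Random instance keeps the global random state untouched.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

Calling the function twice with the same seed yields the same train and test rows, so results from different runs are evaluated on identical data.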
