Subreddit Classifier

Setup

To run the scripts in this repository, install the following dependencies:

pip install -U scikit-learn
pip install praw
pip install joblib
pip install nltk
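
All four packages can also be installed with a single command:

pip install -U scikit-learn praw joblib nltk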

Also, to ensure NLTK has access to the English stop words, run the following in a Python session:

nltk.download('stopwords')
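
If you prefer not to open a Python session, the same download can be triggered from the shell with a one-off command:

python -c "import nltk; nltk.download('stopwords')"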

To use predict_subreddit.py or train_and_evaluate.py, the "models" folder must be populated with each of the four models.

To use find_hyperparameters.py, a "temp" folder must first be created.

Usage

To test a classifier on a random Reddit post, run:

python predict_subreddit.py

The --url flag can be used to specify the URL of a specific post to predict.
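
For example, to classify a specific post (the URL below is only an illustrative placeholder):

python predict_subreddit.py --url https://www.reddit.com/r/AskReddit/comments/abc123/example_post/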

Run the following to get more usage options:

python predict_subreddit.py -h

To train or evaluate a specific model, use train_and_evaluate.py. Run the following for more usage options:

python train_and_evaluate.py -h

For example, to train and save a model for Multinomial Naive Bayes, run:

python train_and_evaluate.py MultinomialNB -t

To search for the best hyperparameters for a given classifier, use find_hyperparameters.py. Run the following for more usage options:

python find_hyperparameters.py -h

Running a search will also train the best model on the full training set and save it.
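
As a sketch, assuming find_hyperparameters.py takes the classifier name as a positional argument in the same way train_and_evaluate.py does, a search might be launched with:

python find_hyperparameters.py MultinomialNB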

Finally, if data still needs to be pulled from Reddit, run:

python get_data.py

This will attempt to pull 1000 posts from each subreddit, save them to a file, and save splits of the full data for training, development, and testing.
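
For reference, the following is a minimal sketch of the kind of PRAW call get_data.py relies on; the credentials and subreddit name below are placeholder assumptions, not values taken from this repository:

import praw

# Placeholder credentials; real values come from a Reddit "script" app (assumption).
reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="subreddit-classifier data collection",
)

# Pull up to 1000 of the newest posts from one example subreddit.
for post in reddit.subreddit("AskReddit").new(limit=1000):
    print(post.title)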
