Team member: Pei-Jo Yang
This is a final project for the course CMSC389A. Mental illnesses such as depression can be life-threatening, with suicide as a possible outcome. In this project, an LSTM with a Convolutional Neural Network is built using Keras to determine whether social platform users are depressive based on their Twitter posts. The model's accuracy is evaluated and compared to a binary classification baseline using logistic regression. The model reaches 98.91% accuracy after 5 epochs, while the logistic regression baseline achieves a much lower accuracy of 83.755%.
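The full architecture is defined in the project notebook; as a rough sketch only, a CNN-LSTM text classifier in Keras could look like the following. All layer sizes, the vocabulary size, and the sequence length below are illustrative assumptions, not the values used in the notebook:

```python
# Illustrative CNN-LSTM text classifier in Keras.
# All hyperparameters below are placeholder assumptions.
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, MaxPooling1D, LSTM, Dense, Dropout

MAX_NB_WORDS = 20000        # assumed vocabulary size
EMBEDDING_DIM = 300         # matches the 300-dimensional Word2Vec vectors used below
MAX_SEQUENCE_LENGTH = 140   # assumed maximum tweet length in tokens

model = Sequential()
model.add(Embedding(MAX_NB_WORDS, EMBEDDING_DIM, input_length=MAX_SEQUENCE_LENGTH))
model.add(Dropout(0.2))
model.add(Conv1D(filters=32, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(LSTM(100))
model.add(Dense(1, activation='sigmoid'))  # binary output: depressive vs. not

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
```

The convolutional and pooling layers extract local n-gram features from the embedded tweet, and the LSTM then models the sequence of those features before the sigmoid output makes the binary prediction.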
The test data described below is extremely biased because it was scraped with the keyword depression. For research purposes, I recommend looking at the Reddit Self-reported Depression Diagnosis (RSDD) and Self-reported Mental Health Diagnoses (SMHD) datasets from Georgetown University here. This is also not an actively maintained project, so inquiries are discouraged.
Two kinds of tweets are needed for this project: random tweets that do not indicate depression and tweets that suggest the user may have depression. The random tweets come from the Kaggle dataset twitter_sentiment. Tweets that indicate depression are harder to obtain, since there is no public dataset of depressive tweets, so in this project they are retrieved with the Twitter scraping tool TWINT using the keyword depression, scraping all tweets within a one-day span. The scraped tweets may include tweets that do not indicate depression in the user, such as tweets linking to articles about depression, so they need to be manually checked for better testing results. A csv file of scraped tweets is provided; alternatively, the following command can be used to obtain depressive tweets. Keep in mind that the date in the command should be changed, and the generated .csv file should be manually checked and moved to the project directory:
```
python3 twint.py -s depression --since 2018-05-15 -o depressive_tweets_processed.csv --csv
```
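Once both CSV files are in the project directory, they can be combined into one labeled DataFrame with pandas. This is only a sketch: the file names and column names below are assumptions and should be adjusted to the actual files.

```python
import pandas as pd

# File and column names are assumptions; adjust them to the actual CSVs
# (the TWINT export from the command above and the Kaggle twitter_sentiment download).
depressive_df = pd.read_csv('depressive_tweets_processed.csv')
random_df = pd.read_csv('twitter_sentiment.csv', encoding='latin-1')

# Keep only the tweet text and attach a binary label: 1 = depressive, 0 = random.
df = pd.concat([
    pd.DataFrame({'tweet': depressive_df['tweet'], 'label': 1}),           # 'tweet' assumed TWINT column
    pd.DataFrame({'tweet': random_df['SentimentText'], 'label': 0}),       # placeholder column name
], ignore_index=True)

print(df['label'].value_counts())
```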
Collected tweets are split into training, validation, and testing sets with a ratio of 60%:20%:20%.
|  | Depressive Tweets | Normal Tweets |
| --- | --- | --- |
| Training | 1384 | 7146 |
| Validation | 462 | 2382 |
| Testing | 462 | 2383 |
| Total | 2308 | 11911 |
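A split with these proportions can be reproduced with scikit-learn, for example as sketched below. It continues from the combined `df` built in the loading sketch above; the stratification keeps each split's class balance consistent with the table, though the notebook's exact procedure may differ.

```python
from sklearn.model_selection import train_test_split

# 60/20/20 split, stratified so each split keeps the same depressive/normal ratio.
train_df, rest_df = train_test_split(df, test_size=0.4,
                                     random_state=42, stratify=df['label'])
val_df, test_df = train_test_split(rest_df, test_size=0.5,
                                   random_state=42, stratify=rest_df['label'])

print(len(train_df), len(val_df), len(test_df))
```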
- ftfy - fixes Unicode that's broken in various ways
- gensim - enables storing and querying word vectors
- keras - a high-level neural networks API running on top of TensorFlow
- matplotlib - a Python 2D plotting library which produces publication quality figures
- nltk - Natural Language Toolkit
- numpy - the fundamental package for scientific computing with Python
- pandas - provides easy-to-use data structures and data analysis tools for Python
- sklearn - a machine learning library for Python
- tensorflow - an open source machine learning framework for everyone
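As a rough illustration of how several of these libraries fit together, the sketch below cleans raw tweet text with ftfy, strips URLs, mentions, and stop words, and turns the result into padded integer sequences with the Keras tokenizer. The regular expressions and parameter values are illustrative, not taken from the notebook.

```python
import re
import ftfy
import nltk
from nltk.corpus import stopwords
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

nltk.download('stopwords', quiet=True)
STOP_WORDS = set(stopwords.words('english'))

def clean_tweet(text):
    """Fix broken Unicode, strip URLs/mentions, lowercase, and drop stop words."""
    text = ftfy.fix_text(text)
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # remove URLs
    text = re.sub(r'[@#]\w+', '', text)                # remove mentions and hashtags
    words = re.findall(r"[a-z']+", text.lower())
    return ' '.join(w for w in words if w not in STOP_WORDS)

# Illustrative settings; the notebook's actual values may differ.
MAX_NB_WORDS = 20000
MAX_SEQUENCE_LENGTH = 140

texts = ["I can't sleep, everything feels hopeless https://t.co/xyz",
         "Loving this sunny weather with @friend #weekend"]
cleaned = [clean_tweet(t) for t in texts]

tokenizer = Tokenizer(num_words=MAX_NB_WORDS)
tokenizer.fit_on_texts(cleaned)
padded = pad_sequences(tokenizer.texts_to_sequences(cleaned),
                       maxlen=MAX_SEQUENCE_LENGTH)
print(padded.shape)
```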
In addition, the pretrained vectors for the Word2Vec model are from here.
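A common way to use such pretrained vectors is to load them with gensim and copy them into the model's embedding weights. The sketch below assumes a 300-dimensional binary Word2Vec file (the file name is a placeholder, not the actual download) and the fitted `tokenizer` from the preprocessing sketch above:

```python
import numpy as np
from gensim.models import KeyedVectors

EMBEDDING_DIM = 300
MAX_NB_WORDS = 20000
EMBEDDING_FILE = 'GoogleNews-vectors-negative300.bin'  # placeholder file name

# Load the pretrained vectors (binary Word2Vec format).
word2vec = KeyedVectors.load_word2vec_format(EMBEDDING_FILE, binary=True)

# Build an embedding matrix aligned with the tokenizer's word index;
# words missing from the pretrained vocabulary stay as zero vectors.
num_words = min(MAX_NB_WORDS, len(tokenizer.word_index) + 1)
embedding_matrix = np.zeros((num_words, EMBEDDING_DIM))
for word, i in tokenizer.word_index.items():
    if i < num_words and word in word2vec:
        embedding_matrix[i] = word2vec[word]

# The matrix can then seed the model's Embedding layer, e.g.
# Embedding(num_words, EMBEDDING_DIM, weights=[embedding_matrix],
#           input_length=MAX_SEQUENCE_LENGTH, trainable=False)
```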
To run DepressionDetectionInTwitter.ipynb, the Jupyter notebook that contains all the code, run the following command in the project directory:
```
$ jupyter notebook
```
Here is a video demo of this project, enjoy!
MIT