Little Birdy

Introduction

Little Birdy is a web app in which the user enters the name of any person, object, movie, or product, and the app returns a detailed analysis of that subject based on real-time tweets from Twitter.
The analysis includes the tweets, their sentiment, hashtags, a word cloud for each sentiment, and a graph of the ratio of sentiments.
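
As a rough illustration of the hashtag and sentiment-ratio parts of that overview, here is a minimal sketch. It assumes each tweet has already been labelled `positive` or `negative` by the model; the function name `summarize` and the sample tweets are hypothetical and not taken from the project code.

```python
import re
from collections import Counter

HASHTAG_RE = re.compile(r"#(\w+)")

def summarize(tweets):
    """Summarize (text, sentiment) pairs into hashtag counts and a sentiment ratio."""
    hashtags = Counter()
    sentiments = Counter()
    for text, sentiment in tweets:
        # Count hashtags case-insensitively so #Hype and #hype are merged.
        hashtags.update(tag.lower() for tag in HASHTAG_RE.findall(text))
        sentiments[sentiment] += 1
    total = sum(sentiments.values())
    # Share of each sentiment label; empty when there are no tweets.
    ratio = {label: count / total for label, count in sentiments.items()} if total else {}
    return {"hashtags": hashtags.most_common(), "sentiment_ratio": ratio}

# Made-up sample data, for illustration only.
sample = [
    ("Loved the new trailer #movie #hype", "positive"),
    ("Worst sequel ever #movie", "negative"),
    ("Great soundtrack #Hype", "positive"),
]
summary = summarize(sample)
```

The ratio dictionary is the kind of data a pie or bar chart of sentiments could be drawn from.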

Demo

You can try out our web app by clicking Little Birdy

You can also check out this quick demo:

2022-09-09.22-55-00.mp4

Dataset and Pre-Processing

The model was developed on the Sentiment140 dataset of 1.6 million tweets, which we preprocessed with the custom pipeline described below.

Click here to get the dataset.

Pre-processing steps

~5k null values were removed
We tried to remove even fine-grained spelling errors, unexpected characters, and similar noise
Performed lemmatization to convert each word into its root form
For stopwords, we removed most one- and two-letter tokens, then reviewed the 500 most frequent words in the whole dataset and manually removed those that were unnecessary or biased
Tokenized and padded the dataset
Converted words into vectors using FastText word embeddings (GloVe and BERT were implemented as well)
Split the data into train and test sets in a 95:5 ratio
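
The steps above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the project's actual pipeline: the stopword set is a hypothetical placeholder for the hand-curated list, all tokens of one or two letters are dropped, sequences are right-padded with 0, and lemmatization and the embedding lookup are left out because they need external libraries.

```python
import random
import re

# Hypothetical stopword list; the project's real list was curated by hand.
STOPWORDS = {"the", "and", "for", "was", "with"}

def clean(text):
    """Lowercase a tweet, strip noise, and return the remaining tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)  # drop URLs and mentions
    text = re.sub(r"[^a-z\s]", " ", text)           # drop digits, punctuation, symbols
    # Simplification: drop every 1-2 letter token, plus the stopwords.
    return [t for t in text.split() if len(t) > 2 and t not in STOPWORDS]

def build_vocab(docs):
    """Map each token to an integer id; id 0 is reserved for padding."""
    vocab = {}
    for tokens in docs:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab

def encode(tokens, vocab, max_len):
    """Convert tokens to ids, truncate to max_len, and right-pad with zeros."""
    ids = [vocab[t] for t in tokens if t in vocab][:max_len]
    return ids + [0] * (max_len - len(ids))

def split(rows, test_fraction=0.05, seed=42):
    """Shuffle and split rows into (train, test), 95:5 by default."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]

tokens = clean("@bob I LOVED the movie!!! http://t.co/x 10/10")
vocab = build_vocab([tokens])
padded = encode(tokens, vocab, max_len=4)
train, test = split(range(100))
```

Fixing the shuffle seed keeps the split reproducible between runs, which helps when comparing embeddings or models on the same test set.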

Model Building

We tried three different word embeddings: FastText, GloVe, and BERT
BERT didn't work out due to limited resources
Accuracy with the other two embeddings was similar, so we ran an experiment comparing the outputs of the two models on specific sentences
We tried two models, an LSTM and an LSTM-CNN hybrid, of which the LSTM showed comparatively better results
Achieved an accuracy of 83.98% on the train set and 83.51% on the test set
Then we saved the model as well as the tokenizer

Deployment

Created a Flask app with a React front end and hosted it locally
Then we deployed the web app on Microsoft Azure using a VM

Data Visualization

Accuracy graph

Loss graph

Confusion matrix

F1 score, Precision, and Recall
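
For reference, precision, recall, and F1 follow directly from the four cells of a binary confusion matrix. The sketch below shows the standard formulas; the function name is hypothetical and the counts in the usage line are made up for illustration, not the project's results.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Illustrative counts only.
metrics = binary_metrics(tp=40, fp=10, fn=10, tn=40)
```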