Project aimed at differentiating between positive and negative reviews using fastai's ULMFiT implementation.

merrillm1/sentiment_analysis_with_ULMFiT

Final Results

Please paste the URL of the Colab notebook into nbviewer to see the iplot and Plotly graphs.

The best result achieved was 91% accuracy, using EDA (Easy Data Augmentation) on the review text to build a more robust training set. EDA introduces new words to the vocabulary, and the augmented training data can make the model more versatile and thus better at predicting on new data. While the final accuracy was only a little over 90%, this model consistently outperformed the models trained on the raw and manually cleaned reviews. I was initially skeptical that this might simply be the result of having more training data, but rest assured I also trained the other models on larger training sets and saw the same results.
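For reference, EDA builds each augmented review from a handful of simple word-level operations. As a rough sketch (not the exact code used in this project), synonym replacement, one of those operations, can be done with WordNet; the function name and parameters below are purely illustrative:

```python
import random
from nltk.corpus import wordnet  # requires nltk and the 'wordnet' corpus download

def synonym_replacement(review, n=2):
    """Swap up to n words in a review for WordNet synonyms (one EDA operation)."""
    words = review.split()
    order = list(range(len(words)))
    random.shuffle(order)
    replaced = 0
    for idx in order:
        synonyms = {lemma.name().replace('_', ' ')
                    for synset in wordnet.synsets(words[idx])
                    for lemma in synset.lemmas()}
        synonyms.discard(words[idx])
        if synonyms:
            words[idx] = random.choice(sorted(synonyms))
            replaced += 1
        if replaced >= n:
            break
    return ' '.join(words)

# Each augmented copy keeps the original label but introduces new vocabulary.
print(synonym_replacement("the food was great but the service was painfully slow"))
```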

The model trained on raw text also consistently outperformed the one trained on manually cleaned text. This may be because fastai's built-in preprocessing lets more information reach the model during training, which ultimately improves its ability to predict on new data. The final EDA model, however, was much more successful when predictions were made on preprocessed text. This may be because EDA not only transforms the reviews but processes the text as well, making the model more familiar with clean data.
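To illustrate the point about raw text: fastai's default rules keep signals like capitalization as explicit marker tokens (xxmaj, xxup) instead of discarding them, so aggressively cleaned input can actually carry less information. The snippet below is a hand-rolled toy version of that idea, not fastai's actual implementation:

```python
def illustrate_case_rules(text):
    """Toy version of fastai-style case handling: mark case, then lowercase."""
    tokens = []
    for word in text.split():
        if len(word) > 1 and word.isupper():
            tokens += ['xxup', word.lower()]    # all-caps word, e.g. "NEVER"
        elif word[:1].isupper():
            tokens += ['xxmaj', word.lower()]   # capitalized word, e.g. "Pad"
        else:
            tokens.append(word.lower())
    return tokens

print(illustrate_case_rules("NEVER going back, the Pad Thai was cold"))
# ['xxup', 'never', 'going', 'back,', 'the', 'xxmaj', 'pad', 'xxmaj', 'thai', 'was', 'cold']
```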

Next steps in this project will be to gain a better understanding of how EDA helps based on the model's outcomes. For example, my model had trouble with reviews of mixed sentiment, such as someone who writes about a positive expectation only to describe a negative experience. Is there a way to adjust EDA to build a training set more capable of teaching a model to recognize the true sentiment of these mixed reviews? I also need a better understanding of how the learning rate, momentum, and number of epochs together contribute to the model's outcome, so I can adjust them in more impactful ways.
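For anyone poking at those same knobs, the relevant calls in the fastai v1 API (the style used in most ULMFiT tutorials) look roughly like the sketch below; `data_clas` is a placeholder for a TextClasDataBunch built from the reviews, and the specific learning rates and momenta are only illustrative:

```python
from fastai.text import *  # fastai v1-style import

# `data_clas` is assumed to already exist: a TextClasDataBunch over the reviews.
learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)

learn.lr_find()              # sweep learning rates to find a sensible maximum
learn.recorder.plot()        # pick a value where the loss is dropping steeply

# One cycle of training: (epochs, max learning rate, momentum range).
learn.fit_one_cycle(1, 2e-2, moms=(0.8, 0.7))

# ULMFiT-style gradual unfreezing with discriminative learning rates.
learn.freeze_to(-2)
learn.fit_one_cycle(1, slice(1e-2 / (2.6 ** 4), 1e-2), moms=(0.8, 0.7))
```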

Overall, I believe the model I created performed well. The misclassified reviews were ones you might expect a model to mistake. How can I train a model to understand that the statement, "I will never eat at any other Chinese restaurant," is positive? There is still much work to be done.
