Customer Churn Prediction: A Comparative Analysis of Models with and without Sentiment Analysis

Overview of the Study

The core purpose of this study is to find the impact of Sentiment Analysis in predicting customer churn for the e-commerce industry by employing different predictive models. Furthermore, the study is also focused on observing which model is best in a more accurate prediction for determining the churn rate of customers.

Process Involved in the Study

The whole study is divided into two phases:

In the first phase, all the relevant variables that are expected to be causing the customer churn are selected and then the predictive models are developed. In this, there will be no feedback from the customer is utilized.
In the second phase, in addition to all the relevant variables obtained from EDA, the feedback provided by the customers is also included in this phase to extract the sentiment scores which are now added to the data frame. Again, the churn predictive models are developed with this data.

Finally, the metrics from both these phases will be reviewed and interpreted to understand if the inclusion of the sentiment analysis will be helpful for the organization in better understanding why their customers are parting away without making any future transactions with the organization.

The effectiveness of these models, with and without sentiment analysis, is compared to understand if sentiment analysis aids in better predicting why customers may discontinue transactions with the organization.

Research Question

"How does sentiment analysis impact predicting the customer churn of an organization?"

Flowchart of process involved in the Study

Checking the Classes Proportion (Class Imbalance)

1 - Represented the churned customer
0 - Represents the non-churned customer

Sentiment Analysis Integration

DistilBERT is used for performing the sentiment analysis. DistilBERT is a smaller, faster, cheaper and lighter version of BERT that still retains a lot of BERT's language understanding capabilities.

Hugging Face API Connection:

python

from huggingface_hub import notebook_login
notebook_login()

IMDB Dataset Pre-training (fine-tuning)

from datasets import load_dataset
imdb = load_dataset("imdb")

Model Performance Metrics

Cross-Validation Metrics We evaluate our models using the following metrics:
1. Accuracy
2. Recall
3. Precision
4. F1 Score
5. ROC_AUC

Random Forest Model Metrics:

import pandas as pd
## Dataframe setup
rf_output = pd.DataFrame({
'Training': [train_cv_acc_rf, train_cv_recall_rf, train_cv_precision_rf, train_cv_f1_rf, roc_auc_train_rf],
'Testing': [test_cv_acc_rf, test_cv_recall_rf, test_cv_precision_rf, test_cv_f1_rf, roc_auc_test_rf]
}, index=['Accuracy', 'Recall', 'Precision', 'F1', 'ROC_AUC'])
print(rf_output)

Feature Importance Analysis

We analyze the importance of different features in our models to understand which factors contribute most to customer churn.

Variable Importances for Random Forest and LightGBM:

import matplotlib.pyplot as plt
plt.barh(sorted_feature_names, sorted_feature_importances)
plt.xlabel('Feature Importance')
plt.title('Variable Importance for Random Forest')
plt.show()

Installation

To set up the project environment:

Clone the repository:

git clone https://github.com/GaneshKotaSLU/Customer-Churn-Prediction.git

Navigate to the Project Directory:
```
cd Customer-Churn-Prediction
```

Results

Our analysis shows that incorporating sentiment analysis into churn prediction models can significantly improve their accuracy. Key findings include:

Models with sentiment analysis outperformed traditional models atmost 3%.
Customer feedback sentiment was found to be a strong predictor of churn.
The SVM and Random Forest model showed the best overall performance.

Challenges and Limitations

Data quality and completeness varied across different customer segments.
The sentiment analysis model may not capture all nuances in customer feedback.
The current approach doesn't account for time-series aspects of customer behavior.

Contributing

Welcome contributions to this project. Please follow these steps:

Fork the repository

Create a new branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Citation

If you use this work in your research, please cite:

Kota, G. (2023). Customer Churn Prediction: A Comparative Analysis of Models with and without Sentiment Analysis. GitHub repository, https://github.com/GaneshKotaSLU/Customer-Churn-Prediction

Technologies Used

The below are few of the technologies used in this project.

Python 3.8+
Pandas
Scikit-learn
TensorFlow
Hugging Face Transformers
Matplotlib
LightGBM

Acknowledgments

Thanks to the Hugging Face team for their excellent NLP tools and models. This project was inspired by recent advancements in sentiment analysis and its applications in business intelligence.

Next Steps

This prototype needs to be integrated with real-time data to predict customer churn behavior. Future work includes:

Implementing real-time data integration

Exploring additional machine learning models, such as neural networks.
Conducting A/B testing in a production environment.
Investigating the impact of external factors (e.g., market trends, competitor actions) on churn rates.

Support

Support our work by starring our GitHub repository. For any questions or suggestions, please open an issue in the repository.

This comprehensive README provides a detailed overview of your project, its methodology, results, and future directions. It includes all the sections we discussed earlier, with placeholders for specific results and findings that you can fill in with your actual data. The structure is designed to be informative for both technical and non-technical readers, making your project more accessible and encouraging collaboration.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
LICENSE		LICENSE
README.md		README.md
churn.csv		churn.csv
ganesh_kota_Customer_Churn_Prediction_Phase1.html		ganesh_kota_Customer_Churn_Prediction_Phase1.html
ganesh_kota_Customer_Churn_Prediction_Phase1.ipynb		ganesh_kota_Customer_Churn_Prediction_Phase1.ipynb
ganesh_kota_Customer_Churn_Prediction_Phase2.html		ganesh_kota_Customer_Churn_Prediction_Phase2.html
ganesh_kota_Customer_Churn_Prediction_Phase2.ipynb		ganesh_kota_Customer_Churn_Prediction_Phase2.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Customer Churn Prediction: A Comparative Analysis of Models with and without Sentiment Analysis

Table of Contents

Overview of the Study

Process Involved in the Study

Research Question

Flowchart of process involved in the Study

Checking the Classes Proportion (Class Imbalance)

Sentiment Analysis Integration

Hugging Face API Connection:

IMDB Dataset Pre-training (fine-tuning)

Model Performance Metrics

Feature Importance Analysis

Installation

Results

Challenges and Limitations

Contributing

Fork the repository

License

Citation

Technologies Used

Acknowledgments

Next Steps

Implementing real-time data integration

Support

About

Releases

Packages

Languages

License

GaneshKotaSLU/Customer-Churn-Prediction

Folders and files

Latest commit

History

Repository files navigation

Customer Churn Prediction: A Comparative Analysis of Models with and without Sentiment Analysis

Table of Contents

Overview of the Study

Process Involved in the Study

Research Question

Flowchart of process involved in the Study

Checking the Classes Proportion (Class Imbalance)

Sentiment Analysis Integration

Hugging Face API Connection:

IMDB Dataset Pre-training (fine-tuning)

Model Performance Metrics

Feature Importance Analysis

Installation

Results

Challenges and Limitations

Contributing

Fork the repository

License

Citation

Technologies Used

Acknowledgments

Next Steps

Implementing real-time data integration

Support

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages