This project focuses on extracting detailed, actionable insights from social media discussions about three major telecom operators (among them MTN). The analysis is based on data from two primary social media platforms: Twitter and Facebook.
We employ both traditional machine learning methods and state-of-the-art deep learning techniques, including BERT, to automatically identify and extract key descriptors from user opinions. These descriptors are then used to generate structured summaries of the sentiments expressed: telecom companies can use them to identify customer pain points and benchmark performance against competitors, while customers can use them to make informed choices about their providers.
For supervised learning tasks, we developed a custom human-annotated dataset, referred to as TelecomSent, containing 5,423 social media posts. Each post references one or more telecom providers, offering a rich dataset for sentiment analysis.
The core components extracted from these posts include the target telecom, the specific service aspect mentioned, and the sentiment expressed towards that aspect. This methodology falls under Targeted Aspect-Based Sentiment Analysis (TABSA).
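To make the TABSA framing concrete, each annotated post can be viewed as one or more (target, aspect, sentiment) triples. The sketch below uses an illustrative record layout; the field names and aspect labels are hypothetical, not the actual TelecomSent schema:

```python
from dataclasses import dataclass

@dataclass
class TabsaAnnotation:
    """One (target, aspect, sentiment) triple extracted from a post."""
    target: str     # telecom operator mentioned, e.g. "MTN"
    aspect: str     # service aspect, e.g. "network coverage"
    sentiment: str  # "positive", "negative", or "neutral"

# A single post can yield several triples, one per operator/aspect pair.
post = "MTN data bundles are overpriced but their coverage upcountry is great."
annotations = [
    TabsaAnnotation(target="MTN", aspect="pricing", sentiment="negative"),
    TabsaAnnotation(target="MTN", aspect="network coverage", sentiment="positive"),
]
```

Models are then trained to recover these triples from the raw post text.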
- Python 3.6+
- TensorFlow
- Access to a GPU (or use Google Colab)
- Scikit-learn
- BERT-Base (Google's pre-trained models)
- NLTK (Natural Language Toolkit)
- NumPy 1.15.4
- PyTorch 1.0.0
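Before running the notebooks, it can help to confirm that the required packages are importable. This is a generic sanity-check sketch (the module names mirror the requirements above; adjust them to your environment):

```python
from importlib.util import find_spec

def is_installed(module_name: str) -> bool:
    """Return True if the top-level module can be found without importing it."""
    return find_spec(module_name) is not None

# Import names corresponding to the requirements listed above.
required = ["tensorflow", "sklearn", "nltk", "numpy", "torch"]
missing = [name for name in required if not is_installed(name)]
if missing:
    print("Missing packages:", ", ".join(missing))
```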
The table below summarizes the results achieved with the various machine learning and deep learning approaches. We evaluated the models using strict accuracy, Macro-F1 score, and AUC, with results reported for both aspect category detection and sentiment classification (a dash indicates a metric that was not computed for that model).
| Model | Aspect Accuracy | Aspect F1 | Aspect AUC | Sentiment Accuracy | Sentiment AUC |
|---|---|---|---|---|---|
| RF-TFIDF | 0.540 | 0.392 | 0.615 | 0.958 | 0.737 |
| RF-word2vec | 0.391 | 0.115 | 0.538 | 0.956 | 0.533 |
| LR-TFIDF | 0.390 | 0.414 | 0.532 | 0.877 | 0.508 |
| LR-word2vec | 0.365 | 0.229 | 0.482 | 0.918 | 0.487 |
| LSTM | 0.705 | 0.231 | - | 0.705 | - |
| BERT | 0.748 | 0.791 | 0.963 | 0.937 | 0.961 |
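For reference, Macro-F1 averages the per-class F1 scores with equal weight, which matters here because the sentiment classes are imbalanced. A minimal pure-Python sketch of the metric (scikit-learn's `f1_score(average="macro")` computes the same quantity):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = set(y_true) | set(y_pred)
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    # Each class contributes equally, regardless of its frequency.
    return sum(f1_scores) / len(f1_scores)

y_true = ["pos", "neg", "neg", "neu", "pos"]
y_pred = ["pos", "neg", "pos", "neu", "pos"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally, a model that ignores a rare sentiment class is penalized even if its overall accuracy stays high.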
You can run the models via the respective Jupyter notebooks in the `Scripts` directory.
- Random Forest with TFIDF: Run Notebook
- Random Forest with Word2Vec: Run Notebook
- Logistic Regression with TFIDF: Run Notebook
- Logistic Regression with Word2Vec: Run Notebook
- BERT Implementation: Run Notebook
- LSTM Model: Explore Code