Unveiling the 🐦 Twittersphere (Now 𝕏): Community Detection Analysis

This project groups Twitter users into communities with shared interests by applying unsupervised learning techniques to graph and tabular data.

This project was completed as part of my Unsupervised Learning and Social Network Analysis (UL & SNA) course, under the guidance of Professor M. Lazaar at ENSIAS, Mohammed V University.

In choosing a project for this course, I opted to concentrate on clustering communities within Twitter. Coming from a traditional machine learning background involving tabular data, I was particularly intrigued by the challenge of handling graph data and constructing machine learning models that could uncover patterns without human guidance. While Facebook and Google+ were available data sources, Twitter stood out due to its simplicity and engaging nature.

The project consists of sample code demonstrating the following steps:

- Identifying Twitter communities in the Stanford Network Analysis Project (SNAP) Twitter graph data with two distinct approaches: edge-based and feature-based.
- Visualizing and preprocessing the data: building the graph and computing its adjacency matrix with networkx, scipy, and matplotlib.
- Edge-based approach:
  - Training a Spectral Clustering model on the adjacency matrix and evaluating it with the Silhouette score via scikit-learn.
- Feature-based approach:
  - Converting the graph data to tabular form and enriching it with graph centrality metrics: degree, closeness, and betweenness centrality.
  - Training several clustering algorithms (KMeans, SpectralClustering, and AgglomerativeClustering) and evaluating them with Silhouette scores.
- Labeling the clusters produced by the best-performing approach by identifying the hashtags most commonly used among cluster members. These hashtags summarize each cluster's prevalent interests under labels such as 'Social Media Cluster', 'Gaming Cluster', and 'Music Cluster'.

Visual Project Walkthrough

Dataset Statistics

Files Hierarchy

Graph Visualization
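
As a rough illustration of this step, the sketch below parses a few edge lines in the same "source target" layout as twitter_combined.txt, builds a graph, and computes its adjacency matrix with networkx. The sample edges, the function names, and the choice to treat edges as undirected are assumptions made for the example; the notebook may do this differently.

```python
import networkx as nx

# Hypothetical sample edges in the "source target" layout of an edge-list file.
SAMPLE_EDGES = ["1 2", "2 3", "3 1", "4 5", "5 6", "6 4", "3 4"]


def build_graph(lines):
    """Parse whitespace-separated edge lines into an undirected graph.

    Treating edges as undirected is a simplification for this sketch.
    """
    return nx.parse_edgelist(lines, nodetype=int, create_using=nx.Graph)


def adjacency(graph):
    """Return the sorted node list and the sparse adjacency matrix in that order."""
    nodes = sorted(graph.nodes())
    matrix = nx.adjacency_matrix(graph, nodelist=nodes)
    return nodes, matrix


def draw_graph(graph, path="graph.png"):
    """Save a spring-layout drawing of the graph (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    pos = nx.spring_layout(graph, seed=0)
    fig, ax = plt.subplots(figsize=(6, 6))
    nx.draw_networkx(graph, pos=pos, ax=ax, node_size=40, with_labels=False)
    fig.savefig(path)
    plt.close(fig)


graph = build_graph(SAMPLE_EDGES)
nodes, adj = adjacency(graph)
print(graph.number_of_nodes(), "nodes,", graph.number_of_edges(), "edges")
print(adj.toarray())
```

For the real file, `nx.read_edgelist("twitter_combined.txt", nodetype=int)` would replace the sample lines; calling `draw_graph(graph)` on the full graph can be slow, so drawing a subgraph is usually more practical.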

Feature Extraction using Edge Based Approach
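
A minimal sketch of the edge-based idea, assuming the adjacency matrix is passed to scikit-learn's SpectralClustering as a precomputed affinity. The toy barbell graph, the number of clusters, and the choice to compute the Silhouette score on adjacency rows are illustrative assumptions, not the project's actual settings.

```python
import networkx as nx
from sklearn.cluster import SpectralClustering
from sklearn.metrics import silhouette_score

# Toy graph (assumption): two 5-node cliques joined by a single edge.
graph = nx.barbell_graph(5, 0)
adj = nx.to_numpy_array(graph, nodelist=sorted(graph.nodes()))

# The adjacency matrix itself serves as the affinity between nodes.
model = SpectralClustering(
    n_clusters=2,
    affinity="precomputed",
    assign_labels="kmeans",
    random_state=0,
)
labels = model.fit_predict(adj)

# Assumption: each adjacency row is treated as the node's feature vector
# when measuring how well separated the clusters are.
score = silhouette_score(adj, labels)

print("labels:", labels)
print("silhouette:", round(float(score), 3))
```

On a graph with tens of thousands of nodes, a dense adjacency matrix may not fit in memory, so a sparse matrix or a sampled subgraph is a sensible substitute.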

Feature Extraction using Feature Based Approach
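
The feature-based approach turns each node into a row of centrality metrics. The sketch below shows one way to build such a table with networkx and pandas; the helper name `centrality_table` and the small path graph are hypothetical stand-ins for the real data.

```python
import networkx as nx
import pandas as pd


def centrality_table(graph):
    """Build a node-indexed DataFrame of degree, closeness, and betweenness centrality."""
    features = {
        "degree": nx.degree_centrality(graph),
        "closeness": nx.closeness_centrality(graph),
        "betweenness": nx.betweenness_centrality(graph),
    }
    # Each metric is a {node: value} dict, so the nodes become the index.
    table = pd.DataFrame(features)
    table.index.name = "node"
    return table.sort_index()


# Toy graph (assumption): a path 0 - 1 - 2 - 3 - 4.
graph = nx.path_graph(5)
table = centrality_table(graph)
print(table.round(3))
```

Exact betweenness centrality is expensive on large graphs; `nx.betweenness_centrality` accepts a `k` argument to approximate it from a sample of nodes.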

Final Dataframe using Feature Based Approach

Experimental Results
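
To illustrate how the three clustering algorithms can be compared by Silhouette score, here is a hedged sketch. The features come from `make_blobs` as a synthetic stand-in for the centrality table, and the hyperparameters are assumptions, so the printed scores say nothing about the project's actual results.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans, SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in (assumption) for a table with three centrality columns.
features, _ = make_blobs(
    n_samples=150, centers=3, n_features=3, cluster_std=0.5, random_state=0
)
scaled = StandardScaler().fit_transform(features)

# Hypothetical settings; the cluster count would normally be tuned.
models = {
    "KMeans": KMeans(n_clusters=3, n_init=10, random_state=0),
    "SpectralClustering": SpectralClustering(n_clusters=3, random_state=0),
    "AgglomerativeClustering": AgglomerativeClustering(n_clusters=3),
}

scores = {}
for name, model in models.items():
    labels = model.fit_predict(scaled)
    scores[name] = float(silhouette_score(scaled, labels))

# The model with the highest Silhouette score is treated as the best one.
best = max(scores, key=scores.get)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
print("best:", best)
```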

Hashtag Distribution Across Clusters Generated by the Optimal Method (e.g., Feature-Based Approach with KMeans):

Music Community

Social Media Community

Gaming Community
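
Community names like the ones above can be derived by counting hashtags per cluster. The sketch below assumes two hypothetical inputs, a node-to-cluster mapping and a node-to-hashtags mapping; how the project actually extracts hashtags from the SNAP files is not shown here.

```python
from collections import Counter, defaultdict


def top_hashtags(cluster_by_node, hashtags_by_node, k=3):
    """Return the k most common hashtags for each cluster."""
    counts = defaultdict(Counter)
    for node, cluster in cluster_by_node.items():
        # Nodes without any recorded hashtag contribute nothing.
        counts[cluster].update(hashtags_by_node.get(node, []))
    return {
        cluster: [tag for tag, _ in counter.most_common(k)]
        for cluster, counter in counts.items()
    }


# Hypothetical toy data for illustration only.
cluster_by_node = {1: 0, 2: 0, 3: 1, 4: 1}
hashtags_by_node = {
    1: ["#music", "#nowplaying"],
    2: ["#music"],
    3: ["#gaming", "#xbox"],
    4: ["#gaming"],
}

themes = top_hashtags(cluster_by_node, hashtags_by_node, k=1)
print(themes)
```

A human-readable label such as 'Music Cluster' can then be chosen by inspecting each cluster's top hashtags.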

For a more comprehensive explanation, please consult the project report, review the code, and refer to the presentation.

Repository: Heyyassinesedjari/Twitter-Community-Detection
