Cyber Bullying Detection on Social Media Platform using Machine Learning

As individuals passionate about understanding language and social media trends, we initiated a project aimed at identifying cyberbullying using Bengali language data. Cyberbullying, a prevalent problem on social platforms, can be addressed through machine learning techniques. By automating the detection process, we strive to enhance online safety.

Overview

Cyberbullying detection uses machine learning techniques to analyze text data and identify instances of abusive or harmful behavior. In this project, we focused on machine learning algorithms that classify social media posts written in Bengali. By training models on annotated datasets, we aimed to develop accurate classifiers capable of detecting cyberbullying in real time.

Technologies Used

NumPy: A fundamental package for numerical computing in Python, essential for handling arrays and mathematical operations.
Pandas: A powerful data manipulation library used for data preprocessing, analysis, and manipulation, offering data structures like DataFrames that simplify data handling.
scikit-learn (sklearn): A versatile machine learning library providing a wide range of algorithms for classification, regression, clustering, and more, along with tools for model selection and evaluation.
Matplotlib: A comprehensive plotting library for creating static, animated, and interactive visualizations in Python, essential for data visualization and result analysis.
Seaborn: Built on top of Matplotlib, Seaborn provides a high-level interface for drawing attractive and informative statistical graphics.
NLTK (Natural Language Toolkit): A leading platform for building Python programs to work with human language data, used for text preprocessing tasks such as tokenization, stemming, and stopword removal.
Regular Expressions (re): A module in Python providing support for regular expressions, used for pattern-matching and text manipulation tasks.
TfidfVectorizer: Part of scikit-learn, TfidfVectorizer is used to convert text data into numerical features based on term frequency-inverse document frequency (TF-IDF) for machine learning models.
PorterStemmer: A stemming algorithm used for reducing words to their base or root form, helping to normalize text data.
Stopwords Corpus: A collection of common words like "the," "is," and "and" that are filtered out during text preprocessing as they typically don't carry significant meaning in natural language processing tasks.
GitHub: Used for version control and collaboration, providing a centralized repository for project files and code.
Jupyter Notebook: Employed for interactive development and experimentation with data and models.
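
To make the TF-IDF step concrete, here is a minimal sketch of how TfidfVectorizer turns a few short Bengali comments into a numeric feature matrix. The three comments are hypothetical toy data, not part of the project's dataset. The whitespace-based `token_pattern` is an assumption on our part: scikit-learn's default pattern relies on `\w`, which may split Bengali words at vowel signs, so splitting on whitespace is a simple workaround for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy comments for illustration only (not from the project's dataset).
docs = [
    "তুমি খুব ভালো",
    "তুমি খুব খারাপ",
    "আজ আবহাওয়া ভালো",
]

# Assumption: treat every whitespace-separated chunk as one token.
vectorizer = TfidfVectorizer(token_pattern=r"\S+")
X = vectorizer.fit_transform(docs)  # sparse matrix: one row per comment

print(X.shape)  # (number of comments, number of distinct tokens)

# Show each vocabulary term with its column index and learned IDF weight.
for term, column in sorted(vectorizer.vocabulary_.items(), key=lambda kv: kv[1]):
    print(column, term, round(vectorizer.idf_[column], 3))
```

Terms that appear in fewer comments receive a higher IDF weight, which is why TF-IDF tends to emphasize distinctive words over common ones.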

Usage

Data Collection: Gather a sufficient amount of Bengali text data from social media platforms, ensuring it is annotated with labels indicating instances of cyberbullying.
Preprocessing: Clean the text data by removing punctuation, stopwords, and digits, then apply stemming to reduce the dimensionality of the feature space.
Model Development: Train machine learning models such as logistic regression, decision tree, random forest, and XGBoost on the preprocessed text data to classify instances of cyberbullying.
Evaluation: Evaluate the performance of each model using cross-validation and metrics such as accuracy, precision, recall, and F1-score.

By following these steps and leveraging the mentioned technologies, we aimed to develop an effective cyberbullying detection system capable of enhancing online safety on social media platforms for Bengali-speaking users.
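
As a rough illustration of the model development and evaluation steps, the sketch below trains three of the listed model types with scikit-learn and scores each with stratified cross-validation. Everything specific in it is an assumption: the twelve comments and their labels are hypothetical toy data, the hyperparameters are arbitrary, the whitespace `token_pattern` is our own choice for handling Bengali tokens, and XGBoost is omitted because it is a separate package.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: 1 = bullying, 0 = not bullying.
texts = [
    "তুমি খুব খারাপ", "তুমি একটা বোকা", "তোকে কেউ পছন্দ করে না",
    "তুই অপদার্থ", "তোর মুখ দেখতে চাই না", "তুমি বোকা এবং খারাপ",
    "আজ আবহাওয়া ভালো", "তুমি খুব ভালো", "আমি বই পড়ছি",
    "চলো খেলতে যাই", "খাবারটা মজার ছিল", "তোমাকে ধন্যবাদ",
]
labels = [1] * 6 + [0] * 6

# Arbitrary hyperparameters, chosen only to keep the sketch fast and repeatable.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}
scoring = ["accuracy", "precision", "recall", "f1"]
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

summary = {}
for name, model in models.items():
    # The vectorizer lives inside the pipeline, so it is refit on each
    # training fold and never sees the held-out comments.
    pipeline = make_pipeline(TfidfVectorizer(token_pattern=r"\S+"), model)
    scores = cross_validate(pipeline, texts, labels, cv=cv, scoring=scoring)
    summary[name] = {metric: scores[f"test_{metric}"].mean() for metric in scoring}

for name, metrics in summary.items():
    print(name, {metric: round(value, 2) for metric, value in metrics.items()})
```

Scores on a dataset this small are not meaningful. The point is the structure: one pipeline per model, identical folds for every model, and the same four metrics reported for each.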

References

Below are the references that we consulted to study and gain insights into implementing our project:

[1] Cyberbullying. (n.d.). Link

[2] Cyberbullying – Law and Legal Definitions US Legal. Link

[3] An Educator’s Guide to Cyberbullying Brown Senate.gov. Link

[4] E. J. K. S. T. P. R. A. Eri Eli lavindi, "Cyber-bullying Detection based on Machine Learning Method (Case Study: Instagram Comment Section)," Journal of Applied Information and Communication Technologies, vol. 8, pp. 223-226, 2023. Link

[5] N. B. a. M. H. H. Rahman, "Toxicity Detection on Bengali Social Media Comments using Supervised Models," in 2019 2nd International Conference on Innovation in Engineering and Technology (ICIET), Dhaka, Bangladesh, 23-24 December 2019. Link

[6] M. R. M. H. N. a. S. M. M. H. C. S. Ahammed, "Implementation of Machine Learning to Detect Hate Speech in Bangla Language," 8th International Conference System Modeling and Advancement in Research Trends (SMART), India, doi: 10.1109/SMART46866.2019.9117214, pp. 317- Link

[7] K. A. U. a. M. M. A. P. A. Akhter, "Cyber Bullying Detection and Classification using Multinomial Naive Bayes and Fuzzy Logic," Int. J. Math. Sci. Comput., vol. 5, no. doi: 10.5815/ijmsc.2019.04.01, pp. 1–12, 2019. Link

[8] S. B. C. a. A. H. R. R. Dalvi, "Detecting A Twitter Cyberbullying Using Machine Learning," 2020 4th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, no. doi: 10.1109/ICICCS48265.2020.pp. 297-301, 2020. Link

Contributions

Contributions are welcome! If you have any suggestions, bug reports, or feature requests, feel free to open an issue or submit a pull request on GitHub.

The Faces Behind the Project:

Team name: High Five

Members

Tanzila Akhter (343)

Nurun Nahar Fiha (361)

Md. Parvej Hoque Palash (378)

Sakib Mollah (387)

Serajum Monira (2142)

Repository: SM2142/Cyber-Bullying-Detection-on-Scocial-Media-Plaform-Using-Machine-Learning
