This repository contains code and resources for a machine learning project that predicts water quality from physicochemical properties. The goal is to build a system that assesses whether a water sample is potable, a question that matters for public health and safety.
The dataset consists of physicochemical measurements of water samples, each labeled with its potability status. It is used to train and evaluate the machine learning models.
Several machine learning models were trained and evaluated: decision trees, k-nearest neighbors, logistic regression, random forests, XGBoost, Gaussian naive Bayes, support vector machines, and AdaBoost. The models were optimized using hyperparameter tuning and cross-validation.
Of the models compared, the XGBoost classifier performed best, achieving the highest accuracy in predicting water potability.
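The comparison workflow described above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebooks: it assumes scikit-learn, uses synthetic data from `make_classification` in place of the water quality dataset, covers only a few of the model families, and substitutes scikit-learn's `GradientBoostingClassifier` for XGBoost so that the example has no extra dependency. The feature count, parameter grid, and fold count are arbitrary choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the water quality data (assumption: numeric
# features and a binary potability label; sizes here are arbitrary).
X, y = make_classification(
    n_samples=300, n_features=9, n_informative=5, random_state=0
)

# Stratified folds keep the class balance similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# A subset of the model families named above. Distance- and
# gradient-based models are scaled inside a pipeline so that scaling
# is fit on the training folds only.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    # Stand-in for XGBoost (assumption made to avoid the extra dependency).
    "gradient_boosting": GradientBoostingClassifier(
        n_estimators=50, random_state=0
    ),
}

# Mean cross-validated accuracy per model.
cv_accuracy = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}

# Hyperparameter tuning for one model via an exhaustive grid search.
param_grid = {"max_depth": [3, None], "n_estimators": [25, 50]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=cv,
    scoring="accuracy",
)
search.fit(X, y)

for name, score in sorted(cv_accuracy.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
print("best random forest params:", search.best_params_)
print(f"best random forest CV accuracy: {search.best_score_:.3f}")
```

With the real dataset, `X` and `y` would come from the dataset file, and `xgboost.XGBClassifier` could be added to the `models` dictionary in the same way, since it follows the scikit-learn estimator interface.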
- Code: Google Colab files (.ipynb) for data preprocessing, analysis, model training, and evaluation.
- Datasets: Water quality dataset.
- Report: Detailed report summarizing the project.