This project focuses on predicting rainfall in India using machine learning techniques. The dataset consists of historical rainfall data across various subdivisions in India, and the goal is to classify annual rainfall into categories: Low, Medium, and High. The project explores different ML models, including Random Forest, Support Vector Machine (SVM), and Logistic Regression.
- Source: Indian Meteorological Department (IMD) Dataset
- Features: Monthly rainfall data, seasonal rainfall, subdivision, year, etc.
- Target Variable: Rainfall category (Low, Medium, High)
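
One way the Low/Medium/High target could be derived from annual totals is quantile binning. The sketch below is illustrative only: the column name `ANNUAL`, the sample values, and the tercile cut points are all assumptions, and the project may use different thresholds.

```python
import pandas as pd

# Hypothetical annual rainfall totals in mm; "ANNUAL" is an assumed column name.
df = pd.DataFrame({"ANNUAL": [450.0, 820.5, 1210.3, 1675.9, 2480.2, 3105.7]})

# Assumption: split into three equally sized groups (terciles).
df["category"] = pd.qcut(df["ANNUAL"], q=3, labels=["Low", "Medium", "High"])

print(df)
```

Fixed thresholds via `pd.cut` would be an alternative if the categories are meant to have a physical meaning rather than equal class sizes.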
- Random Forest Classifier
- Support Vector Machine (SVM)
- Logistic Regression
- Data Preprocessing: Handling missing values, encoding categorical variables.
- Exploratory Data Analysis (EDA): Visualizing rainfall distribution.
- Model Training & Evaluation: Training and testing multiple ML models.
- Feature Importance Analysis: Identifying which features most influence the predicted rainfall category.
- Model Comparison & Visualization: Evaluating and comparing model performance.
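
The steps above can be sketched end to end with scikit-learn. This is a minimal sketch on synthetic data, not the project's actual code: the column names, the median imputation, the one-hot encoding, the 80/20 split, and the hyperparameters are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the rainfall table (column names are assumed).
rng = np.random.default_rng(0)
n = 300
months = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]
X = pd.DataFrame(rng.gamma(2.0, 50.0, size=(n, 12)), columns=months)
X["SUBDIVISION"] = rng.choice(["A", "B", "C"], size=n)

# Target: tercile of the annual total (an assumed labelling rule).
y = pd.qcut(X[months].sum(axis=1), q=3,
            labels=["Low", "Medium", "High"]).astype(str)

# Inject a few missing values so the imputation step has work to do.
X.loc[rng.choice(n, size=15, replace=False), "JUL"] = np.nan

# Preprocessing: impute and scale numeric columns, one-hot encode the subdivision.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("num", numeric, months),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["SUBDIVISION"]),
])

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Train each model behind the same preprocessing and record test accuracy.
scores = {}
for name, model in models.items():
    clf = Pipeline([("prep", preprocess), ("model", model)])
    clf.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, clf.predict(X_test))

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Wrapping preprocessing and model in one `Pipeline` keeps the imputer and scaler fitted on the training split only, which avoids leaking test data into the preprocessing step.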
| Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Random Forest | 95.2% | 97% | 96% | 96% |
| SVM | 96.02% | 97% | 97% | 97% |
| Logistic Regression | 97.3% | 98% | 97% | 97% |
- Best Model: Logistic Regression (highest accuracy, 97.3%)
- Feature Importance: Seasonal rainfall and monsoon months contribute most to prediction accuracy.
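
As a rough illustration of how such a ranking can be obtained, the sketch below reads the impurity-based importances from a fitted Random Forest. The data is synthetic and deliberately constructed so that monsoon months dominate; the column names and the tercile labelling are assumptions, and the output says nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
months = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

# Synthetic monthly totals in mm; the per-month scale makes JUN-SEP largest by design.
scale = np.array([10, 10, 15, 20, 40, 150, 300, 280, 180, 80, 20, 10], dtype=float)
X = pd.DataFrame(rng.gamma(2.0, 1.0, size=(500, 12)) * scale, columns=months)

# Assumed labelling rule: tercile of the annual total.
y = pd.qcut(X.sum(axis=1), q=3, labels=["Low", "Medium", "High"]).astype(str)

forest = RandomForestClassifier(n_estimators=200, random_state=42).fit(X, y)

# feature_importances_ sums to 1; higher means more impurity reduction.
importances = pd.Series(forest.feature_importances_, index=months)
importances = importances.sort_values(ascending=False)

print(importances.head(5))
```

Impurity-based importances can be biased toward high-variance features, so `sklearn.inspection.permutation_importance` on a held-out split is a reasonable cross-check.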
To compare the models, we use:
- Bar Chart: Comparing accuracy of Random Forest, SVM, and Logistic Regression.
- Confusion Matrix: Analyzing misclassification patterns.
- ROC Curve: Evaluating classification performance.
A bar chart is used to compare the accuracy of each model.
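
A minimal sketch of such a chart with matplotlib, using the accuracy values from the results table. The styling, axis range, and the headless `Agg` backend are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Accuracy values (%) taken from the results table.
accuracy = {"Random Forest": 95.2, "SVM": 96.02, "Logistic Regression": 97.3}

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(list(accuracy.keys()), list(accuracy.values()))

# Zoom the y-axis so differences of one or two points stay visible.
ax.set_ylim(90, 100)
ax.set_ylabel("Accuracy (%)")
ax.set_title("Model accuracy comparison")

# Write each value above its bar.
for bar in bars:
    ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height(),
            f"{bar.get_height():.2f}", ha="center", va="bottom")

fig.tight_layout()
```

Call `fig.savefig(...)` with a path of your choice, or switch to an interactive backend and use `plt.show()`, to view the result.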
Confusion matrices help us understand where each model makes classification mistakes.
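
For illustration, a confusion matrix for the three classes can be computed with scikit-learn as sketched below. The labels and predictions are made-up values, not results from this project.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and predictions for eight samples.
y_true = ["Low", "Low", "Medium", "Medium", "High", "High", "Medium", "Low"]
y_pred = ["Low", "Medium", "Medium", "Medium", "High", "Medium", "High", "Low"]

# Fix the class order so rows and columns read Low, Medium, High.
labels = ["Low", "Medium", "High"]
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Rows are true classes, columns are predicted classes.
print(cm)
# [[2 1 0]
#  [0 2 1]
#  [0 1 1]]
```

Off-diagonal cells show the mistakes: here one Low sample was predicted as Medium, one Medium as High, and one High as Medium.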
The ROC curve is plotted for a multi-class setting using the One-vs-Rest (OvR) strategy to compare the probability-based performance of each model.
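
A minimal sketch of the One-vs-Rest computation: binarize the labels, then build one ROC curve per class from that class's predicted probability. The data comes from `make_classification` and the model is a plain Logistic Regression; both are stand-ins chosen for illustration, not the project's setup.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Synthetic 3-class problem standing in for Low / Medium / High.
X, y = make_classification(n_samples=600, n_features=8, n_informative=4,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_test)  # shape: (n_samples, 3)

# One-vs-Rest: each column marks "this class" (1) versus "any other class" (0).
y_bin = label_binarize(y_test, classes=clf.classes_)

curves = {}
aucs = {}
for i, cls in enumerate(clf.classes_):
    fpr, tpr, _ = roc_curve(y_bin[:, i], proba[:, i])
    curves[cls] = (fpr, tpr)
    aucs[cls] = auc(fpr, tpr)

for cls, value in aucs.items():
    print(f"class {cls}: AUC = {value:.3f}")
```

Each `(fpr, tpr)` pair in `curves` can be passed to a plotting call to draw the per-class curves. Note that scikit-learn's `SVC` only exposes `predict_proba` when constructed with `probability=True`; otherwise `decision_function` scores can be used in its place.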
Feel free to fork this repository, improve the model, and submit a pull request!