A data analysis and machine learning project that studies student performance using the UCI Student Performance dataset, which contains 33 features across 396 instances.

The analysis covers the following student attributes:
- Academic Performance (Grades)
- Attendance (Absences)
- Health Metrics
- Lifestyle Factors:
  - Workday/Weekend Alcohol Consumption
  - Free Time Management
  - Internet Usage
- Academic Factors:
  - Study Time
  - Travel Time to School
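
As a rough illustration of how the attributes above map onto dataset columns, the sketch below loads a few rows and selects them by group. The column names (`G1`, `G2`, `G3`, `absences`, `health`, `Dalc`, `Walc`, `freetime`, `internet`, `studytime`, `traveltime`) and the `;` separator follow the UCI Student Performance files. The three sample rows are invented for the example, and the file name `student-mat.csv` and the `ATTRIBUTE_GROUPS` mapping are assumptions, not taken from this project.

```python
import io

import pandas as pd

# Illustrative rows only (not real records). Column names follow the
# UCI Student Performance files, which use ";" as the field separator.
SAMPLE_CSV = """school;sex;age;traveltime;studytime;internet;freetime;Dalc;Walc;health;absences;G1;G2;G3
GP;F;18;2;2;no;3;1;1;3;6;5;6;6
GP;M;17;1;3;yes;4;1;2;5;2;14;15;15
MS;F;16;1;1;yes;2;2;3;4;10;9;8;10
"""

# Hypothetical grouping that mirrors the attribute list in this README.
ATTRIBUTE_GROUPS = {
    "grades": ["G1", "G2", "G3"],
    "attendance": ["absences"],
    "health": ["health"],
    "lifestyle": ["Dalc", "Walc", "freetime", "internet"],
    "academic": ["studytime", "traveltime"],
}

# With the real file this would be something like:
#   df = pd.read_csv("student-mat.csv", sep=";")   # file name is an assumption
df = pd.read_csv(io.StringIO(SAMPLE_CSV), sep=";")

# Flatten the groups into one column list and select those columns.
selected_columns = [col for cols in ATTRIBUTE_GROUPS.values() for col in cols]
attributes = df[selected_columns]

print(attributes.dtypes)
print(attributes.describe())
```

Note that `internet` is a yes/no column in the UCI files, so it needs encoding before it can be used in a numeric model.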

The project uses the following libraries:
- NumPy: Numerical computations and array operations
- Pandas: Data manipulation and analysis
- Seaborn: Statistical data visualization
- Matplotlib: Creating static, animated, and interactive visualizations
- Scikit-learn: Machine learning implementations
- Pickle: Model serialization
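
To show what model serialization with `pickle` might look like, here is a minimal sketch that trains a small scikit-learn model, serializes it, and checks that the restored copy predicts identically. The synthetic data and the choice of `LinearRegression` are assumptions made for the example; the project may serialize a different model.

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption): y = 2*x0 + 3*x1 + 1, with no noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 2))
y = 2.0 * X[:, 0] + 3.0 * X[:, 1] + 1.0

model = LinearRegression().fit(X, y)

# Serialize to bytes. To write to disk instead, use
# pickle.dump(model, f) with a file opened in "wb" mode.
payload = pickle.dumps(model)

# Restore the model and confirm it behaves like the original.
restored = pickle.loads(payload)
assert np.allclose(model.predict(X), restored.predict(X))
```

Pickled models should only be loaded from trusted sources, and ideally with the same scikit-learn version that produced them.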

The following techniques are applied:
- K-means Clustering: Student grouping analysis
- Principal Component Analysis (PCA): Dimensionality reduction
- Decision Tree: Classification and prediction
- Random Forest: Ensemble learning for improved accuracy
- Linear Regression: Score prediction
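
The techniques listed above could be wired together roughly as follows. This is a sketch on synthetic data, not the project's actual code: the feature matrix, the hypothetical pass/fail label, and all hyperparameters (`n_clusters=3`, `max_depth=4`, `n_estimators=100`) are assumptions chosen only to make the example run.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for numeric student features (assumption):
# 200 rows, 6 columns, with a score driven by the first two columns.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 6))
score = 10 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)
passed = (score >= 10).astype(int)  # hypothetical pass/fail label

X_scaled = StandardScaler().fit_transform(X)

# K-means: group students into clusters (k=3 is an arbitrary choice here).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
cluster_labels = kmeans.fit_predict(X_scaled)

# PCA: project onto 2 components, e.g. to plot the clusters in 2D.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

X_train, X_test, s_train, s_test, p_train, p_test = train_test_split(
    X_scaled, score, passed, test_size=0.25, random_state=42
)

# Decision tree and random forest: classify pass/fail.
tree = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X_train, p_train)
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, p_train)

# Linear regression: predict the numeric score.
linreg = LinearRegression().fit(X_train, s_train)

print("cluster sizes:", np.bincount(cluster_labels))
print("explained variance ratio:", pca.explained_variance_ratio_)
print("tree accuracy:", tree.score(X_test, p_test))
print("forest accuracy:", forest.score(X_test, p_test))
print("linear regression R^2:", linreg.score(X_test, s_test))
```

Features are standardized before K-means and PCA because both are sensitive to feature scale; the tree-based models do not need it but are not harmed by it.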

The workflow consists of these stages:
- Data Loading and Preprocessing
- Exploratory Data Analysis
- Feature Engineering
- Model Training and Evaluation
- Performance Prediction
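
The stages above can be sketched end to end as below. The data is synthetic and generated in place, so the numbers it prints say nothing about real students. The engineered feature `alcohol_total` and the choice of `G3` as the target are assumptions for illustration, as is the use of linear regression as the final model.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# 1. Data loading: synthetic rows stand in for the real CSV (assumption).
rng = np.random.default_rng(7)
n = 300
df = pd.DataFrame({
    "studytime": rng.integers(1, 5, size=n),
    "absences": rng.integers(0, 30, size=n),
    "Dalc": rng.integers(1, 6, size=n),
    "Walc": rng.integers(1, 6, size=n),
    "internet": rng.choice(["yes", "no"], size=n),
    "G1": rng.integers(0, 21, size=n),
    "G2": rng.integers(0, 21, size=n),
})
# Synthetic target: the final grade loosely tracks the earlier grades.
noise = rng.normal(scale=1.0, size=n)
df["G3"] = (0.3 * df["G1"] + 0.7 * df["G2"] + noise).clip(0, 20).round()

# Preprocessing: drop incomplete rows and one-hot encode the yes/no column.
df = df.dropna()
df = pd.get_dummies(df, columns=["internet"], drop_first=True, dtype=int)

# 2. Exploratory data analysis: correlation of each column with the target.
correlations = df.corr()["G3"].drop("G3").sort_values(ascending=False)
print(correlations)

# 3. Feature engineering: combine the two alcohol columns (hypothetical feature).
df["alcohol_total"] = df["Dalc"] + df["Walc"]
df = df.drop(columns=["Dalc", "Walc"])

# 4. Model training and evaluation on a held-out test set.
X = df.drop(columns="G3")
y = df["G3"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=7
)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MAE: {mae:.2f}, R^2: {r2:.2f}")

# 5. Performance prediction for a single (held-out) student.
new_student = X_test.iloc[[0]]
predicted_grade = float(model.predict(new_student)[0])
print("predicted final grade:", round(predicted_grade, 1))
```

Evaluating on a held-out split rather than the training rows gives a more honest estimate of how the model would perform on unseen students.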

The project aims to:
- Analyze factors affecting student performance
- Predict student scores based on various features
- Identify key patterns in student behavior and academic performance
- Generate actionable insights for educational improvement