🧠 Stroke Risk Prediction

Stroke is the second leading cause of death globally. This repository contains the code and documentation for an educational research project on stroke risk prediction. The project investigates the rising incidence of strokes using a Kaggle-sourced dataset with 11 clinical features.

Materials & Methods

The workflow covers data preprocessing, exploratory data analysis, and the development of logistic regression and XGBoost models.

Exploratory Data Analysis (EDA): R and ggplot2 are used to visualize stroke events across demographic and lifestyle factors. Unexpected findings, such as the relationship between glucose levels and strokes, are explored further.
Imbalanced Data Handling: The dataset is imbalanced, so the minority class is oversampled with the MWMOTE technique to reduce the resulting bias.
Logistic Regression and Variable Importance: Logistic regression models are developed, and Variable Importance in Projection (VIP) analysis is used to identify the key factors influencing stroke occurrence.
AIC Analysis: Two logistic regression models are compared using the Akaike Information Criterion (AIC) for model selection.
Performance Assessment: Both the logistic regression and XGBoost models are evaluated with confusion matrices, ROC curves, and other metrics.
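
The AIC comparison listed above can be sketched in base R as follows. The data are simulated and the column names (`age`, `glucose`, `noise`, `stroke`) are placeholders, not the project's actual variables or models. The sketch only shows how two `glm` fits might be compared, with the lower AIC preferred.

```r
set.seed(1)

# Simulated stand-in data (not the Kaggle dataset); column names are placeholders
n <- 500
toy <- data.frame(
  age     = rnorm(n, mean = 50, sd = 15),
  glucose = rnorm(n, mean = 100, sd = 25),
  noise   = rnorm(n)
)

# In this simulation the outcome depends on age and glucose only
log_odds   <- -8 + 0.08 * toy$age + 0.03 * toy$glucose
toy$stroke <- rbinom(n, size = 1, prob = plogis(log_odds))

# Two candidate logistic regression models
model_small <- glm(stroke ~ age, data = toy, family = binomial)
model_full  <- glm(stroke ~ age + glucose + noise, data = toy, family = binomial)

# AIC = 2k - 2 * log-likelihood; the model with the lower value is preferred
aic_table <- AIC(model_small, model_full)
print(aic_table)

best_model <- rownames(aic_table)[which.min(aic_table$AIC)]
```

`AIC()` called with several fitted models returns a data frame with one row per model, which makes the comparison easy to extend to more candidates.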

🚩 Challenges faced

Data Imbalance

To address class imbalance in the dataset, the Majority Weighted Minority Oversampling Technique (MWMOTE) was used. This technique oversamples the minority class, focusing on the minority instances that are hardest to learn. It assigns weights to these minority class instances based on their distance from the nearest majority class instances, and the weights guide the generation of synthetic samples that balance the class distribution. This makes the model more capable of learning challenging instances.
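
To make the idea of weighted oversampling more concrete, here is a deliberately simplified base-R sketch. It is not MWMOTE itself: the real algorithm also filters noisy samples, uses a more elaborate weighting scheme, and interpolates within minority clusters. The toy data and the inverse-distance weighting are assumptions made purely for illustration.

```r
set.seed(42)

# Toy data (made up for illustration): 40 majority and 8 minority points
majority <- matrix(rnorm(80, mean = 0), ncol = 2)
minority <- matrix(rnorm(16, mean = 1.5), ncol = 2)

# Distance from each minority point to its nearest majority point
nearest_majority_dist <- apply(minority, 1, function(p) {
  min(sqrt(rowSums(sweep(majority, 2, p)^2)))
})

# Simplified weighting: minority points closer to the majority class
# (the harder ones) get a larger selection probability
weights <- (1 / nearest_majority_dist) / sum(1 / nearest_majority_dist)

# Generate synthetic points by interpolating between a weighted seed
# and another randomly chosen minority point
n_new <- nrow(majority) - nrow(minority)
synthetic <- t(replicate(n_new, {
  seed <- minority[sample(nrow(minority), 1, prob = weights), ]
  mate <- minority[sample(nrow(minority), 1), ]
  seed + runif(1) * (mate - seed)
}))

# Minority class after oversampling now matches the majority class in size
balanced_minority <- rbind(minority, synthetic)
```

Because each synthetic point is a convex combination of two real minority points, it always lies on the line segment between them rather than at an arbitrary location in feature space.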

📈 Results

The logistic regression model was fitted with R's `glm` function using the `binomial` family.
The second model is XGBoost, a gradient-boosted, tree-based model.
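
As a rough illustration of the logistic regression step, the sketch below fits a `glm` with the `binomial` family on simulated data, converts predicted probabilities into classes, and tabulates them against the true labels. The data, the column names, the 75/25 split, and the 0.5 cut-off are all assumptions made for the example, not details taken from the project script.

```r
set.seed(7)

# Simulated stand-in data; `age` and `stroke` are placeholder column names
n <- 400
toy <- data.frame(age = rnorm(n, mean = 50, sd = 15))
toy$stroke <- rbinom(n, size = 1, prob = plogis(-5 + 0.1 * toy$age))

# Hold out 25% of the rows for evaluation
test_idx <- sample(n, size = n / 4)
train <- toy[-test_idx, ]
test  <- toy[test_idx, ]

fit <- glm(stroke ~ age, data = train, family = binomial)

# type = "response" returns probabilities rather than log-odds
prob <- predict(fit, newdata = test, type = "response")
pred <- factor(as.integer(prob >= 0.5), levels = c(0, 1))

# Confusion matrix: predicted class against actual class
confusion <- table(predicted = pred,
                   actual = factor(test$stroke, levels = c(0, 1)))
print(confusion)
```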

| Metric | Logistic Regression Model | XGBoost Model |
| --- | --- | --- |
| Accuracy | 81.13% | 96.15% |
| Sensitivity (True Positive Rate) | 85.12% | 94.97% |
| Specificity (True Negative Rate) | 77.15% | 97.33% |
| Kappa Statistic | 0.6226 | 0.923 |
| Positive Predictive Value (PPV) | 78.83% | 97.27% |
| Negative Predictive Value (NPV) | 83.83% | 95.09% |
| Balanced Accuracy | 81.13% | 96.15% |
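
For readers unfamiliar with these metrics, the following base-R sketch shows how each one is derived from the four cells of a confusion matrix. The counts are hypothetical and chosen only to keep the arithmetic easy to follow; they are not the project's results.

```r
# Hypothetical confusion-matrix counts, chosen only for illustration
tp <- 80; fn <- 20   # actual positives: correctly and wrongly classified
tn <- 70; fp <- 30   # actual negatives: correctly and wrongly classified
n  <- tp + fn + tn + fp

accuracy    <- (tp + tn) / n
sensitivity <- tp / (tp + fn)   # true positive rate
specificity <- tn / (tn + fp)   # true negative rate
ppv         <- tp / (tp + fp)   # positive predictive value
npv         <- tn / (tn + fn)   # negative predictive value
balanced_accuracy <- (sensitivity + specificity) / 2

# Cohen's kappa: observed agreement corrected for chance agreement
p_expected <- ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n^2
kappa <- (accuracy - p_expected) / (1 - p_expected)

metrics <- round(c(accuracy = accuracy, sensitivity = sensitivity,
                   specificity = specificity, ppv = ppv, npv = npv,
                   balanced_accuracy = balanced_accuracy, kappa = kappa), 4)
print(metrics)
```

When both classes are the same size, as in this toy example, accuracy and balanced accuracy coincide. The table above shows the same pattern.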

📂 Project Structure

src/: Contains the script for data preprocessing, model development, and evaluation.
data/: Contains the Kaggle-sourced dataset.
plots/: Includes visualizations generated during the exploratory data analysis.
report/: Contains the presentation and the scientific paper of the project.

Requirements

RStudio
R version 4.3.2
Run this command in the RStudio console to install the required libraries:

```
source("./src/install_libraries.R")
```

Setup

Clone the repository:

```
git clone https://github.com/Kaito999/stroke-risk-prediction.git
```

Navigate to the project directory:
```
cd stroke-risk-prediction
```

Usage

Open the project: stroke-risk-prediction/stroke-risk-prediction.Rproj
Access the code: src/stroke_risk_prediction.R
Run the code in the script step by step.

📝 License

This project is licensed under the MIT License; see the LICENSE file for details.
