This project conducts an exploratory data analysis (EDA) on healthcare data using Python. Exploratory data analysis is an essential step in understanding the characteristics of a dataset before applying any machine learning models or drawing conclusions. Through visualizations and statistical summaries, we aim to gain insights into the healthcare data provided.
The notebook contains Python code for loading the dataset, cleaning and manipulating the data, and performing exploratory data analysis using libraries such as NumPy and pandas. Insights gained from the analysis include:
- Age: the ratio of NoShow to Show appointments is almost equal for most age groups, except for ages 0 and 1, where the show rate is 80%.
- Scholarship: patients without a Scholarship have a show rate of around 80%, while patients with a Scholarship have a slightly lower show rate of around 75%.
- Hypertension: patients without Hypertension have a show rate of 78%, while patients with Hypertension have a higher show rate of around 85%.
- Diabetes: patients without Diabetes have a show rate of 80%, while patients with Diabetes have a slightly higher show rate of around 83%.
- SMS reminders: patients who did not receive an SMS reminder have a show rate of 84%, whereas patients who received one have a lower show rate of around 72%.
- Day of the week: there are no appointments on Sundays, and Saturdays have significantly fewer appointments than the other days of the week.
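
Show rates like the ones listed above can be computed per group with a pandas groupby. The following is a minimal sketch, not the notebook's actual code. The column names (`NoShow`, `SMS_received`, `AppointmentDay`), the encoding of `NoShow` as "Yes"/"No", and the helper functions `show_rate_by` and `appointments_per_weekday` are all assumptions made for illustration, and the sample rows are made up rather than taken from the dataset.

```python
import pandas as pd


def show_rate_by(df, column, target="NoShow"):
    """Percentage of kept appointments for each value of `column`.

    Assumption: `target` holds "Yes" when the patient missed the
    appointment and "No" when the patient showed up.
    """
    showed = df[target].eq("No")
    # The mean of a boolean Series is the fraction of True values.
    return showed.groupby(df[column]).mean().mul(100).round(1)


def appointments_per_weekday(df, date_column="AppointmentDay"):
    """Number of appointments on each day of the week."""
    day_names = pd.to_datetime(df[date_column]).dt.day_name()
    return day_names.value_counts()


# Made-up sample rows, used only to exercise the helpers.
sample = pd.DataFrame({
    "NoShow": ["No", "No", "No", "Yes", "No", "Yes", "No", "Yes"],
    "SMS_received": [0, 0, 0, 0, 1, 1, 1, 1],
    "AppointmentDay": [
        "2016-05-02", "2016-05-02", "2016-05-03", "2016-05-04",
        "2016-05-04", "2016-05-05", "2016-05-06", "2016-05-07",
    ],
})

print(show_rate_by(sample, "SMS_received"))
print(appointments_per_weekday(sample))
```

The same `show_rate_by` call would apply to any other categorical column, such as Scholarship, Hypertension, or Diabetes flags, provided the column exists under the name passed in.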