Country_data_clustering

The objective of this project was the categorisation of world countries using socio-economic and health factors that indicate the overall development of the country.

The dataset was taken from Kaggle: https://www.kaggle.com/datasets/rohan0301/unsupervised-learning-on-country-data.

Data for 167 countries are provided in CSV format (Country-data.csv) with the following features:

country: Name of the country;

child_mort: Deaths of children under 5 years old per 1000 live births;

exports: Exports of goods and services per capita, given as a percentage of the GDP per capita;

health: Total health spending per capita, given as a percentage of the GDP per capita;

imports: Imports of goods and services per capita, given as a percentage of the GDP per capita;

income: Net income per person;

inflation: The measurement of the annual growth rate of the total GDP;

life_expec: The average number of years a newborn child would live if the current mortality pattern remained the same;

total_fer: The number of children that would be born to each woman if the current age-fertility rate remained the same;

gdpp: The GDP per capita, calculated as the total GDP divided by the total population.

In the notebook Country_data_clustering_kmeans.ipynb I applied the k-means algorithm, while in Country_data_clustering_DBSCAN_Birch.ipynb I applied DBSCAN and Birch.

First, I imported the libraries and read the dataset. Then I explored the dataset by looking at the main summary statistics and computing the correlation matrix of the numerical features.
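As a rough sketch of this exploration step, the snippet below builds a tiny stand-in table with a few of the columns listed above and computes the summary statistics and the correlation matrix with pandas. The rows are made-up illustrative values, not taken from the dataset; in the notebook the frame would presumably be read from the CSV file instead.

```python
import pandas as pd

# Made-up rows for illustration only; the real data has 167 countries.
df = pd.DataFrame(
    {
        "country": ["A", "B", "C", "D", "E"],
        "child_mort": [90.0, 60.0, 20.0, 8.0, 4.0],
        "income": [1500, 4000, 12000, 30000, 45000],
        "life_expec": [56.0, 63.0, 72.0, 79.0, 81.0],
        "gdpp": [550, 1800, 6000, 28000, 47000],
    }
)

# Main statistical parameters (count, mean, std, min, quartiles, max).
summary = df.describe()

# Pearson correlation between the numerical features only.
corr = df.select_dtypes("number").corr()

print(summary.round(1))
print(corr.round(2))
```

In this toy table child_mort and life_expec move in opposite directions, so their correlation coefficient is negative.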


I plotted the countries of the world and of Europe with their respective value for each feature. The interactive plots are in the files Interactive-plots_World.html and Interactive-plots_Europe.html.

Afterwards, I drew a violin plot to show the distribution of the values of each feature. I then scaled the data and applied the k-means algorithm, plotting the inertia and the silhouette score for each candidate number of clusters.
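The scaling and model-selection step can be sketched roughly as follows with scikit-learn. This is not the notebook's code: the feature matrix is a synthetic stand-in generated with `make_blobs`, and the range of k values, `n_init` and `random_state` are assumptions chosen for the illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix: 3 well-separated groups.
X, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0, 0], [10, 10, 10], [-10, 10, -10]],
    cluster_std=1.0,
    random_state=0,
)

# Scale so that every feature has zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

ks = list(range(2, 8))  # assumed range of candidate cluster counts
inertias, silhouettes = [], []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    inertias.append(model.inertia_)  # within-cluster sum of squares
    silhouettes.append(silhouette_score(X_scaled, model.labels_))

best_k = ks[int(np.argmax(silhouettes))]
for k, inertia, sil in zip(ks, inertias, silhouettes):
    print(f"k={k}  inertia={inertia:.1f}  silhouette={sil:.3f}")
print("k with the highest silhouette score:", best_k)
```

Plotting `inertias` and `silhouettes` against `ks` gives the two curves used to look for an elbow and for a silhouette peak.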

According to the inertia plot, the optimal number of clusters is 4, since the curve has an "elbow" at 4 clusters. The silhouette score is also high at 4 clusters. Nevertheless, I chose 3 clusters, since with 3 the algorithm better isolates the countries that need more help.

Next, I created an interactive plot that visualizes the clusters (shown in 3 different colors) more clearly. Both the static and the interactive versions are available below (click on the link below the figure).

The range of each feature can be restricted by clicking on the bar associated with that feature, dragging, and releasing once the desired range of values is selected.
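One way to check which cluster contains the countries that need more help is to fit k-means with 3 clusters and compare the per-cluster feature means. The sketch below does this on made-up values for two of the features; the rule of picking the cluster with the highest mean child_mort is a hypothetical criterion used here for illustration, not necessarily the one applied in the notebook.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up values for two features; not taken from the dataset.
df = pd.DataFrame(
    {
        "child_mort": [95, 88, 110, 30, 25, 35, 5, 4, 6],
        "gdpp": [500, 700, 400, 5000, 6500, 4200, 40000, 45000, 38000],
    }
)

X_scaled = StandardScaler().fit_transform(df)
df["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)

# Per-cluster feature means, in the original units, to interpret each cluster.
profile = df.groupby("cluster").mean()

# Hypothetical rule: the cluster with the highest mean child mortality
# is treated as the one whose countries need the most help.
needs_help = profile["child_mort"].idxmax()

print(profile)
print("cluster needing the most help:", needs_help)
```

The cluster numbers returned by k-means are arbitrary, which is why the cluster is identified through its profile rather than through a fixed label.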

Features vs Labels Kmeans: Interactive Plot

Click here to check the interactive plot --> Features vs Labels Kmeans: Interactive Plot

Below, I plotted the clusters on the globe. Each cluster groups countries with similar development conditions.

Kmeans: Needed Help Per Country

Click here to check the interactive plot --> Kmeans: Needed Help Per Country

Finally, I drew pairwise scatterplots of the features, highlighting the 3 clusters and showing how they are separated in feature space.

Kmeans clustering scatterplots

Click here to download the plot --> Kmeans: scatterplots

DBSCAN and Birch were also applied (see the notebook Country_data_clustering_DBSCAN_Birch.ipynb), with the following results:

DBSCAN: Needed Help Per Country

Birch: clustering scatterplots

Note: the interactive plots and the other graphs shown above for Kmeans are also available for the other algorithms in the notebooks.

DBSCAN labelled a considerable number of countries as outliers, even though different hyperparameters were tested.

With Birch, the result is similar to that of Kmeans, apart from a few countries that were assigned to different clusters.
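To illustrate how the number of outliers depends on the hyperparameters, the following sketch counts the points that scikit-learn's DBSCAN labels as noise (label -1) for a few values of `eps`. The data are synthetic, and the `eps` and `min_samples` values are assumptions chosen for the example; `dbscan_summary` is a hypothetical helper, not a function from the notebook.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in: two dense groups plus a few scattered points.
dense_a = rng.normal(loc=0.0, scale=0.3, size=(40, 2))
dense_b = rng.normal(loc=5.0, scale=0.3, size=(40, 2))
scattered = rng.uniform(low=-10.0, high=15.0, size=(10, 2))
X_scaled = StandardScaler().fit_transform(np.vstack([dense_a, dense_b, scattered]))


def dbscan_summary(X, eps, min_samples=5):
    """Return (number of clusters, number of outliers) for one DBSCAN run."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    n_outliers = int(np.sum(labels == -1))  # DBSCAN marks noise with -1
    n_clusters = len(set(labels.tolist()) - {-1})
    return n_clusters, n_outliers


results = {eps: dbscan_summary(X_scaled, eps) for eps in (0.1, 0.3, 0.6)}
for eps, (n_clusters, n_outliers) in results.items():
    print(f"eps={eps}: {n_clusters} clusters, {n_outliers} outliers")
```

With `min_samples` fixed, a larger `eps` can only keep or reduce the number of noise points, so scanning `eps` shows how persistent the outliers are.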
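A comparison of this kind can be sketched by cross-tabulating the labels of the two algorithms. The example below uses synthetic data and scikit-learn's `Birch` and `KMeans` with 3 clusters each; the adjusted Rand index is added here as one possible agreement measure and is not mentioned in the notebooks. Agreement on this toy data says nothing about the country data itself.

```python
import pandas as pd
from sklearn.cluster import Birch, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled feature matrix.
X, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0], [8, 8], [-8, 8]],
    cluster_std=1.0,
    random_state=0,
)
X_scaled = StandardScaler().fit_transform(X)

kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)
birch_labels = Birch(n_clusters=3).fit_predict(X_scaled)

# Rows are k-means clusters, columns are Birch clusters. Label numbers are
# arbitrary, so agreement shows up as one dominant cell per row.
table = pd.crosstab(kmeans_labels, birch_labels, rownames=["kmeans"], colnames=["birch"])

# Adjusted Rand index: 1.0 means identical partitions, about 0 means chance level.
ari = adjusted_rand_score(kmeans_labels, birch_labels)

print(table)
print(f"adjusted Rand index: {ari:.3f}")
```

Off-diagonal counts in a table like this correspond to the samples that the two algorithms place in different clusters.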
