The Global Goal #4: QUALITY EDUCATION
A toolkit for researchers and decision-makers around the globe
Quality Education: A Catalyst for a Better World
- There is probably no greater way to improve the living conditions and well-being of people in any country than quality education.
- Finland is a good example of a formerly poor, developing nation that rose to its current global position through quality education.
- That is why it is crucially important to give researchers and policy-makers tools that help them make the most impactful decisions to promote quality education.
For Whom?
For researchers with only basic Python skills
- use the prepared data and ready-made models to conduct your research
For Data Scientists
- use our work as a starting point, apply our models to new data and develop the models further
The Package
- Full human-readable listing of all SDG4 indicators available from the UN SDG API
- Preprocessed CIA factbook background data from countries across the globe
- A model that uses a country's CIA data to predict its missing values in the UN SDG data
- The user can freely choose both the explanatory variables from the CIA data and the SDG indicator to use as the target of either Linear Regression or Random Forest Regression
- Tools for exploratory data analysis are provided, including Principal Component Analysis and k-Means Clustering with a user-defined number of dimensions and clusters
- Further options to narrow down the list of countries to analyze
- We used cross-validation to estimate how accurately the predictive model will perform in practice. Typically the Random Forest Regressor outperforms Linear Regression.
- Scatterplots of the real and predicted SDG values (y) against each of the chosen explanatory variables (x)
- Clear visualizations make it easy to detect outlier data points. These outliers can be excluded when building the regression model at the user's discretion.
- The package will be made available on GitHub for everyone to use and contribute to
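The workflow described above (choose explanatory variables, compare Linear Regression and Random Forest Regression with cross-validation, then predict missing SDG values) can be sketched roughly as follows. This is a minimal illustration using scikit-learn on synthetic data, not the package's own code: the feature meanings, the generated values, and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the CIA background data (values are made up).
rng = np.random.default_rng(0)
n_countries = 60
X = np.column_stack([
    rng.uniform(1_000, 60_000, n_countries),  # hypothetical GDP per capita
    rng.uniform(15, 48, n_countries),         # hypothetical median age
])

# Synthetic stand-in for one SDG4 indicator, with the last 10 values missing.
y = 40 + 0.0005 * X[:, 0] + 0.6 * X[:, 1] + rng.normal(0, 3, n_countries)
y[-10:] = np.nan

# Train only on countries where the indicator is known.
known = ~np.isnan(y)
X_known, y_known = X[known], y[known]
X_missing = X[~known]

models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# 5-fold cross-validation; the mean R^2 is used to compare the two models.
cv_scores = {
    name: cross_val_score(model, X_known, y_known, cv=5, scoring="r2").mean()
    for name, model in models.items()
}

# Refit the better-scoring model on all known data and fill in the gaps.
best_name = max(cv_scores, key=cv_scores.get)
best_model = models[best_name].fit(X_known, y_known)
predicted = best_model.predict(X_missing)

print(cv_scores)
print(best_name, predicted.round(1))
```

Which model scores better depends on the data, so the comparison is worth rerunning for every new choice of target indicator and explanatory variables.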
Known Issues
- For many indicators in the UN SDG data, a significant number of countries have no data available
- Some countries are, for political reasons, not treated statistically in the same way in the CIA factbook and the UN SDG data, most importantly Taiwan and Palestine
- We did not have the resources to fine-tune the design of the visualizations at this point, as they are made mainly for research purposes
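One way to spot countries that do not line up between the two sources is to compare the sets of country names before modelling. The sketch below uses only the Python standard library. The name sets, the manual mapping, and the function match_countries are all hypothetical illustrations, not part of the package.

```python
# Illustrative name sets only; the real data sets contain many more entries.
cia_countries = {"Finland", "Kenya", "Taiwan", "Korea, South"}
un_countries = {"Finland", "Kenya", "Republic of Korea"}

# Hypothetical hand-maintained mapping from CIA names to UN names.
cia_to_un = {"Korea, South": "Republic of Korea"}


def match_countries(cia_names, un_names, mapping):
    """Return (matched CIA->UN names, CIA-only names, UN-only names)."""
    matched = {}
    for name in cia_names:
        un_name = mapping.get(name, name)  # fall back to the same spelling
        if un_name in un_names:
            matched[name] = un_name
    cia_only = sorted(set(cia_names) - set(matched))
    un_only = sorted(set(un_names) - set(matched.values()))
    return matched, cia_only, un_only


matched, cia_only, un_only = match_countries(cia_countries, un_countries, cia_to_un)
print("matched:", matched)
print("only in CIA data:", cia_only)
print("only in UN data:", un_only)
```

Countries that remain in either of the unmatched lists have to be handled by hand, for example by extending the mapping or by excluding them from the analysis.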
Acknowledgements
The project was developed by three students in the University of Helsinki Computer Science Master's degree program as a data science miniproject.
The team: Matti Kiviluoto, Juhani Kivimäki, and Sebastian Lampinen from Helsinki, Finland.
When using our work, please mention the contributors.
Use [email protected] for any questions you may have.