david-cuervo/Tanzanian_water_wells_project

Module 3 Project

By David Cuervo

Background

Build a classifier to accurately predict the condition of water wells in Tanzania.

[Figure: map of Tanzania with the wells plotted (Tanzania_Water_Well_plot_map)]

Contents of Repository

  • Folder containing the original data sets from DrivenData
  • CSV of cleaned data
  • Data_Cleaning Notebook: contains the code for exploring and cleaning the original data set
  • Modeling Notebook: contains code for building the best classifier
  • PNG image of Tanzania and the wells plotted
  • PDF of project presentation
  • Rubric for Module 3 Project

Approach

  • Began by downloading the data from DrivenData
  • Worked through the data set column by column to deal with missing data, outliers, and categorical variables
  • Exported the cleaned data and used it to begin building classifiers
  • Used Boruta for feature selection
  • Used the features selected through Boruta to build logistic regression, decision tree, and random forest models
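The modeling steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the data is synthetic (`make_classification` stands in for the cleaned CSV), and `shadow_feature_mask` is a hypothetical helper that performs a single simplified Boruta-style pass with scikit-learn rather than calling the Boruta package itself. The hyperparameters are arbitrary assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cleaned well data (assumption: three status classes).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)


def shadow_feature_mask(X, y, seed=0):
    """Simplified, single-pass version of the Boruta idea (hypothetical helper).

    Each column is shuffled to create a "shadow" copy with no real signal.
    A feature is kept only if its random forest importance exceeds the
    importance of the best shadow feature.
    """
    rng = np.random.default_rng(seed)
    shadows = np.column_stack([rng.permutation(col) for col in X.T])
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(np.hstack([X, shadows]), y)
    n_real = X.shape[1]
    real = forest.feature_importances_[:n_real]
    shadow = forest.feature_importances_[n_real:]
    return real > shadow.max()


# Split first so feature selection only sees the training data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
mask = shadow_feature_mask(X_train, y_train)
if not mask.any():
    mask[:] = True  # fall back to all features if none beats the shadows

# The three model families named in the approach.
models = {
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision tree": DecisionTreeClassifier(max_depth=8, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Fit each model on the selected features and record held-out accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train[:, mask], y_train)
    scores[name] = model.score(X_test[:, mask], y_test)
    print(f"{name}: {scores[name]:.3f}")
```

The full Boruta procedure repeats the shadow comparison over many iterations and applies a statistical test, so this single pass should be read only as an outline of the idea.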

Conclusions

  • The decision tree was the most accurate model, at 75% accuracy

[Figure: feature_importance_tanzania]

  • Construction year, waterpoint type, and GPS height were the most important features in the model
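One way to obtain such a ranking is to read `feature_importances_` from a fitted scikit-learn decision tree. The sketch below uses made-up data: the column names are assumptions modeled on the features mentioned above, and the label is synthetic and deliberately tied to construction year, so the printed ranking shows the mechanics only and does not reproduce the project's result.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n_wells = 500

# Hypothetical column names; the values are random, not the project's data.
feature_names = ["construction_year", "waterpoint_type_code", "gps_height", "noise"]
X = np.column_stack([
    rng.integers(1960, 2014, size=n_wells),  # construction year
    rng.integers(0, 7, size=n_wells),        # integer-encoded waterpoint type
    rng.normal(1000, 400, size=n_wells),     # GPS height
    rng.normal(size=n_wells),                # pure noise column
])

# Synthetic label (assumption): wells built before 1990 are marked 1.
y = (X[:, 0] < 1990).astype(int)

tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, y)

# Pair each feature with its impurity-based importance and sort descending.
ranked = sorted(
    zip(feature_names, tree.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importances can favor features with many distinct values, so a permutation-based check on held-out data is a reasonable complement.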

[Figures: gps_height_well_function, waterpoint_type_status]

  • Moving forward, prioritize older wells and uncommon types of wells
