materials for GA data science class (fall 2015)

matheuristic/gads_26

welcome to data science at GA!

You're seeing this file here because it's the README for the base directory of this repo. We'll use it to keep track of useful links and information.

project questions

These questions are meant to keep you on the right track, so keep them in mind as you work on your project. At a minimum, make sure you have a clear and succinct answer to each one.

  • what problem am I trying to solve?
  • where will the data come from?
  • what exploratory/feature engineering/modeling steps will be useful?
  • how will I evaluate my model's performance?

important dates

| event | subject | date |
| --- | --- | --- |
| hw 2 due | Classification & Cross Validation | 11/11 |
| guest speaker | Melinda Han Williams, Lead Data Scientist at Dstillery | 11/16 |
| project deadline 1 | Elevator Pitch | 11/18 |
| project deadline 2 | First Draft of Final Project - Data cleanup + EDA | 11/30 |
| guest speaker | Patrick McNamara, Data Scientist at Oscar Insurance | 12/02 |
| project deadline 3 | Final Project | 12/16 |

syllabus

[lec 1](https://github.com/jason137/gads_26/tree/master/lec01) - intro & setup
[lec 2](https://github.com/jason137/gads_26/tree/master/lec02) - data exploration & pre-processing with Unix
[lec 3](https://github.com/jason137/gads_26/tree/master/lec03) - data manipulation with pandas
[lec 4](https://github.com/jason137/gads_26/tree/master/lec04) - exploratory data analysis
[lec 5](https://github.com/jason137/gads_26/tree/master/lec05) - data transformations
[lec 6](https://github.com/jason137/gads_26/tree/master/lec06) - concepts of machine learning
[lec 7](https://github.com/jason137/gads_26/tree/master/lec07) - logistic regression & regularization
[lec 8](https://github.com/jason137/gads_26/tree/master/lec08) - naive bayes classification
[lec 9](https://github.com/jason137/gads_26/tree/master/lec09) - decision tree classification
[lec 10](https://github.com/jason137/gads_26/tree/master/lec10) - ensemble classifiers
[lec 11](https://github.com/jason137/gads_26/tree/master/lec11) - project examples
[lec 12](https://github.com/jason137/gads_26/tree/master/lec12) - k-means clustering
[lec 13](https://github.com/jason137/gads_26/tree/master/lec13) - evaluating model performance

submission format

Email your solution to [email protected] and [email protected] with subject:

[gads-26][hw2] student-name
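
If it helps to avoid typos in the subject line, the format above can be generated with a few lines of Python. This is only a sketch: the function name `submission_subject` and its parameters are made up for illustration, and it assumes `student-name` is a placeholder for your own name written however you normally write it.

```python
def submission_subject(student_name, hw_number, course_tag="gads-26"):
    """Build a subject line shaped like: [gads-26][hw2] student-name."""
    # Assumption: the name is used as given, apart from trimming
    # surrounding whitespace. Adjust if a specific style is requested.
    name = student_name.strip()
    return f"[{course_tag}][hw{hw_number}] {name}"


# Example with a made-up student name.
print(submission_subject("Ada Lovelace", 2))
```

Paste the printed string into the subject field of your email.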

datasets

There are a handful of (small) datasets in this repo that you can use to practice the techniques we discuss in class. Don't hesitate to seek out & use other datasets that you find interesting! We can even post them here to share.

anaconda installation

https://github.com/jason137/gads_26/blob/master/anaconda.md

general data science references

https://github.com/jason137/gads_26/blob/master/general_references.md

todo

  • sql?
  • recsys (1/2)?
  • map-reduce (1/2)?
  • pca?
