You're seeing this file because it's the README for the base directory of this repo. We'll use it to keep track of useful links and information.
The questions below are meant to keep your project on the right track. Keep them in mind as you work. At a minimum, you should be able to give a clear and succinct answer to each one.
- what problem am I trying to solve?
- where will the data come from?
- what exploratory/feature engineering/modeling steps will be useful?
- how will I evaluate my model's performance?
event | subject | date |
---|---|---|
hw 2 due | Classification & Cross Validation | 11/11 |
guest speaker | Melinda Han Williams, Lead Data Scientist at Dstillery | 11/16 |
project deadline 1 | Elevator Pitch | 11/18 |
project deadline 2 | First Draft of Final Project - Data cleanup + EDA | 11/30 |
guest speaker | Patrick McNamara, Data Scientist at Oscar Insurance | 12/02 |
project deadline 3 | Final Project | 12/16 |
[lec 1](https://github.com/jason137/gads_26/tree/master/lec01) - intro & setup

[lec 2](https://github.com/jason137/gads_26/tree/master/lec02) - data exploration & pre-processing with Unix

[lec 3](https://github.com/jason137/gads_26/tree/master/lec03) - data manipulation with pandas

[lec 4](https://github.com/jason137/gads_26/tree/master/lec04) - exploratory data analysis

[lec 5](https://github.com/jason137/gads_26/tree/master/lec05) - data transformations

[lec 6](https://github.com/jason137/gads_26/tree/master/lec06) - concepts of machine learning

[lec 7](https://github.com/jason137/gads_26/tree/master/lec07) - logistic regression & regularization

[lec 8](https://github.com/jason137/gads_26/tree/master/lec08) - naive bayes classification

[lec 9](https://github.com/jason137/gads_26/tree/master/lec09) - decision tree classification

[lec 10](https://github.com/jason137/gads_26/tree/master/lec10) - ensemble classifiers

[lec 11](https://github.com/jason137/gads_26/tree/master/lec11) - project examples

[lec 12](https://github.com/jason137/gads_26/tree/master/lec12) - k-means clustering

[lec 13](https://github.com/jason137/gads_26/tree/master/lec13) - evaluating model performance
Email your solution to [email protected] and [email protected] with subject:
[gads-26][hw2] student-name
There are a handful of (small) datasets in this repo that you can use to practice the techniques we discuss in class. Don't hesitate to seek out & use other datasets that you find interesting! We can even post them here to share.
https://github.com/jason137/gads_26/blob/master/anaconda.md
https://github.com/jason137/gads_26/blob/master/general_references.md
- sql?
- recsys (1/2)?
- map-reduce (1/2)?
- pca?