Marcel Scharth, The University of Sydney
This is a repository for the Jupyter Notebooks and code used in Statistical Learning and Data Mining, a postgraduate unit at the University of Sydney Business School. I also provide the lectures for future reference.
This version: Semester 2, 2017.
Tutorials:
Tutorial 1: Working with Data in Python
Tutorial 2: K-Nearest Neighbours Regression
Tutorial 3: Regression Modelling
Tutorial 4: Cross Validation
Tutorial 5: The Bootstrap
Tutorial 6: Linear Model Selection and Regularisation
Tutorial 7: Naive Bayes and Sentiment Analysis
Tutorial 8: Logistic Regression and Gaussian Discriminant Analysis
Tutorial 9: Regression Splines
Tutorial 10: Regression Trees
Tutorial 11: Model Stacking
Tutorial 12: Credit Risk Modelling
Lectures:
Module 1: Introduction to Statistical Learning
Module 2: Linear Regression and Statistical Thinking
Module 3: K-Nearest Neighbours Regression
Module 4: Regression Modelling
Module 5: Model Selection
Module 6: The Bootstrap
Module 7: Estimation Methods (reference module)
Module 8: Linear Model Selection and Regularisation I
Module 9: Linear Model Selection and Regularisation II
Module 10: Classification I
Module 11: Classification II
Module 12: Nonlinear Modelling
Module 13: Tree-based Methods
Module 14: Model Stacking
Module 15: Boosting
Acknowledgement: these lectures use figures from An Introduction to Statistical Learning and The Elements of Statistical Learning (see below).
Textbook:
An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani.
The lectures and tutorials also draw on material from:
The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman.
Statistical Methods in Customer Relationship Management by V. Kumar and J. Andrew Petersen.
Machine Learning: A Probabilistic Perspective by Kevin P. Murphy.
Mathematical Statistics with Resampling and R by Laura M. Chihara and Tim C. Hesterberg.
Students are strongly encouraged to consider the following additional resources:
A Mind for Numbers: How to Excel at Math and Science by Barbara Oakley.
Dataquest (Python course online).
DataCamp (Python course online).
Learning Data Science (Kaggle Wiki).