Statistical Learning and Data Mining (QBUS6810) at the University of Sydney Business School.

xinyu42/statistical-learning

Statistical Learning and Data Mining (QBUS6810)

Marcel Scharth, The University of Sydney

This is a repository for the Jupyter Notebooks and code used in Statistical Learning and Data Mining, a postgraduate unit at the University of Sydney Business School. I also provide the lectures in case you need them for future reference.

This version: Semester 2, 2017.

Tutorials in Python

Tutorial 1: Working with Data in Python
Tutorial 2: K-Nearest Neighbours Regression
Tutorial 3: Regression Modelling
Tutorial 4: Cross Validation
Tutorial 5: The Bootstrap
Tutorial 6: Linear Model Selection and Regularisation
Tutorial 7: Naive Bayes and Sentiment Analysis
Tutorial 8: Logistic Regression and Gaussian Discriminant Analysis
Tutorial 9: Regression Splines
Tutorial 10: Regression Trees
Tutorial 11: Model Stacking
Tutorial 12: Credit Risk Modelling

Lectures

Module 1: Introduction to Statistical Learning
Module 2: Linear Regression and Statistical Thinking
Module 3: K-Nearest Neighbours Regression
Module 4: Regression Modelling
Module 5: Model Selection
Module 6: The Bootstrap
Module 7: Estimation Methods (reference module)
Module 8: Linear Model Selection and Regularisation I
Module 9: Linear Model Selection and Regularisation II
Module 10: Classification I
Module 11: Classification II
Module 12: Nonlinear Modelling
Module 13: Tree-based Methods
Module 14: Model Stacking
Module 15: Boosting

Acknowledgement: these lectures use figures from An Introduction to Statistical Learning and The Elements of Statistical Learning (see below).

References

Textbook:

An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani.

The lectures and tutorials also draw on material from:

The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman.

Statistical Methods in Customer Relationship Management by V. Kumar and J. Andrew Petersen.

Machine Learning: A Probabilistic Perspective by Kevin P. Murphy.

Mathematical Statistics with Resampling and R by Laura M. Chihara and Tim C. Hesterberg.

Other resources

Students are strongly encouraged to consider the following additional resources.

A Mind for Numbers: How to Excel at Math and Science by Barbara Oakley.

Dataquest (Python course online).

DataCamp (Python course online).

Learning Data Science (Kaggle Wiki).

Kaggle Kernels.
