Materials for the 2021 data science class at the University of Sao Paulo (week 2)
To clone: git clone git@github.com:ivezic/SaoPaulo2021.git
Lectures (IPython notebooks and one Keynote file for Lecture 19) are available in the notebooks subdirectory: https://github.com/ivezic/SaoPaulo2021/tree/main/notebooks
Monday
Lecture 11: Introduction to Statistics and ML method (basic statistics, random samples, robust statistics, the maximum likelihood method)
Lecture 12: Introduction to Bayesian Statistics (introduction to Bayesian statistics, priors, nuisance parameters, parameter estimation)
Afternoon Activity 6: (robust statistics and histogram comparison, binomial distribution, Bayesian Blocks)
Tuesday
Lecture 13: Introduction to Regression (beyond ordinary LSQ: measurement errors, errors in both variables, linear basis function regression, non-linear regression with MCMC and PyMC3)
Lecture 14: Time Series Analysis (Fourier analysis, period estimation, digital filtering, stochastic processes)
Afternoon Activity 7: (model comparison for polynomials, Bayesian Blocks, model comparison with two bursts)
Wednesday
Lecture 15: Density Estimation (searching for structure in 1-D point data, Gaussian Mixture Models, Extreme Deconvolution, bonus: density estimation for the SDSS "Great Wall")
Lecture 16: Dimensionality Reduction (PCA, NMF, ICA methods, and forward reference to Convolutional Neural Networks)
Afternoon Activity 8: (Hess diagrams produced with kernel density estimation and with a Gaussian Mixture Model; using a Gaussian Mixture Model and BIC to study the impact of sample size and measurement errors on the ability to recognize structure in data; Principal Component Analysis on the LINEAR dataset with 4-dimensional visualization)
Thursday
Lecture 17: Clustering (Unsupervised Classification) (introduction to clustering, unsupervised vs. supervised classification, 1-D hypothesis testing, clustering with Gaussian Mixture Models, K-means clustering algorithm, hierarchical clustering algorithm, discussion of Term Project)
Lecture 18: (Supervised) Classification (introduction to Supervised Classification, Support Vector Machine classifier, star/galaxy separation using Gaussian Naive Bayes classifier, a comparison of many methods using ROC curves)
Afternoon Activity 9: (clustering of orbital data for asteroids using Gaussian Mixture Model and the Minimum Spanning Tree model; supervised classification of periodic variable stars in multi-dimensional space of colors and lightcurve parameters)
Friday (postponed to Monday, August 23)
Lecture 19: Big Data Challenges from LSST (big data from Rubin Observatory's Legacy Survey of Space and Time; PowerPoint and Keynote slides)
Lecture 20: Classification of Astronomical Images with Deep Learning (introduction to basic concepts and an example based on CNN ResNet50)
Afternoon Activity 10: review of lectures and discussion of term projects.