My graduate level machine learning course, including student machine learning projects.

GeostatsGuy/MachineLearningCourse

The Instructor:

Michael Pyrcz, Associate Professor, University of Texas at Austin

Novel Data Analytics, Geostatistics and Machine Learning Subsurface Solutions

With over 17 years of experience in subsurface consulting, research and development, Michael has returned to academia driven by his passion for teaching and enthusiasm for enhancing engineers' and geoscientists' impact in subsurface resource development.

For more about Michael check out these links:

Course TAs and Contributors:

Fall 2019 - Honggeun Jo, Graduate Student, The University of Texas at Austin
Fall 2020 - Jack Xiao, Graduate Student, The University of Texas at Austin
Fall 2021 - Mide Mabadeje, Graduate Student, The University of Texas at Austin
Fall 2022 - Misael Morales, Graduate Student, The University of Texas at Austin

PGE 383 / GEO 391: Subsurface Machine Learning Graduate Course

Graduate class at the University of Texas at Austin, from the syllabus:

“You will learn the theory and practice of data analytics and machine learning for subsurface resource modeling”.

Learning Outcomes: 

By the end of this course, you will be able to: 

  • perform prerequisite data analytics, data checking, and evaluation, to support machine learning models 
  • select features, engineer features, and project features to a lower-dimensional space to build the best possible models 
  • segment datasets with cluster analysis for improved models 
  • select from, train, and tune a wide variety of predictive machine learning models 
  • build robust data analytics and machine learning workflows in Python with open source packages 
  • calculate useful diagnostics and critically evaluate and check your models 
  • present, communicate, document and deploy your modeling workflows  
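
As a rough illustration of how several of these outcomes fit together, here is a minimal sketch in Python. It is not taken from the course materials: it assumes scikit-learn as the open source package, uses synthetic data from `make_regression`, and the function name `build_and_tune`, the seed, and the hyperparameter grid are all hypothetical choices made for the example. It standardizes the features, projects them to a lower-dimensional space with principal components analysis, and tunes a decision tree with cross-validation before checking the model on withheld testing data.

```python
# Minimal sketch only: synthetic data and hypothetical names, not course material.
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor


def build_and_tune(X, y, seed=73):
    """Split the data, then tune a scale -> PCA -> decision tree pipeline."""
    # Withhold testing data so the final check is not biased by the tuning.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    pipe = Pipeline([
        ("scale", StandardScaler()),      # PCA is sensitive to feature scale
        ("project", PCA()),               # project to a lower-dimensional space
        ("model", DecisionTreeRegressor(random_state=seed)),
    ])
    # Hypothetical grid: number of components and tree depth are tuned together.
    param_grid = {
        "project__n_components": [2, 4],
        "model__max_depth": [2, 4, 8],
    }
    search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
    search.fit(X_train, y_train)
    # Score the refit best pipeline on the withheld testing data.
    test_r2 = search.score(X_test, y_test)
    return search, test_r2


# Synthetic stand-in for a real multivariate dataset.
X, y = make_regression(
    n_samples=300, n_features=6, n_informative=3, noise=10.0, random_state=73
)
search, test_r2 = build_and_tune(X, y)
print("best hyperparameters:", search.best_params_)
print("withheld testing R^2:", round(test_r2, 3))
```

Placing the scaling and projection steps inside the pipeline means they are refit within each cross-validation fold, so information from the validation folds does not leak into the tuning.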

Course Content Online (other than CANVAS)

I like to put content online for anyone to access. Note: My online handle is ‘GeostatsGuy’. These are the online resources for the course:

GitHub

My GitHub Repositories are here.

This includes the following repositories that may be helpful:

  • PythonNumericalDemos – worked out examples in Python, Jupyter Notebooks with Markdown for data analytics (bootstrap, declustering, principal components analysis, decision tree, support vector machines, deep learning, etc.).

  • GeoDataSets – synthetic, but realistic spatiotemporal, multivariate datasets to support my students and my educational content.

  • Geostatsr – workflows in R for linear regression, spatial continuity, kriging, simulation, principal components analysis, and decision tree.

  • GeostatLectures – short lectures and posters with concise descriptions of topics in geostatistics.

  • GeostatsPy – spatial data analytics Python package that I wrote to support this course; everyone in the course will install it.

  • ExcelNumericalDemos – worked out examples in Microsoft Excel of statistical concepts such as distributions, hypothesis tests, confidence intervals, heterogeneity measures, spatial continuity, kriging, simulation, bootstrap and decision making in the presence of uncertainty.

YouTube

To support my students and provide an evergreen resource that outlasts the semester, I record the lectures and post them on my YouTube channel, GeostatsGuy Lectures.

This includes the following playlists:

Twitter

Follow me on Twitter where I'm the GeostatsGuy!

  • I tweet daily about data analytics, geostatistics and machine learning ideas and resources, about engineering, and occasionally about topics unrelated to engineering or science (e.g. outdoor activities and local live music).

Textbook

READINGS: There is no course textbook. All lectures are posted on YouTube and all in-class demonstrations are available as well-documented workflows on GitHub. The provided notes, slides in PDF and example workflows are comprehensive and cover all content on examinations, but students interested in additional reading are welcome to refer to:

Machine Learning:

Hastie, T., Tibshirani, R., and Friedman, J., 2012, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer.

James et al., 2013, An Introduction to Statistical Learning: with Applications in R, Springer.

Subsurface Data Analytics and Modeling:

Pyrcz, M. and Deutsch, C., 2014, Geostatistical Reservoir Modeling, Oxford University Press, New York.

Also, various journal papers will be posted for reference.

Machine Learning Projects

As part of the course, all students complete a machine learning project.

The challenge: build a well-documented, educational machine learning workflow.

Here's the motivation and more details:

  • produce a comprehensive, concise, well-documented machine learning workflow in a Jupyter Notebook. This is an opportunity to apply what you learned in the course and to demonstrate a high level of proficiency. With permission, I post the workflows online in this GitHub repository and use them to support future classes (with credit).

  • open-source contributions to GitHub are recognized in many companies.

  • to assist I have provided a project template.

  • the workflows are graded by the following criteria:

| Element | Description |
| --- | --- |
| Great Executive Summary | Gap, Work to Address Gap, Learnings and Recommendation |
| Workflow Steps | All Aligned with Goal |
| Concise Workflow | Every Step and Figure has a Purpose / Consistent with Provided Template / Features Briefly Explained / Features have Units |
| Images / Figures | Excellent Figures / Subplots and Combined Plots for Efficient Displays and Communication / Axes Labeled / Consistent Figure Sizes |
| Demonstrated Knowledge | All Modeling Choices Defended / Demonstrated Extension of Knowledge |
| Readable Code | Code Documentation / Steps’ Description and Observations between Code Blocks / Only Include Needed Packages / Use Functions for Concise Code |
| Citations | All Code from Others Cited |
| Creativity / Innovation | Unique, Novel Application of Machine Learning |

I share these to promote the students' work.

  • We are teaching novel data analytics, geostatistics and machine learning skills to engineering and science students.

This is Going to Be Fun

I hope you join us in my PGE 383: Subsurface Modeling class. We have about 40 graduate students from engineering and geoscience participating here at the University of Texas at Austin. The Jackson School of Geosciences offered a new classroom in their building after we outgrew our room in the Petroleum and Geosystems Engineering department. I appreciate the excellent support from both the Hildebrand Department of Petroleum and Geosystems Engineering and the Jackson School of Geosciences.

Want to Work Together?

I hope that this is helpful to those who want to learn more about subsurface modeling, data analytics and machine learning. Students and working professionals are welcome to participate.

  • Want to invite me to visit your company for training, mentoring, project review, workflow design and consulting? I'd be happy to drop by and work with you!

  • Interested in partnering, supporting my graduate student research or my Subsurface Data Analytics and Machine Learning consortium (co-PIs including Profs. Foster, Torres-Verdin and van Oort)? My research combines data analytics, stochastic modeling and machine learning theory with practice to develop novel methods and workflows to add value. We are solving challenging subsurface problems!

  • I can be reached at [email protected].

I'm always happy to discuss,

Michael

Michael Pyrcz, Ph.D., P.Eng., Associate Professor, The Hildebrand Department of Petroleum and Geosystems Engineering, Bureau of Economic Geology, The Jackson School of Geosciences, The University of Texas at Austin

More Resources Available at: Twitter | GitHub | Website | GoogleScholar | Book | YouTube | LinkedIn
