NWC-CUAHSI-Summer-Institute/data_science_principals

Bootcamp Data Science Principles

This repository contains training materials focused on fundamental principles of data science. The content is primarily designed for use in training workshops during a two-week bootcamp.

Python Libraries

The tutorials in this repository rely on Python libraries that may not be installed on your system. To install them, run the following command from the repository's root directory: pip install -r requirements.txt. This installs every dependency listed in the requirements.txt file.

Lessons Overview

01_dealing_with_large_datasets

In this lesson, we focus on ensuring that your data processing steps for large-scale projects are reproducible. We'll cover strategies to avoid common pitfalls in large dataset handling and emphasize the importance of reproducibility in data science workflows.
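As a rough illustration of one reproducibility habit (not taken from the lesson itself), the sketch below streams a CSV in fixed-size chunks and checks that the result does not depend on the chunk size. The column names `site_id` and `flow`, the helper functions, and the synthetic data are all hypothetical; only the Python standard library is used.

```python
import csv
import io
import random


def chunked_rows(handle, chunk_size):
    """Yield lists of at most chunk_size rows from an open CSV file handle."""
    reader = csv.DictReader(handle)
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # emit the final, possibly shorter, chunk
        yield chunk


def streaming_mean(handle, column, chunk_size=1000):
    """Compute the mean of one column without loading the whole file."""
    total = 0.0
    count = 0
    for chunk in chunked_rows(handle, chunk_size):
        total += sum(float(row[column]) for row in chunk)
        count += len(chunk)
    return total / count


# Synthetic stand-in for a large file; the fixed seed makes it repeatable.
rng = random.Random(42)
lines = ["site_id,flow"] + [f"{i},{rng.uniform(0, 100):.3f}" for i in range(2500)]
csv_text = "\n".join(lines)

# The answer should not depend on how the file is chunked.
mean_a = streaming_mean(io.StringIO(csv_text), "flow", chunk_size=1000)
mean_b = streaming_mean(io.StringIO(csv_text), "flow", chunk_size=7)
```

Seeding the random generator and checking that results are independent of the chunk size are two inexpensive ways to confirm that a pipeline yields the same output on every run.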

02_data_exploration

This lesson introduces basic functions for exploring datasets. We'll delve into simple statistical analyses and visualize correlations to gain insights into our data. The goal is to equip you with tools for preliminary data analysis and understanding.
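A minimal sketch of this kind of first-pass exploration, assuming NumPy is available: it computes summary statistics for one variable and the Pearson correlation between two. The variables `precip` and `runoff` and the `summarize` helper are hypothetical and the data are synthetic; the lesson's own notebooks may use different tools.

```python
import numpy as np

# Synthetic, seeded data standing in for two related hydrologic variables.
rng = np.random.default_rng(0)
precip = rng.gamma(shape=2.0, scale=10.0, size=200)
runoff = 0.6 * precip + rng.normal(0.0, 2.0, size=200)


def summarize(values):
    """Return basic descriptive statistics for a 1-D array."""
    return {
        "mean": float(np.mean(values)),
        "std": float(np.std(values, ddof=1)),  # sample standard deviation
        "min": float(np.min(values)),
        "median": float(np.median(values)),
        "max": float(np.max(values)),
    }


summary = summarize(precip)

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is Pearson's r.
corr = np.corrcoef(precip, runoff)
r = float(corr[0, 1])
```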

03_data_distributions

Here, we explore methods to examine the probability distributions of your data. Understanding data distributions is crucial for selecting appropriate statistical models and for data preprocessing.
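One simple way to examine a distribution, sketched here under the assumption that NumPy is available, is to bin the data into a histogram and compare skewness before and after a log transform. The `flows` sample is synthetic (drawn from a lognormal distribution purely for illustration) and `sample_skewness` is a hypothetical helper, not a function from the lesson.

```python
import numpy as np


def sample_skewness(values):
    """Moment-based skewness: third central moment over variance**1.5."""
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()
    return float(np.mean(centered**3) / np.mean(centered**2) ** 1.5)


# Synthetic right-skewed sample, loosely resembling streamflow magnitudes.
rng = np.random.default_rng(1)
flows = rng.lognormal(mean=3.0, sigma=1.0, size=5000)

# Histogram counts and bin edges give a quick view of the shape.
counts, bin_edges = np.histogram(flows, bins=20)

# Strong positive skew in the raw data, near zero after a log transform.
skew_raw = sample_skewness(flows)
skew_log = sample_skewness(np.log(flows))
```

A skewness near zero after the log transform suggests that methods assuming roughly symmetric data may be better applied to the transformed values.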

04_regression

The final lesson covers various aspects of regression analysis. We start with Least Squares Regression, then move to more specific applications like the USGS regression equations for streamflow recurrence. We also explore Random Forest Regression for a CONUS-wide streamflow recurrence model, demonstrating a practical application of machine learning techniques in hydrological studies.
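The least squares step can be sketched as follows, assuming NumPy and a power-law relation between drainage area and peak flow fitted in log space. That functional form, the coefficient values, and the variable names are assumptions chosen for illustration with synthetic data; they are not the equations or data used in the lesson.

```python
import numpy as np

# Synthetic data: peak_flow = a * area**b with multiplicative noise.
rng = np.random.default_rng(2)
area = rng.uniform(10.0, 1000.0, size=300)
true_a, true_b = 50.0, 0.7
peak_flow = true_a * area**true_b * rng.lognormal(0.0, 0.1, size=300)

# Taking log10 of both sides turns the power law into a straight line:
# log10(Q) = log10(a) + b * log10(A)
X = np.column_stack([np.ones_like(area), np.log10(area)])
y = np.log10(peak_flow)

# Ordinary least squares solution for [intercept, slope].
coef, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef
fitted_a = 10.0**intercept  # back-transform the intercept to the original scale
```

Fitting in log space keeps the problem linear, so the slope estimates the exponent and the back-transformed intercept estimates the coefficient.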


By following these lessons, you'll gain a solid foundation in key data science concepts and techniques, preparing you for more advanced topics and applications.

Contributing and Suggestions

We welcome contributions and ideas for new modules! If you have suggestions for improvements, additional content, or ideas for entirely new modules, please share them with us.

How to Contribute:

  1. Submit Ideas for New Modules or Improvements: If you have an idea for a new module or suggestions for improving existing content, please open an issue in this GitHub repository with a detailed description of your proposal.

  2. Contribute Directly: If you're interested in directly contributing to the development of new modules or enhancements, please fork this repository, make your changes, and submit a pull request with your contributions.

Your ideas and contributions help us keep improving and expanding this repository for everyone interested in data science, and we look forward to receiving them.
