This repo has all the resources you need to become an amazing data engineer!
Make sure to check out the projects section for more hands-on examples!
Make sure to check out the interviews section for more advice on how to pass data engineering interviews!
Great books:
- Fundamentals of Data Engineering
- Designing Data-Intensive Applications
- Designing Machine Learning Systems
- The Hundred-Page Machine Learning Book
- Kimball - The Data Warehouse Toolkit
- Data Mesh
- Machine Learning System Design Interview
- Streaming Systems
- High Performance Spark
- Building Evolutionary Architectures, 2nd Edition
- Data Management at Scale, 2nd Edition
- Deciphering Data Architectures
- 97 Things Every Data Engineer Should Know: Collective Wisdom from the Experts
- Data Governance: The Definitive Guide
- Delta Lake: The Definitive Guide
- Hadoop: The Definitive Guide
- Modern Data Engineering with Apache Spark: A Hands-On Guide for Building Mission-Critical Streaming Applications
Communities:
- Seattle Data Guy Discord
- EcZachly Data Engineering Discord
- Chip Huyen MLOps Discord
- Data Engineer Things Slack
- dbt Community
- r/dataengineering
- Microsoft Fabric Community
- r/MicrosoftFabric
- DataTalksClub Slack
- SylphAI for data professional matchmaking
Companies:
- Tabular
- Starburst
- Preset
- Astronomer
- Mage
- Dagster
- Prefect
- AlgoExpert
- ByteByteGo
- Databricks
- Spark
- dbt
- Cube
- Airbyte
- Microsoft
- Snowflake
- Onehouse
Data Engineering blogs of companies:
- Netflix
- Uber
- Databricks
- Airbnb
- Amazon AWS Blog
- Microsoft Data Architecture Blogs
- Microsoft Fabric Blog
- Oracle
- Meta
Data Engineering Whitepapers:
- A Five-Layered Business Intelligence Architecture
- Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics
- Big Data Quality: A Data Quality Profiling Model
- The Data Lakehouse: Data Warehousing and More
Great YouTube Channels:
- Data with Zach
- Seattle Data Guy
- TrendyTech
- E-learning Bridge
- Darshil Parmar
- Andreas Kretz
- ByteByteGo
- The Ravit Show
- Azure Lib
- Eric Roby
- Guy in a Cube
- Advancing Analytics
- Adam Marczak
- nullQueries
- Kahan Data Solutions
- Ankit Bansal
Great Podcasts:
- The Data Engineering Show
- Data Engineering Podcast
- DataTopics
- The Data Engineering Side Of Data
- DataWare
- The Data Coffee Break Podcast
- The Data Stack Show
- Intricity101 Data Sharks Podcast
- Drill to Detail with Mark Rittman
- Analytics Power Hour
- Catalog & Cocktails
- Datatalks
- Data Brew by Databricks
- The Data Cloud Podcast by Snowflake
- What's New in Data
- Open||Source||Data by DataStax
- Streaming Audio by Confluent
- The Data Scientist Show
- MLOps.community
Newsletters:
- DataEngineer.io Newsletter
- Seattle Data Guy
- Joe Reis
- Data Engineering Weekly
- Data Engineering Central
- Dutch Engineer
- ByteByteGo
- Start Data Engineering
- Developing Dev
- High Growth Engineer
- Learn Analytics Engineering
- Marvelous MLOps
- Medium Data Engineering Newsletter
- Benn Stancil
- Metadata Weekly
- Technically
- Blef.fr Data News
- All Hands on Data
- Modern Data 101
- Zach Wilson
- Ben Rogojan
- Joe Reis
- Sumit Mittal
- Shashank Mishra
- Darshil Parmar
- Joseph Machado
- Chip Huyen
- Alex Xu
- Deepak Goyal
- Eric Roby
- Andreas Kretz
- Tobias Macey
- Shruti Mantri
- Hugo Lu
- Daniel Ciocirlan
- Marc Lamberti
- Simon Whiteley
- Dipankar Mazumdar
Twitter / X
- Zach Wilson
- Seattle Data Guy
- Sumit Mittal
- Joseph Machado
- Alex Xu
- Eric Roby
- Andreas Kretz
- Marc Lamberti
- Dipankar Mazumdar
TikTok
Design Patterns
- Cumulative Table Design
- Microbatch Deduplication
- The Little Book of Pipelines
- Data Developer Platform
Courses / Academies
- DataEngineer.io Bootcamp/course (use code HANDBOOK10 for a discount!)
- LearnDataEngineering.com
- Technical Freelancer Academy (use code zwtech for a discount!)
- IBM Data Engineering for Everyone
- Qwiklabs
- DataCamp
- Udemy Courses from Shruti Mantri
- Rock the JVM (teaches Spark in Scala, Flink, and others)
- Data Engineering Zoomcamp by DataTalksClub
Certification Courses