Supporting reading material #66

Open
ysgurjar opened this issue Dec 2, 2021 · 5 comments

Comments

@ysgurjar

ysgurjar commented Dec 2, 2021

I am a complete beginner who decided to follow the roadmap a couple of months ago. Here are a few books that helped me get started.

  1. How computers work: Code: The Hidden Language of Computer Hardware and Software
  2. How the internet works: Introduction to Networking
  3. APIs: An Introduction to APIs by Brian Cooksey, Stephanie Briones (illustrator), and Danny Schreiber

I am a self-learner and would welcome further guidance on next steps.

@jamiros

jamiros commented Dec 2, 2021

That's awesome! Thank you for sharing that!

@joseluistello

'Documenting APIs: A guide for technical writers and engineers' is excellent material too.

@ysgurjar

Thank you. @alexandraabbas and others: I am struggling to find good resources for data structures and algorithms, Linux, and serialisation. I am also not sure how much time I should spend on each of these. There don't seem to be any courses on the data stack at this level. Any suggestions?

@datatalking

@ysgurjar It really depends on where your skills currently sit within the overall range, since data engineering covers a wide swath of technologies and experience levels. Most of my work involves scientific data processing, so I use linear algebra and matrix equations almost weekly. Looking at my bookshelf, I have eight books that I bought, but I really only use three or four of them.

I've been using the 'Data Engineering for Python' book and found it helpful. What language do you use, @ysgurjar? Other books I rely on:

  1. 'Data Structures and Algorithms Made Easy' by Narasimha Karumanchi: 400+ pages with examples written in C, so I bribe a friend to translate enough of it into Python for me to grok it.
  2. 'Methods of Multivariate Analysis', also known as the 'Rencher' book, is a deep dive into almost all of the algebra used across NLP, ML, and DL. Many people seem to love it, but it's an advanced read.
  3. 'Intro to Algorithms': I had good luck working through it together with a friend, who helped me translate concepts from its 1,300 pages into solving problems, so it's a deep resource for me.
  4. If you are going to do the algebra, you will also need the statistics it computes; I had luck with 'Pearson Stats' and 'An Introduction to Statistical Methods and Data Analysis' (7th edition) by Ott and Longnecker.
  5. Duke University open-sourced, I think, all of their classes, similar to what MIT did, so there is a wealth of material. In my MATH342 course, the professor recommended 'Introduction to Modern Statistics' by Mine Çetinkaya-Rundel and Johanna Hardin.

@sarahgetter

sarahgetter commented Nov 23, 2022

@ysgurjar Thanks for your list! I heartily endorse these O'Reilly books:

  1. 'Fundamentals of Data Engineering' by Joe Reis and Matt Housley
  2. 'Practical Statistics for Data Scientists' by Peter Bruce, Andrew Bruce and Peter Gedeck
  3. 'Data Science from Scratch' by Joel Grus
  4. 'Creating a Data-Driven Organization' by Carl Anderson
  5. 'Beautiful Visualization' by Julie Steele and Noah Iliinsky

I have thoroughly enjoyed 'Introduction to Design and Analysis of Experiments' by George W. Cobb, but would say this falls more into the realm of data science than data engineering.

'Beautiful Visualization' might also seem to fall outside the data engineering umbrella, but it helped me understand the use cases for different levels of time granularity and how best to represent patterns and trends. This helped me decide when my materialization layers should offer millisecond-level granularity, and when per-event data is unnecessary and the smallest rollup period can be a day. The book was also quite helpful for stepping into an "is this the most usable version for my Tableau-using analysts?" perspective and out of my optimization-obsessed engineering perspective.
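To make the granularity trade-off concrete, here is a minimal sketch that rolls the same events up either per millisecond or per day. It assumes events arrive as `(timestamp, value)` pairs; the `rollup` function, its `"ms"`/`"day"` options, and the sample events are hypothetical illustrations, not part of any particular pipeline or tool.

```python
from collections import defaultdict
from datetime import datetime, timezone


def rollup(events, granularity):
    """Sum (timestamp, value) events into time buckets.

    granularity is a hypothetical option for this sketch:
      "day" -> one bucket per calendar date (coarse, small table)
      "ms"  -> one bucket per millisecond (near per-event detail)
    """
    buckets = defaultdict(float)
    for ts, value in events:
        if granularity == "day":
            key = ts.strftime("%Y-%m-%d")
        elif granularity == "ms":
            # isoformat truncates the fractional seconds to milliseconds.
            key = ts.isoformat(timespec="milliseconds")
        else:
            raise ValueError(f"unsupported granularity: {granularity}")
        buckets[key] += value
    return dict(buckets)


# Hypothetical sample events: three on one day, one on the next.
events = [
    (datetime(2022, 11, 1, 9, 0, 0, 123456, tzinfo=timezone.utc), 1.0),
    (datetime(2022, 11, 1, 9, 0, 0, 123999, tzinfo=timezone.utc), 2.0),
    (datetime(2022, 11, 1, 17, 30, tzinfo=timezone.utc), 4.0),
    (datetime(2022, 11, 2, 8, 15, tzinfo=timezone.utc), 8.0),
]

print(rollup(events, "day"))  # {'2022-11-01': 7.0, '2022-11-02': 8.0}
print(rollup(events, "ms"))   # three buckets, one per distinct millisecond
```

The daily rollup collapses four events into two rows, which is often enough for trend charts; the millisecond rollup keeps almost every event as its own row, which matters only when analysts actually need that detail.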
