Hands-on data analysis

Motivation

Hands-on Data Analysis is Datawhale's open source project on data analysis. It grew out of an earlier Datawhale data analysis course, which I took as a student and which used the book Python for Data Analysis as its teaching material. The book explains pandas and NumPy operations clearly and in detail, but it says much less about the logic of data analysis. As a result, many learners, myself included, finished the book and still did not know what to do when faced with a real data analysis problem. This feeling of "I don't know how to use it" is understandable: after studying theory, there is always a gap between what we learned and how it is applied in practice. Bridging that gap takes your own experimentation and study of real-world material.

So we wanted a course built around a project, with the knowledge points embedded in it, where you learn by doing and are guided along the way. After finishing such a course, you should have a working command of pandas and a general feel for the data analysis process. From what we could find, no existing data analysis project fully met these criteria. So Datawhale's members came together to build an open source course that achieves these goals, so that everyone who uses it can get a better start on their data analysis journey.

The course has now been updated to version 1.3. We have improved the learning process and provided better explanations in the answers. Supporting materials will be released gradually. We still start from basic data analysis operations and the data analysis process, and introduce real-world examples in each module. Later we will keep adding new content, such as data mining algorithms. This is an open source project: we will keep iterating, and everyone is welcome to take part.

About the name of the project, Hands-on Data Analysis: data analysis is the process of finding the truth in a pile of numbers. Learning to manipulate data is only half of the skill; the other half is the experience you build up in your head. So during the course you need to think and summarize more, and above all get hands-on and write real code. I hope that as you study you will reason more and ask why more often, and practice more so that theory and practice come together. By the end of the course you should come away with a lot.

Matching materials

Since this course was born within Datawhale, it works best alongside the other resources Datawhale provides. The code is provided as Jupyter notebooks, which contain the tasks you need to complete together with our hints and guidance. Combined with Datawhale's group learning, where you can discuss with others and share material, this format makes learning much more effective. Datawhale has also open-sourced a pandas tutorial, Joyful-Pandas, which lays out the logic of pandas with code demonstrations. For pandas operations in this course you can refer to Joyful-Pandas to get more out of your study.

Project scheduling

Schedule

The course is currently divided into three parts: Basic Data Operations, Data Cleaning and Reconstruction, and Modeling and Evaluation.

  1. Part 1: Given a dataset to analyze, we learn how to load and inspect the data, then learn some basic pandas operations, and finally try exploratory data analysis.
  2. Part 2: Once we are comfortable manipulating and understanding the data, we clean and restructure it, turning the raw data into a usable form ready to feed into a model.
  3. Part 3: We choose a model according to the task and build it with the popular sklearn library. To judge whether a model is good or bad we need to evaluate it, and then optimize it based on that evaluation.
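
As a rough illustration of Parts 1 and 2, the sketch below loads a small table, takes a first look at it, and fills in missing values with pandas. The inline CSV and its column names (`passenger_id`, `age`, `fare`, `survived`) are made up for this example and are not the course's dataset; the median fill is just one common choice, not necessarily the one the notebooks use.

```python
import io

import pandas as pd

# Hypothetical sample data, inlined so the example is self-contained.
RAW_CSV = """passenger_id,age,fare,survived
1,22,7.25,0
2,38,71.28,1
3,,7.92,1
4,35,53.10,1
5,,8.05,0
"""

# Part 1: load the data and take a first look.
df = pd.read_csv(io.StringIO(RAW_CSV))
print(df.head())
print(df.describe())

# Part 2: count missing values, then fill missing ages with the median.
missing_before = df["age"].isna().sum()
clean = df.copy()
clean["age"] = clean["age"].fillna(clean["age"].median())

# A first exploratory question: how does the mean fare differ by outcome?
mean_fare = clean.groupby("survived")["fare"].mean()
print(mean_fare)
```

Working on a copy (`clean`) keeps the raw table available, so the effect of each cleaning step can be compared against the original.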

Chapter Summary

  - Chapter 1: Data loading and preliminary observations; Pandas basics explained; Exploratory Data Analysis
  - Chapter 2: Data cleaning and feature processing; Data Reconstruction 1; Data Reconstruction 2; Data Visualization
  - Chapter 3: Data Modeling; Model Evaluation
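
The modeling and evaluation steps of Chapter 3 might look roughly like the sketch below, which trains a model with sklearn and scores it on held-out data. The synthetic dataset from `make_classification`, the choice of `LogisticRegression`, and the 75/25 split are all assumptions made for illustration; the course notebooks may use different data, models, and metrics.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a cleaned feature table (an assumption).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Hold out a quarter of the rows so the model is scored on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Build and fit the model.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate: compare predictions on the held-out rows with the true labels.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"test accuracy: {accuracy:.2f}")
```

Scoring on a held-out split rather than on the training rows gives a more honest picture of how the model behaves on new data, which is the basis for any later optimization.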

How to learn

The code is provided as Jupyter notebooks, and each part of the course comes in two versions: Course and Answers. While studying, work through the Course notebook: look up information yourself, write the code yourself, and note your thoughts and insights. Afterwards you can discuss with your study partners and share what you found. Data analysis is open-ended, so the answers are open-ended too, and we hope you will come up with your own understanding and solutions. If you need a reference, the answers we wrote are in the Answers section.

(Course section: you need to write the code yourself according to the requirements)

Feedback

Feedback from learners of previous versions

As a learner with no background, I found this round of data analysis study very comfortable. The tutorials are simple and clear, and the overall learning went smoothly. I read each task's tutorial twice. The first time I only read the tutorial and then worked through the book Python for Data Analysis. The assignments were great in terms of extension, which I really liked. The second time, I finished the homework and reflection without looking at the answers at all. Finishing gave me a real sense of accomplishment, and I learned a lot. As an introductory data analysis course, this one is really great!

--------Danfei Wu, North China Electric Power University

First of all, this learning document is very well done and offers a lot of guidance. I like the project's way of learning: studying actively and searching for answers when you don't understand something.

-------- Li Qingqing

It helped a lot. After finishing the program, I still use the skills from the course in my real job. I hope a later version of the course will include a section on data analysis logic.

--------Version V1.0 Group Study Participants

Excellent student Liu Chuchu's excellent assignment: https://space.bilibili.com/621981283/channel/detail?cid=191222

(You are welcome to watch the videos that explain all the assignments)

Improvement methods

If you can't find what you are looking for in Hands-on Data Analysis, or if you find an error in the project, please open a GitHub Issue. We will reply within 24 hours; if we have not replied by then, you can contact me by email ([email protected]).

Contributors

Project leader

Andong Chen: Datawhale member, Hunan University | Queen Mary University of London

Core contributors

Juanjuan Jin: Datawhale member, master's degree from Zhejiang University

Yang Jada: Datawhale member, data mining engineer

Lao Cousin: Datawhale member, author of Jane said Python

Contributor

Hongxing: Datawhale member, data analyst

Li Ling: Datawhale member, algorithm engineer

Gao Liye: Datawhale member, graduate student of Taiyuan University of Technology

Zhang Wentao: Datawhale member, PhD student at Sun Yat-sen University

Follow us

Scan the QR code below and reply with the keyword "动手学数据分析" to join the "Project Exchange Group".

Datawhale is an AI-focused open source organization with the vision of

LICENSE

Creative Commons License

Copyright License: CC-BY-NC-ND license