Day 2 (5 Nov): Working with Data - Weili Qiu

Morning 9:30 - 12:00

Afternoon 13:00 - 17:00

Objectives

  • Learn how to read and write files using Python
  • Learn how to view, filter, impute, modify, concatenate, reshape data using NumPy and Pandas
  • Learn how to create plots using Matplotlib
  • Understand what web requests and responses are, and learn how to interact with web APIs

Material

  • Jupyter Notebook
  • Datasets: UK population, Weekly COVID-19 cases, COVID-19 vaccination data
  • Internet connection

Timetable

9:30 – 9:45 Intro & Review

  • Intro
  • Looking at the data: UK population, Weekly COVID-19 cases, COVID-19 vaccination data
  • Review what we have learned on Day 1

9:45 – 10:15 File Handling

  • Open and close a file
  • Read and write a file
  • JSON serialisation
  • Hands-on
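The file-handling topics above can be sketched as follows. This is a minimal illustration, not the course notebook: the file names and the records are made up, and everything is written to a temporary directory.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical records; the values are made up for illustration only.
records = [
    {"region": "A", "cases": 120},
    {"region": "B", "cases": 85},
]

# Work in a temporary directory so the example leaves nothing behind
# in the current folder.
workdir = Path(tempfile.mkdtemp())
text_path = workdir / "notes.txt"
json_path = workdir / "records.json"

# "with" closes the file automatically, even if an error occurs inside it.
with open(text_path, "w", encoding="utf-8") as f:
    f.write("first line\n")
    f.write("second line\n")

# Read the file back line by line, dropping the trailing newline.
with open(text_path, "r", encoding="utf-8") as f:
    lines = [line.rstrip("\n") for line in f]

# Serialise Python objects to JSON text on disk, then load them back.
with open(json_path, "w", encoding="utf-8") as f:
    json.dump(records, f, indent=2)

with open(json_path, "r", encoding="utf-8") as f:
    loaded = json.load(f)

print(lines)
print(loaded == records)
```

Opening files with `with` avoids having to call `close()` by hand, which is easy to forget when an exception interrupts the code.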

10:15 – 10:45 Data Manipulation (NumPy)

  • NumPy array
  • Create, access, modify and copy arrays (and matrices)
  • Arithmetic
  • Statistics
  • Reshape
  • Condition-based masking and indexing
  • Hands-on
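A short sketch of the NumPy operations listed above, assuming NumPy is installed. The numbers are made up and do not come from the course datasets.

```python
import numpy as np

# Made-up weekly counts, for illustration only.
counts = np.array([12, 7, 30, 18, 25, 9])

# .copy() gives an independent array; changing it leaves counts untouched.
independent = counts.copy()
independent[0] = 999

# Element-wise arithmetic and simple statistics.
doubled = counts * 2
mean_count = counts.mean()
max_count = counts.max()

# Reshape the 6 values into 2 rows x 3 columns, then sum each row.
grid = counts.reshape(2, 3)
row_sums = grid.sum(axis=1)

# Condition-based masking: a boolean array selects the matching elements.
mask = counts > 15
high = counts[mask]

# Condition-based modification, done on a copy: cap values at 20.
capped = counts.copy()
capped[capped > 20] = 20

print(grid)
print(row_sums, high, capped)
```

Note that a plain slice such as `counts[:3]` is a view onto the same data, so modifying it would also modify `counts`; use `.copy()` when an independent array is needed.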

10:45 – 11:00 Break

11:00 – 11:45 Data Manipulation (Pandas)

  • Series and DataFrame
  • Sorting & statistics
  • Missing values
  • Create, access, modify and copy Series and DataFrames
  • Hands-on
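The Pandas topics above might look like this in practice. It is a sketch assuming pandas and NumPy are installed; the column names and values are invented for illustration, and median imputation is just one of several reasonable ways to handle missing values.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value, for illustration only.
df = pd.DataFrame({
    "region": ["A", "B", "C", "D", "E"],
    "cases": [120.0, np.nan, 85.0, 40.0, 60.0],
})

# Selecting a single column of a DataFrame gives a Series.
cases = df["cases"]

# Count the missing values in the column.
n_missing = int(cases.isna().sum())

# Impute on a copy so the original DataFrame stays unchanged.
filled = df.copy()
filled["cases"] = filled["cases"].fillna(filled["cases"].median())

# Sort by cases, largest first, and renumber the rows.
ranked = filled.sort_values("cases", ascending=False).reset_index(drop=True)

# Summary statistics (count, mean, std, min, quartiles, max).
summary = filled["cases"].describe()

print(n_missing)
print(ranked)
print(summary)
```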

12:00 – 13:00 Lunch Break

13:00 – 14:00 Data Manipulation (Pandas continued, and Visualisation)

  • Filter data and make conditional changes
  • Saving the data
  • Visualisation
    • Scope (“canvas”) of Matplotlib
    • Elements of a figure
    • Types of figures: line, pie, histogram/bar, box-and-whisker
    • Resize and save fig
  • Hands-on
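The filtering, saving and plotting steps above can be sketched in one short script, assuming pandas and Matplotlib are installed. The data, the threshold of 45, and the output file names are all invented for illustration.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data, for illustration only.
df = pd.DataFrame({
    "week": [1, 2, 3, 4, 5],
    "cases": [30, 45, 80, 60, 20],
})

# Filter rows with a boolean condition.
busy = df[df["cases"] >= 45]

# Conditional change: set a default, then overwrite the matching rows.
df["level"] = "low"
df.loc[df["cases"] >= 45, "level"] = "high"

# Save the data to CSV in a temporary directory.
outdir = Path(tempfile.mkdtemp())
csv_path = outdir / "cases.csv"
df.to_csv(csv_path, index=False)

# The Figure is the whole canvas; an Axes is one plot drawn on it.
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(df["week"], df["cases"], marker="o", label="cases")
ax.set_title("Weekly cases (made-up data)")
ax.set_xlabel("Week")
ax.set_ylabel("Cases")
ax.legend()

# Resize the figure and save it to disk.
fig.set_size_inches(8, 4)
png_path = outdir / "cases.png"
fig.savefig(png_path, dpi=100)
plt.close(fig)
```

Other figure types follow the same pattern on the Axes object, for example `ax.bar`, `ax.hist`, `ax.pie` and `ax.boxplot`.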

14:00 – 14:15 Break

14:15 – 15:30 Interact with Web API

APIs used: OpenGWAS (semantic), EBI (RESTful), and PubMed (as homework, or in class only if time allows, because it is unreliable)

  • Request and Response (What happens during this process)
    • HTTP verb
    • Status code and contents of responses
  • Understanding API docs
  • Timeout and error handling
  • Hands-on
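The request/response cycle with a timeout and error handling can be sketched as below. This version uses only the standard library (`urllib`) so that it runs without extra packages; that choice is an assumption, not necessarily what the session used. `fetch_json` is a hypothetical helper name, and the URL in the usage note is a placeholder, not a real endpoint of any of the APIs above.

```python
import json
import urllib.error
import urllib.request


def fetch_json(url, timeout=5.0):
    """Send a GET request and return (status_code, parsed_json).

    Either element is None when it could not be obtained.
    """
    # GET is the HTTP verb; the Accept header asks for a JSON response.
    request = urllib.request.Request(
        url, headers={"Accept": "application/json"}, method="GET"
    )
    try:
        # timeout stops the call from waiting forever on a slow server.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
            body = response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx or 5xx).
        print(f"HTTP error {err.code}: {err.reason}")
        return err.code, None
    except OSError as err:
        # No usable response: URLError, timeouts and refused connections
        # are all subclasses of OSError.
        print(f"Request failed: {err}")
        return None, None

    try:
        return status, json.loads(body)
    except json.JSONDecodeError:
        # The response arrived but its body was not valid JSON.
        return status, None
```

A call would look like `status, data = fetch_json("https://example.org/api/resource", timeout=10)` (placeholder URL). Checking `status` before using `data` separates "the server said no" from "the server could not be reached".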

15:30 – 16:00 Miscellaneous (optional, depending on time)

NOTE: This part was not delivered because we spent more time in the morning reviewing what we learned on Day 1.

Performance and Best Practice

  • Searching for an element in Python list, dict and set
  • Every little helps, but avoid negative optimisation
  • Lambda and df.apply(), df.assign()
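Since this part was not delivered, the following is only a sketch of what the listed topics might cover, assuming pandas is installed. The sizes, column names, threshold and values are invented for illustration, and the printed timings will vary between machines.

```python
import timeit

import pandas as pd

# Build the same 100,000 integers as a list, a set and a dict.
n = 100_000
as_list = list(range(n))
as_set = set(as_list)
as_dict = dict.fromkeys(as_list)

target = n - 1  # worst case for the list: the last element

# A list is scanned element by element; set and dict use hashing,
# so their lookups take roughly constant time on average.
t_list = timeit.timeit(lambda: target in as_list, number=100)
t_set = timeit.timeit(lambda: target in as_set, number=100)
t_dict = timeit.timeit(lambda: target in as_dict, number=100)
print(f"list: {t_list:.5f}s  set: {t_set:.5f}s  dict: {t_dict:.5f}s")

# Made-up data, for illustration only.
df = pd.DataFrame({"cases": [30, 45, 80], "population": [1000, 1500, 2000]})

# apply() with a lambda calls the Python function once per element.
labels = df["cases"].apply(lambda c: "high" if c >= 45 else "low")

# assign() returns a new DataFrame with the extra column; df is unchanged.
# The lambda receives the DataFrame and computes cases per 1,000 people.
with_rate = df.assign(rate=lambda d: d["cases"] * 1000 / d["population"])

print(labels.tolist())
print(with_rate)
```

Measuring with `timeit` before and after a change is one way to check that an "optimisation" actually helps rather than making things slower.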