-
Notifications
You must be signed in to change notification settings - Fork 50
/
Copy pathindex.qmd
180 lines (114 loc) · 7.51 KB
/
index.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
---
title: "Transport Data Science"
number-sections: false
---
This module focuses on applying data science techniques to solve real-world transport problems.
Based at the University of Leeds' Institute for Transport Studies (module code [TRAN5340M](https://webprod3.leeds.ac.uk/catalogue/dynmodules.asp?Y=202425&M=TRAN-5340M)), the course is led by Robin Lovelace, Professor of Transport Data Science and developer of several data-driven solutions for effective transport planning.
The course has evolved over a decade of teaching and research in the field. It aims to equip you with up-to-date and future-proof skills through practical examples and reproducible workflows using industry-standard data science tools.
# Prerequisites
Before taking this course, we expect you to have some experience with programming, data science, and general computing.
While experience with geographic data is helpful, it is not required.
## Hardware
We highly recommend having access to a computer with at least 8 GB of RAM that you have permission to install software on.
Alternatively, you could use cloud-based services such as RStudio Cloud, Google Colab, or GitHub Codespaces. However, you would need to be comfortable using these services and may miss out on some benefits of using your own computer.
## Computing experience
You should be comfortable with general computing tasks, such as:
- Creating folders and managing files
- Installing software
- Using command line interfaces (PowerShell in Windows, Terminal in macOS, or Linux shell)
## Data science experience prerequisites
Prior experience using R or Python is essential. This could include:
- Using these languages in professional work
- Experience from previous degrees
- Completion of relevant online courses
Students can demonstrate this prerequisite knowledge by showing evidence they have:
- Worked with R previously
- Completed online courses such as the first 4 sessions in the [RStudio Primers series](https://rstudio.cloud/learn/primers)
- Completed [DataCamp's Free Introduction to R course](https://www.datacamp.com/courses/free-introduction-to-r)
Substantial programming and data science experience in previous professional or academic work using languages like R or Python also satisfies the prerequisite requirements.
# Software requirements and installation
To participate in this course, you'll need to install specific software.
The teaching is delivered primarily in R, with some Python code and examples. You should install R (recommended), Python, or both on your computer.
We recommend using R for practical sessions and coursework unless you have a specific reason to use Python. If you choose Python or another language, please note:
- You will receive less direct support
- You'll need skills to set up and manage your own environments
We welcome translations of R code examples into other languages. Contributions to course materials via [Pull Requests on GitHub](https://github.com/itsleeds/tds/pulls) are encouraged.
## Quickstart with GitHub Codespaces
For a quick cloud-based setup, you can use GitHub Codespaces to access the course materials:
1. Sign up to GitHub
2. Fork the repository
3. Click the "Open in GitHub Codespaces" button above
Alternatively, use the following link:
[](https://github.com/codespaces/new/itsleeds/tds?quickstart=1)
## R
Install a recent version of R (4.3.0 or above) and an IDE:
- R from [cran.r-project.org](https://cran.r-project.org/)
- RStudio from [rstudio.com](https://rstudio.com/products/rstudio/download/#download) (recommended)
- Alternatively, VS Code with the R extension installed (if you have prior experience with it)
You'll also need to install R packages:
- Individual packages can be installed by opening RStudio and typing commands like `install.packages("stats19")` in the R console
- To install all dependencies for the module at once, run the following command in the R console:
```{r}
#| eval: false
if (!requireNamespace("remotes", quietly = TRUE)) {
install.packages("remotes")
}
remotes::install_github("itsleeds/tds")
```
See [Section 1.5 of the online guide Reproducible Road Safety Research with R](https://itsleeds.github.io/rrsrr/introduction.html#installing-r-and-rstudio) for instructions on how to install key packages we will use in the module.^[
For further guidance on setting-up your computer to run R and RStudio for spatial data, see these links, we recommend
Chapter 2 of Geocomputation with R (the Prerequisites section contains links for installing spatial software on Mac, Linux and Windows): https://geocompr.robinlovelace.net/spatial-class.html and Chapter 2 of the online book *Efficient R Programming*, particularly sections 2.3 and 2.5, for details on R installation and [set-up](https://csgillespie.github.io/efficientR/set-up.html) and the
[project management section](https://csgillespie.github.io/efficientR/set-up.html#project-management).
]
## Python
If you choose to use Python, you should be comfortable with:
- Installing Python
- Managing your own Python environment
- Installing packages and resolving package conflicts
For Python users, we recommend using an environment manager such as:
- `pixi` (which can manage both R and Python environments)
- Docker (best practice for reproducibility and isolation)
## Docker (advanced)
We maintain a Docker image containing all necessary software to complete the course with VS Code, Quarto, and a Devcontainer setup.
**Advantages:**
- Ensures reproducibility
- Saves time installing software
**Disadvantages:**
- Docker can be challenging to install
- Difficult to use if you're unfamiliar with Docker
We recommend this approach only for people who are confident with Docker or willing to invest time learning it.
For guidance, see:
- [Docker installation instructions](https://docs.docker.com/get-docker/)
- [Devcontainers documentation on github.com](https://github.com/devcontainers)
- The tds [Dockerfile](https://github.com/itsleeds/tds/blob/main/Dockerfile) and [devcontainer.json](https://github.com/itsleeds/tds/blob/main/.devcontainer/devcontainer.json)
# R, Python or other?
While the module focuses on methods implementable in many languages, we expect most participants will use R for practicals and the final course project.
We recommend R because:
- It provides a data science environment *with batteries included*
- It offers many mature packages for data manipulation, visualization, and statistical analysis
- These packages are available within seconds without worrying about conflicts or environment management
- The module team has the most experience with R
Python is another excellent choice for transport data science. Many of our R code examples have been ported to Python, as illustrated below. This example shows how to load the R package `{sf}` and its Python equivalent, `{geopandas}`:
::: {.panel-tabset}
## R
```r
library(sf)
geo_data = read_sf("geo_data.gpkg")
```
## Python
```python
import geopandas as gpd
geo_data = gpd.read_file("geo_data.gpkg")
```
:::
If you choose Python, you will need to:
- Manage your own Python environment
- Translate R code examples into Python
For the adventurous, you could try using:
- Julia
- JavaScript/TypeScript (e.g., via Observable)
- Other programming languages
However, please note that support for these alternative languages will be limited.
# Contributing to the course
See the [README](https://github.com/itsleeds/tds/tree/main?tab=readme-ov-file#quickstart) for instructions on how to contribute to the course materials.