-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathspatial_vector.qmd
164 lines (110 loc) · 5.3 KB
/
spatial_vector.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
---
title: "Vector Data"
engine: knitr
---
## Library Loading
Before we can dive into "actual" spatial work we'll need to load some libraries.
:::panel-tabset
## [{{< fa brands r-project >}} R]{.r}
Load `sf` to work with vector data.
```{r r-libs, warning = F, message = F}
# Load needed libraries
library(sf)
```
## [{{< fa brands python >}} Python]{.py}
Load the `geopandas` library to work with vector data. We'll also load the `os` library to deal with file path issues.
```{python py-libs}
# Load needed libraries
import os
import geopandas as gpd
```
:::
## Loading Vector Data
To demonstrate vector data operations we'll use data on the counties in North Carolina (USA). Note that some minor preparatory work was necessary to get the data ready for our purposes here and is preserved in [this folder](https://github.com/njlyon0/collab_bilingualism/tree/main/dev) of the website's {{< fa brands github >}} GitHub repository.
:::panel-tabset
## [{{< fa brands r-project >}} R]{.r}
There are several [{{< fa brands r-project >}} R]{.r} packages for working with vector data but we'll focus on `sf`.
```{r r-vect-read}
# Load a shapefile of county borders
vect_r <- sf::st_read(dsn = file.path("data", "nc.shp"))
# Check the class of this object
class(vect_r)
```
## [{{< fa brands python >}} Python]{.py}
The `geopandas` library will be our focal library for vector operations with [{{< fa brands python >}} Python]{.py}.
```{python py-vect-read}
# Load a shapefile of county borders
vect_py = gpd.read_file(os.path.join("data", "nc.shp"))
# Check the class of this object
type(vect_py)
```
:::
## Vector Data Structure
Note that vector data are often structurally similar to tabular data so some of the operations you can use in [Pandas]{.py}/[Tidyverse]{.r} [variables]{.py}/[objects]{.r} can be performed on these data. To demonstrate this, lets check the columns included in this vector data.
:::panel-tabset
## [{{< fa brands r-project >}} R]{.r}
In [{{< fa brands r-project >}} R]{.r} we can explore vector data using the `str` function. Note that the spatial information is stored in the "geometry" column; see the strange [class]{.r} attribute?
```{r r-vect-structure}
# Check columns
str(vect_r)
```
## [{{< fa brands python >}} Python]{.py}
In [{{< fa brands python >}} Python]{.py} we can explore vector data via the `dtypes` attribute (which is also found in 'normal' tabular data variables). Note that the spatial information is stored in the "geometry" column; see the strange [type]{.py} attribute?
```{python py-vect-structure}
# Check columns
vect_py.dtypes
```
:::
## Checking Vector CRS
As noted earlier, the very first thing we should do after reading in spatial data is _check the coordinate reference system!_
:::panel-tabset
## [{{< fa brands r-project >}} R]{.r}
We can check the CRS with the `st_crs` function (from `sf`).
```{r r-crs-check}
# Check the CRS of the shapefile
sf::st_crs(x = vect_r)
```
## [{{< fa brands python >}} Python]{.py}
[{{< fa brands python >}} Python]{.py} stores vector CRS as an attribute.
```{python py-crs-check}
# Check the CRS of the shapefile
vect_py.crs
```
:::
## Transforming Vector CRS
Once we know the current CRS, we can transform into a different coordinate reference system as needed. Let's transform from WGS84 into another commonly-used CRS: Albers Equal Area (EPSG code 3083).
Note that while transforming a raster's CRS is very computationally intensive, transforming vector data CRS is _much_ faster. If you are trying to use vector and raster data together but they don't use the same CRS it can be quicker to transform the vector data to match the raster data (rather than vice versa).
:::panel-tabset
## [{{< fa brands r-project >}} R]{.r}
For CRS transformations in [{{< fa brands r-project >}} R]{.r} we can use the `st_transform` function.
```{r r-crs-change}
# Transform the shapefile CRS
vect_alber_r <- sf::st_transform(x = vect_r, crs = 3083)
# Re-check the CRS to confirm it worked
sf::st_crs(vect_alber_r)
```
## [{{< fa brands python >}} Python]{.py}
[{{< fa brands python >}} Python]{.py} CRS re-projections use the same process but the relevant method is `to_crs`.
```{python py-crs-change}
# Transform the shapefile CRS
vect_alber_py = vect_py.to_crs("EPSG:3083")
# Re-check the CRS to confirm it worked
vect_alber_py.crs
```
:::
## Map Making with Vector Data
Once you've checked the CRS and confirmed it is appropriate, you may want to make a simple map. Again, because these data [variables]{.py}/[objects]{.r} have a similar structure to 'normal' tabular data we can use many of the same tools without modification.
:::panel-tabset
## [{{< fa brands r-project >}} R]{.r}
The base [{{< fa brands r-project >}} R]{.r} function `plot` works for spatial data too! Though note that it will make a separate map panel for each column in the data so we'll need to pick a particular column to get just one map.
```{r r-vect-map}
# Make a map
plot(vect_r["AREA"])
```
## [{{< fa brands python >}} Python]{.py}
[{{< fa brands python >}} Python]{.py} uses the `plot` method for this type of [variable]{.py}. Though note that without speciying the "column" argument we will get the map but every polygon will be filled with the default grey-blue color rather than informatively tied to another column in the data.
```{python py-vect-map}
# Make a map
vect_py.plot(column = "AREA")
```
:::