generated from allisonhorst/meds-distill-template
-
Notifications
You must be signed in to change notification settings - Fork 5
/
Copy pathlab1d_sdm-evaluate.Rmd
259 lines (217 loc) · 22.9 KB
/
lab1d_sdm-evaluate.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
---
title: 'Lab 1d. Species Distribution Modeling - Evaluate Models'
editor_options:
chunk_output_type: console
---
# Learning Objectives {.unnumbered}
Now you'll complete the modeling workflow with the steps to **evaluate** model performance and **calibrate** model parameters.
```{r fig.cap="Full model workflow with calibrate and evaluate steps emphasized.", echo=F}
# [model-workflow_evaluate-calibrate - Google Drawings](https://docs.google.com/drawings/d/10hr0L_Iu_2DleLHzQmQWJGocG87yRbSX8jHne7fiZr0/edit)
knitr::include_graphics("https://docs.google.com/drawings/d/e/2PACX-1vSO_TOz2eZ_qmdIrsrXS1If82bUv1Sno1U5bJ2y8GD7S1PhrrSYMeKCxKy6GtSZp6NC01tBZfUnXPdD/pub?w=933&h=354")
```
# Setup
```{r setup}
# global knitr chunk options
knitr::opts_chunk$set(
warning = FALSE,
message = FALSE)
# load packages
librarian::shelf(
dismo, # species distribution modeling: maxent(), predict(), evaluate(),
dplyr, ggplot2, GGally, here, maptools, readr,
raster, readr, rsample, sf,
usdm) # uncertainty analysis for species distribution models: vifcor()
select = dplyr::select
# options
set.seed(42)
options(
scipen = 999,
readr.show_col_types = F)
ggplot2::theme_set(ggplot2::theme_light())
# paths
dir_data <- here("data/sdm")
pts_geo <- file.path(dir_data, "pts.geojson")
env_stack_grd <- file.path(dir_data, "env_stack.grd")
mdl_maxv_rds <- file.path(dir_data, "mdl_maxent_vif.rds")
# read points of observation: presence (1) and absence (0)
pts <- read_sf(pts_geo)
# read raster stack of environment
env_stack <- raster::stack(env_stack_grd)
```
## Split observations into training and testing
```{r dt-data-prereq, echo=TRUE}
# create training set with 80% of full data
pts_split <- rsample::initial_split(
pts, prop = 0.8, strata = "present")
pts_train <- rsample::training(pts_split)
pts_test <- rsample::testing(pts_split)
pts_train_p <- pts_train %>%
filter(present == 1) %>%
as_Spatial()
pts_train_a <- pts_train %>%
filter(present == 0) %>%
as_Spatial()
```
# Calibrate: Model Selection
```{r}
# show pairs plot before multicollinearity reduction with vifcor()
pairs(env_stack)
```
```{r}
# calculate variance inflation factor per predictor, a metric of multicollinearity between variables
vif(env_stack)
# stepwise reduce predictors, based on a max correlation of 0.7 (max 1)
v <- vifcor(env_stack, th=0.7)
v
# reduce enviromental raster stack by
env_stack_v <- usdm::exclude(env_stack, v)
# show pairs plot after multicollinearity reduction with vifcor()
pairs(env_stack_v)
```
```{r}
# fit a maximum entropy model
if (!file.exists(mdl_maxv_rds)){
mdl_maxv <- maxent(env_stack_v, sf::as_Spatial(pts_train))
readr::write_rds(mdl_maxv, mdl_maxv_rds)
}
mdl_maxv <- read_rds(mdl_maxv_rds)
# plot variable contributions per predictor
plot(mdl_maxv)
# plot term plots
response(mdl_maxv)
# predict
y_maxv <- predict(env_stack, mdl_maxv) #, ext=ext, progress='')
plot(y_maxv, main='Maxent, raw prediction')
data(wrld_simpl, package="maptools")
plot(wrld_simpl, add=TRUE, border='dark grey')
```
# Evaluate: Model Performance
## Area Under the Curve (AUC), Reciever Operater Characteristic (ROC) Curve and Confusion Matrix
```{r}
pts_test_p <- pts_test %>%
filter(present == 1) %>%
as_Spatial()
pts_test_a <- pts_test %>%
filter(present == 0) %>%
as_Spatial()
y_maxv <- predict(mdl_maxv, env_stack)
#plot(y_maxv)
e <- dismo::evaluate(
p = pts_test_p,
a = pts_test_a,
model = mdl_maxv,
x = env_stack)
e
plot(e, 'ROC')
thr <- threshold(e)[['spec_sens']]
thr
p_true <- na.omit(raster::extract(y_maxv, pts_test_p) >= thr)
a_true <- na.omit(raster::extract(y_maxv, pts_test_a) < thr)
# (t)rue/(f)alse (p)ositive/(n)egative rates
tpr <- sum(p_true)/length(p_true)
fnr <- sum(!p_true)/length(p_true)
fpr <- sum(!a_true)/length(a_true)
tnr <- sum(a_true)/length(a_true)
matrix(
c(tpr, fnr,
fpr, tnr),
nrow=2, dimnames = list(
c("present_obs", "absent_obs"),
c("present_pred", "absent_pred")))
# add point to ROC plot
points(fpr, tpr, pch=23, bg="blue")
plot(y_maxv > thr)
```
# Lab1 Submission
To submit Lab 1, please submit the path on `taylor.bren.ucsb.edu` to your single consolidated Rmarkdown document (i.e. combined from separate parts) that you successfully knitted here:
- [Submit Lab 1. Species Distribution Modeling](https://forms.gle/u9ZC7rvkcQMojxv3A)
The path should start `/Users/` and end in `.Rmd`. Please be sure to have succesfully knitted to html, as in I should see the `.html` file there by the same name.
<!--
Copy/pasted into RStudio's Visual Editor from [lab1_pts - schedule | eds232-ml - Google Sheets](https://docs.google.com/spreadsheets/d/1lolwZ2CNAtUkTWPauyPEe_oVMknqS2yVvfnB1fiErzQ/edit#gid=1660146542)
-->
In your lab, please be sure to include:
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| \# | Lab | Item | Points |
+====+==============+===========================================================================================================================================================================================+=======:+
| 1 | 1a.Explore | Name \_Genus species\_ used to search GBIF for observations. (0.5 pts) | 0.5 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 2 | 1a.Explore | Image of your species. I like to see wildlife. | 0.5 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 3 | 1a.Explore | Map of distribution of points | 0.5 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 4 | 1a.Explore | Question: How many observations total are in GBIF for your species? | 0.5 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 5 | 1a.Explore | Question: Did you have to perform any data cleaning steps? | 0.5 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 6 | 1a.Explore | Question: What environmental layers did you choose as predictors? Can you find any support for these in the literature? | 1 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 7 | 1a.Explore | Raster plot of environmental rasters clipped to species range | 0.5 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 8 | 1a.Explore | Map with pseudo-absence points | 0.5 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 9 | 1a.Explore | Term plots | 0.5 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 10 | 1b.Regress | Plot of ggpairs | 0.4 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 11 | 1b.Regress | Linear model output with range of values | 0.4 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 12 | 1b.Regress | Generalized Linear Model (GLM) output | 0.4 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 13 | 1b.Regress | GLM term plots | 0.3 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 14 | 1b.Regress | Generalized Additive Model (GAM) output | 0.3 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 15 | 1b.Regress | GAM term plots | 0.3 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 16 | 1b.Regress | Question: Which GAM environmental variables, and even range of values, seem to contribute most towards presence (above 0 response) versus absence (below 0 response)? | 1 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 17 | 1b.Regress | Maxent model output | 0.3 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 18 | 1b.Regress | Maxent variable contribution plot | 0.3 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 19 | 1b.Regress | Maxent term plots | 0.3 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 20 | 1b.Regress | Question: Which Maxent environmental variables, and even range of values, seem to contribute most towards presence (closer to 1 response) and how might this differ from the GAM results? | 1 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 21 | 1c.Trees | Tabular counts of 1 vs 0 before and after split | 0.4 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 22 | 1c.Trees | Rpart model output and plot, depth=1 | 0.4 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 23 | 1c.Trees | Rpart model output and plot, depth=default | 0.3 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 24 | 1c.Trees | Rpart complexity parameter plot | 0.3 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 25 | 1c.Trees | Question: Based on the complexity plot threshold, what size of tree is recommended? | 1 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 26 | 1c.Trees | Rpart variable importance plot | 0.3 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 27 | 1c.Trees | Question: what are the top 3 most important variables of your model? | 1 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 28 | 1c.Trees | RandomForest variable importance | 0.3 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 29 | 1c.Trees | Question: How might variable importance differ between rpart and RandomForest in your model outputs? | 1 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 30 | 1d.Evaluate | Plot of pairs from environmental stack | 0.3 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 31 | 1d.Evaluate | VIF per variable | 0.3 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 32 | 1d.Evaluate | Variables after VIF collinearity removal | 0.3 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 33 | 1d.Evaluate | Plof of pairs after VIF collinearity removal | 0.3 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 34 | 1d.Evaluate | Plot of variable contribution | 0.3 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 35 | 1d.Evaluate | Question: Which variables were removed due to multicollinearity and what is the rank of most to least important remaining variables in your model? | 1 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 36 | 1d.Evaluate | Maxent term plots | 0.4 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 37 | 1d.Evaluate | Map of Maxent prediction | 0.4 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 38 | 1d.Evaluate | ROC threshold value maximizing specificity and sensitivity | 0.4 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 39 | 1d.Evaluate | Confusion matrix with percentages | 0.5 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 40 | 1d.Evaluate | AUC plot | 0.4 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+
| 41 | 1d.Evaluate | Map of binary habitat | 0.4 |
+----+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+