-
Notifications
You must be signed in to change notification settings - Fork 0
/
analyze-areal-data.qmd
248 lines (196 loc) · 6.91 KB
/
analyze-areal-data.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
# 区域数据分析 {#sec-analyze-areal-data}
## 苏格兰唇癌数据分析 {#sec-scotland-lip-cancer}
> Everything is related to everything else, but near things are more related than distant things.
>
> --- Waldo Tobler [@Tobler1970]
::: {#spatial-areal-data .callout-note title="空间区域数据分析"}
空间区域数据的贝叶斯建模
- [Exact sparse CAR models in Stan](https://github.com/mbjoseph/CARstan) [网页文档](https://mc-stan.org/users/documentation/case-studies/mbjoseph-CARStan.html)
- [Spatial Models in Stan: Intrinsic Auto-Regressive Models for Areal Data](https://github.com/stan-dev/example-models/tree/master/knitr/car-iar-poisson) [网页文档](https://mc-stan.org/users/documentation/case-studies/icar_stan.html) 原始数据和代码,接上面苏格兰唇癌数据分析,用 CmdStanR 更新后的[代码](https://github.com/stan-dev/example-models/tree/master/knitr/car-iar-poisson)
- [Spatial modeling of areal data. Lip cancer in Scotland](https://www.paulamoraga.com/book-geospatial/sec-arealdataexamplespatial.html) INLA 建模
:::
记录 1975-1986 年苏格兰 56 个地区的唇癌病例数,这是一个按地区汇总的数据。
```{r}
#| message: false
library(sf)
scotlips <- st_read('data/scotland/scotland.shp', crs = st_crs("EPSG:27700"))
scotlips
```
- Observed 观测到的患唇癌的案例数
- Expected 预期患唇癌的案例数
- pcaff: 从事农业、渔业和林业的人口比例(单位:百分比)。
```{r}
#| label: fig-scotlips-map
#| fig-cap: 苏格兰各地区唇癌病例数分布
#| fig-width: 5
#| fig-height: 5
#| fig-showtext: true
library(ggplot2)
ggplot() +
geom_sf(data = scotlips, aes(fill = Observed)) +
scale_fill_viridis_c(option = "plasma", na.value = "white") +
theme_minimal()
```
### CAR 模型
**brms** 包的函数 `car()` 构建 CAR 模型,详情见帮助文档 `?car`。
| 缩写 | 全称 | 实现 |
|--------|---------------------------------------|------|
| car | conditional auto-regressive | \- |
| icar | intrinsic CAR | brms |
| iar | intrinsic auto-regressive @morris2019 | \- |
| escar | exact sparse CAR | brms |
| esicar | exact sparse intrinsic CAR | brms |
| bym2 | Besag York Mollié 2 @morris2019 | brms |
```{r}
#| message: false
# 构造邻接矩阵
library(spdep)
# 根据距离确定近邻关系
scot_nb <- spdep::poly2nb(scotlips, row.names = scotlips$SP_ID)
# 创建邻接矩阵 W
W <- nb2mat(neighbours = scot_nb, style = "B", zero.policy = TRUE)
library(brms)
# 数据变换
scotlips$pcaff2 <- 0.1 * scotlips$pcaff
# 拟合模型
scot_fit_icar <- brm(
Observed ~ offset(log(Expected)) + pcaff2 + car(W, gr = SP_ID, type = "icar"),
data = scotlips, data2 = list(W = W), family = poisson(link = "log"),
refresh = 0, seed = 20232023
)
# 输出结果
summary(scot_fit_icar)
```
后验线性预测
```{r}
scotlips$RR <- colMeans(posterior_linpred(scot_fit_icar))
# 相对风险的对数
summary(scotlips$RR)
```
模型的 LOO 值
```{r}
loo(scot_fit_icar)
```
后验线性预测的拟合值
```{r}
#| label: fig-scotlips-pred-icar
#| fig-cap: 苏格兰各地区唇癌病相对风险(对数尺度)
#| fig-width: 5
#| fig-height: 5
#| fig-showtext: true
ggplot() +
geom_sf(data = scotlips, aes(fill = RR)) +
scale_fill_viridis_c(option = "plasma", na.value = "white") +
theme_minimal()
```
### BYM2 模型
响应变量服从泊松分布
- BYM-INLA [@blangiardo2013; @moraga2020]
- BYM-Stan [@morris2019; @donegan2022; @cabral2022]
将 ICAR 模型更新为 BYM2 模型
```{r}
#| message: false
# 拟合模型
scot_fit_bym2 <- brm(
Observed ~ offset(log(Expected)) + pcaff2 + car(W, gr = SP_ID, type = "bym2"),
data = scotlips, data2 = list(W = W), family = poisson(link = "log"),
refresh = 0, seed = 20232023
)
# 输出结果
summary(scot_fit_bym2)
```
`rhocar` 表示 CAR 先验中的参数 $\rho$
`sdcar` 表示 CAR 先验中的参数 $\sigma$
```{r}
loo(scot_fit_bym2)
```
预测值
```{r}
scotlips$RR <- colMeans(posterior_linpred(scot_fit_bym2))
# 相对风险的对数
summary(scotlips$RR)
```
```{r}
#| label: fig-scotlips-pred-bym2
#| fig-cap: 苏格兰各地区唇癌病相对风险(对数尺度)
#| fig-width: 5
#| fig-height: 5
#| fig-showtext: true
ggplot() +
geom_sf(data = scotlips, aes(fill = RR)) +
scale_fill_viridis_c(option = "plasma", na.value = "white") +
theme_minimal()
```
## 美国各州犯罪率分析
响应变量服从高斯分布的调查数据 [@bivand2001]
数据集 USArrests 记录 1973 年美国各州每 10 万居民中因谋杀 Murder、袭击 Assault 和强奸 Rape 被警察逮捕的人数以及城市人口所占百分比(可以看作城市化率)。
```{r}
#| echo: false
#| label: tbl-us-arrests
#| tbl-cap: "数据集 USArrests(部分)"
us_arrests <- data.frame(
state_name = rownames(USArrests),
state_region = state.region,
USArrests, check.names = FALSE
)
knitr::kable(head(us_arrests), col.names = c(
"州名", "区域划分", "谋杀犯", "袭击犯", "城市化率", "强奸犯"
), row.names = FALSE)
```
```{r}
#| label: fig-us-arrests-sf
#| fig-cap: 因袭击被逮捕的人数分布
#| fig-showtext: true
#| fig-width: 7
#| fig-height: 4
library(sf)
# 州数据
us_state_sf <- readRDS("data/us-state-map-2010.rds")
# 观测数据
us_state_df <- merge(x = us_state_sf, y = us_arrests,
by.x = "NAME", by.y = "state_name", all.x = TRUE)
ggplot() +
geom_sf(
data = us_state_df, aes(fill = Assault), color = "gray80", lwd = 0.25) +
scale_fill_viridis_c(option = "plasma", na.value = "white") +
theme_void()
```
1973 年美国各州因袭击被逮捕的人数与城市化率的关系:相关分析
```{r}
#| label: fig-us-arrests-point
#| fig-cap: 逮捕人数比例与城市化率的关系
#| fig-width: 7
#| fig-height: 5.5
#| code-fold: true
#| echo: !expr knitr::is_html_output()
#| fig-showtext: true
library(ggrepel)
ggplot(data = us_arrests, aes(x = UrbanPop, y = Assault)) +
geom_point(aes(color = state_region)) +
geom_text_repel(aes(label = state_name), size = 3, seed = 2022) +
theme_classic() +
labs(x = "城市化率(%)", y = "因袭击被逮捕人数", color = "区域划分")
```
阿拉斯加州和夏威夷州与其它州都不相连,属于孤立的情况,下面在空间相关性的分析中排除这两个州。
```{r}
# 州的中心
centers48 <- subset(
x = data.frame(x = state.center$x, y = state.center$y),
subset = !state.name %in% c("Alaska", "Hawaii")
)
# 观测数据
arrests48 <- subset(
x = USArrests, subset = !rownames(USArrests) %in% c("Alaska", "Hawaii")
)
```
```{r}
#| message: false
library(spData)
library(spdep)
# KNN K-近邻方法获取邻接矩阵
k4.48 <- knn2nb(knearneigh(as.matrix(centers48), k = 4))
# Moran I test
moran.test(x = arrests48$Assault, listw = nb2listw(k4.48))
# Permutation test for Moran's I statistic
moran.mc(x = arrests48$Assault, listw = nb2listw(k4.48), nsim = 499)
```