Commit 59dfafc

Modeling is more accurate after log-transforming the observations
1 parent df201cd commit 59dfafc

1 file changed: +8 -12 lines changed

time-series-regression.qmd

Lines changed: 8 additions & 12 deletions
@@ -248,17 +248,17 @@ $$
 #| par: true
 
 air_passengers_df <- data.frame(y = as.vector(AirPassengers), t = 1:144)
-fit_lm1 <- lm(y ~ t + sin(t / 12 * 2 * pi) + cos(t / 12 * 2 * pi), data = air_passengers_df)
+fit_lm1 <- lm(log(y) ~ t + sin(t / 12 * 2 * pi) + cos(t / 12 * 2 * pi), data = air_passengers_df)
 fit_lm2 <- update(fit_lm1, . ~ . +
   sin(t / 12 * 2 * 2 * pi) + cos(t / 12 * 2 * 2 * pi), data = air_passengers_df
 )
 fit_lm3 <- update(fit_lm2, . ~ . +
   sin(t / 12 * 3 * 2 * pi) + cos(t / 12 * 3 * 2 * pi), data = air_passengers_df
 )
 plot(y ~ t, air_passengers_df, type = "l")
-lines(x = air_passengers_df$t, y = fit_lm1$fitted.values, col = "red")
-lines(x = air_passengers_df$t, y = fit_lm2$fitted.values, col = "green")
-lines(x = air_passengers_df$t, y = fit_lm3$fitted.values, col = "orange")
+lines(x = air_passengers_df$t, y = exp(fit_lm1$fitted.values), col = "red")
+lines(x = air_passengers_df$t, y = exp(fit_lm2$fitted.values), col = "green")
+lines(x = air_passengers_df$t, y = exp(fit_lm3$fitted.values), col = "orange")
 ```
 
 Model 1 already captures the trend and the seasonal pattern well. Adding a shorter-period harmonic improves the fit slightly, and adding still more brings no noticeable further improvement. In fact, the regression coefficients of the shorter-period terms are no longer significant. This class of models has therefore reached its limit; further progress requires examining and exploiting the structure in the residuals with a more complex model.
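@@ -267,10 +267,6 @@ lines(x = air_passengers_df$t, y = fit_lm3$fitted.values, col = "orange")
 
 Nonlinear trend, multiple seasonality (a mix of several periods), special holidays, sudden high-profile events, and a (stationary) residual component: the modeling approaches that can handle all five at once are Bayesian additive models and neural network models, for example the Stan-based prophet package and the tensorflow framework.
 
-::: callout-tip
-How does the prophet package handle all of these at once? Can the same be implemented with the cmdstanr package, or with mgcv and INLA?
-:::
-
 ```{r}
 library(cmdstanr)
 ```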
@@ -390,7 +386,7 @@ air_passengers_tbl <- data.frame(
   year = rep(1949:1960, each = 12),
   month = rep(1:12, times = 12)
 )
-mod1 <- gam(y ~ s(year) + s(month, bs = "cr", k = 12),
+mod1 <- gam(log(y) ~ s(year) + s(month, bs = "cr"),
   data = air_passengers_tbl, family = gaussian
 )
 summary(mod1)
@@ -420,15 +416,15 @@ plot(mod1, shade = TRUE)
 #| fig-height: 4
 #| par: true
 
-air_passengers_ts <- ts(mod1$fitted.values, start = c(1949, 1), frequency = 12)
+air_passengers_ts <- ts(exp(mod1$fitted.values), start = c(1949, 1), frequency = 12)
 plot(AirPassengers)
 lines(air_passengers_ts, col = "red")
 ```
 
 Overall, passenger numbers grow roughly linearly from year to year and fluctuate across the months within each year, with large differences in traffic between the off-peak and peak seasons; in the later years these differences widen. To capture this, consider the interaction between the yearly trend and the monthly fluctuation.
 
 ```{r}
-mod2 <- gam(y ~ s(year, month), data = air_passengers_tbl, family = gaussian)
+mod2 <- gam(log(y) ~ s(year, month), data = air_passengers_tbl, family = gaussian)
 summary(mod2)
 ```
 
@@ -470,7 +466,7 @@ on.exit(par(op), add = TRUE)
 #| fig-height: 4
 #| par: true
 
-air_passengers_ts <- ts(mod2$fitted.values, start = c(1949, 1), frequency = 12)
+air_passengers_ts <- ts(exp(mod2$fitted.values), start = c(1949, 1), frequency = 12)
 plot(AirPassengers)
 lines(air_passengers_ts, col = "red")
 ```
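
The commit message says that modeling the log of the observations is more accurate. A minimal sketch, not part of the commit, of one way to check this for the first harmonic model: fit the same formula on the raw and on the log scale, back-transform the log-scale fit with `exp()` as the diff does, and compare errors on the original passenger scale. The names `fit_raw`, `fit_log` and the helper `rmse` are hypothetical and chosen only for illustration.

```r
# Same data frame as in time-series-regression.qmd
air_passengers_df <- data.frame(y = as.vector(AirPassengers), t = 1:144)

# fit_lm1 as it was before the commit (raw scale) and after it (log scale)
fit_raw <- lm(y ~ t + sin(t / 12 * 2 * pi) + cos(t / 12 * 2 * pi),
  data = air_passengers_df
)
fit_log <- lm(log(y) ~ t + sin(t / 12 * 2 * pi) + cos(t / 12 * 2 * pi),
  data = air_passengers_df
)

# Hypothetical helper: root mean squared error between observed and fitted values
rmse <- function(observed, fitted) sqrt(mean((observed - fitted)^2))

# Compare both fits on the original scale; exp() undoes the log transform
rmse_raw <- rmse(air_passengers_df$y, fitted(fit_raw))
rmse_log <- rmse(air_passengers_df$y, exp(fitted(fit_log)))
c(raw = rmse_raw, log = rmse_log)
```

Note that `exp()` of a log-scale fitted value estimates the conditional median rather than the mean when the log-scale errors are roughly normal, so a comparison like this is indicative rather than exact.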
