-
Notifications
You must be signed in to change notification settings - Fork 5
/
Copy pathForecasting 2-2 - TV Regression.Rmd
187 lines (127 loc) · 5.34 KB
/
Forecasting 2-2 - TV Regression.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
---
output:
xaringan::moon_reader:
css: "my-theme.css"
lib_dir: libs
nature:
highlightStyle: github
highlightLines: true
---
layout: true
.hheader[<a href="index.html">`r fontawesome::fa("home", fill = "steelblue")`</a>]
---
```{r setup, include=FALSE, message=FALSE}
options(htmltools.dir.version = FALSE, servr.daemon = TRUE)
library(huxtable)
library(ggplot2)
library(forecast)
```
class: center, middle, inverse
# Forecasting Time Series
## Time-varying Regression: Forecasting
.futnote[Eli Holmes, UW SAFS]
.citation[[email protected]]
---
```{r load_data_TV_Regression_Forecasting, echo=FALSE}
load("landings.RData")
landings$log.metric.tons = log(landings$metric.tons)
landings = subset(landings, Year <= 1989)
landings$t = landings$Year-landings$Year[1]
anchovy = subset(landings, Species=="Anchovy" & Year <= 1987)
sardine = subset(landings, Species=="Sardine" & Year <= 1987)
```
Forecasting is easy in R once you have a fitted model.
Let's say for the anchovy, we fit the model
$$C_t = \alpha + \beta t + e_t$$
where $t$ starts at 0 (so 1964 is $t=0$ ). To predict, predict the catch in year t, we use
$$C_t = \alpha + \beta t + e_t$$
---
Model fit:
```{r tvreg.anchovy2}
model <- lm(log.metric.tons ~ t, data=anchovy)
coef(model)
```
For anchovy, the estimated $\alpha$ (Intercept) is `r coef(model)[1]` and $\beta$ is `r coef(model)[2]`. We want to use these estimates to forecast 1988 ( $t=24$ ).
So the 1988 forecast is `r coef(model)[1]` + `r coef(model)[2]` $\times$ 24 :
```{r tvreg.forecast1}
coef(model)[1]+coef(model)[2]*24
```
log metric tons.
---
# The forecast package
The forecast package in R makes it easy to create forecasts with fitted models and to plot (some of) those forecasts.
For a TV Regression model, our `forecast()` call looks like
```{r TVregression.forecast2}
library(forecast)
fr <- forecast(model, newdata = data.frame(t=24:28))
```
---
The dark grey bands are the 80% prediction intervals and the light grey are the 95% prediction intervals.
```{r plot.TVreg.forecast, fig.align = "center"}
plot(fr)
```
---
Sardine forecasts from a higher order polynomial can similarly be made. Let's fit a 4-th order polynomial.
$$C_t = \alpha + \beta_1 t + \beta_2 t^2 + \beta_3 t^3 + \beta_4 t^4 + e_t$$
To forecast with this model, we fit the model to estimate the $\beta$'s and then replace $t$ with $24$:
$$C_{1988} = \alpha + \beta_1 24 + \beta_2 24^2 + \beta_3 24^3 + \beta_4 24^4 + e_t$$
---
This is how to do that in R:
```{r tvreg.sardine2}
model <- lm(log.metric.tons ~ t + I(t^2) + I(t^3) + I(t^4), data=anchovy)
fr <- forecast(model, newdata = data.frame(t=24:28))
fr
```
---
Unfortunately, forecast does not recognize that there is only one predictor $t$ and we cannot use forecast's plot function.
If you do this in R, it throws an error.
```{r plot.TVreg.forecast.bad}
try(plot(fr))
```
```
Error in plotlmforecast(x, PI = PI, shaded = shaded, shadecols = shadecols, : Forecast plot for regression models only available for a single predictor
```
---
```{r plot.TVreg.func, echo=FALSE}
plotforecasttv <- function(object, h=10, ylims=NULL){
dat <- object$model
dat$fitted <- object$fitted.values
tlim <- (max(object$model$t)+1:h)
pr95 <- predict(model, newdata = data.frame(t=(max(object$model$t)+1:h)), level=0.95, interval="prediction")
pr80 <- predict(model, newdata = data.frame(t=(max(object$model$t)+1:h)), level=0.80, interval="prediction")
pr95 <- as.data.frame(pr95); pr95$t <- tlim
pr80 <- as.data.frame(pr80); pr80$t <- tlim
if(is.null(ylims)) ylims <- c(min(dat[,1],pr95$lwr, pr80$lwr), max(dat[,1],pr95$upr, pr80$upr))
p1 <- ggplot(dat, aes_string(x = colnames(dat)[2], y = colnames(dat)[1])) +
theme_bw() +
geom_point(color = "blue") + xlim(0,max(tlim)) + ylim(ylims) +
geom_line(aes(x=t, y=fitted), color="red")
p1 +
geom_ribbon(mapping=aes(x=t, ymin=lwr, ymax=upr), data=pr95, inherit.aes=FALSE, fill = "grey50") +
geom_ribbon(mapping=aes(x=t, ymin=lwr, ymax=upr), data=pr80, inherit.aes=FALSE, fill = "grey75") +
geom_line(aes(x=t, y=fit), pr95)
}
```
I created a function that you can use to plot time-varying regressions with polynomial $t$. You will use this function in the lab.
```{r plot.TVreg.forecast2, fig.align = "center"}
plotforecasttv(model, ylims=c(8,17))
```
---
A feature of a time-varying regression with many polynomials is that it fits the data well, but the forecast quickly becomes uncertain due to uncertainty regarding the polynomial fit. A simpler model can give forecasts that do not become rapidly uncertain.
The flip-side is that the simpler model may not capture the short-term trends very well and may suffer from autocorrelated residuals.
```{r tvreg.lm1}
model <- lm(log.metric.tons ~ t + I(t^2), data=sardine)
```
---
```{r plot.TVreg.lm1, fig.align = "center"}
plotforecasttv(model, ylims=c(8,17))
```
---
# Summary
* Time-varying regression is a simple approach to forecasting that allows a non-linear trend.
* The uncertainty in your forecast is determined by how much error there is between the fit an the data.
* Fit must be balanced against prediction uncertainty.
* R allows you to quickly fit models and compute the prediction intervals.
Careful thought must be given to selecting the polynomial order.
* Standard methods are available in R for order selection
* Using different orders for different data sets has prediction consequences