-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathForecasting-4-1-Exp-Smoothing.Rmd
93 lines (66 loc) · 4.17 KB
/
Forecasting-4-1-Exp-Smoothing.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
```{r echo=FALSE}
require(FishForecast)
```
# Exponential smoothing models
The basic idea with an exponential smoothing model is that your forecast of $x$ at time $t$ is a smoothed function of past $x$ values.
$$\hat{x}_{t} = \alpha x_{t-1} + \alpha (1-\alpha)^2 x_{t-2} + \alpha (1-\alpha)^3 x_{t-3} + \dots$$
Although this looks similar to an AR model with a constraint on the $\beta$ terms, it is fundamentally different. There is no process model and one is **not** assuming that
$$x_{t} = \alpha x_{t-1} + \alpha (1-\alpha)^2 x_{t-2} + \alpha (1-\alpha)^3 x_{t-3} + \dots + e_t$$
The goal is to find the $\alpha$ that minimizes $x_t - \hat{x}_t$, i.e. the forecast error. The issues regarding stationarity do not arise because we are not fitting a stationary process model. We are not fitting a process model at all.
## Overview
### Naive model
Let's start with a simple example, an exponential smoothing model with $\alpha=1$. This is called the Naive model:
$$\hat{x}_{t} = x_{t-1}$$
For the naive model, our forecast is simply the value in the previous time step. For example, a naive forecast of the anchovy landings in 1988 is the anchovy landings in 1987.
$$\hat{x}_{1988} = x_{1987}$$
This is the same as saying that we put 100% of the 'weight' on the most recent value and no weight on any value prior to that.
$$\hat{x}_{1988} = 1 \times x_{1987} + 0 \times x_{1986} + 0 \times x_{1985} + \dots$$
Past values in the time series have information about the current state, but only the most recent past value.
```{r echo=FALSE}
plot(1987:1964, c(1,rep(0,23)), lwd=2, ylab="weight", xlab="", type="l")
title("weight put on past values for 1988 forecast")
```
We can fit this with `forecast::Arima()`.
```{r rwf.fit}
fit.rwf <- forecast::Arima(anchovy87ts, order=c(0,1,0))
fr.rwf <- forecast::forecast(fit.rwf, h=5)
```
Alternatively we can fit with `rwf()` or `naive()` which are shortcuts for the above lines. All fit the same model.
```{r rwf.fit2}
fr.rwf <- forecast::rwf(anchovy87ts, h=5)
fr.rwf <- forecast::naive(anchovy87ts, h=5)
```
A plot of the forecast shows the forecast and the prediction intervals.
```{r fig.height=4.5}
plot(fr.rwf)
```
### Exponential smoothing
The naive model is a bit extreme. Often the values prior to the last value also have some information about future states. But the 'information content' should decrease the farther in the past that we go.
```{r echo=FALSE}
alpha <- 0.5
plot(1987:1964, alpha*(1-alpha)^(1:24-1), lwd=2, ylab="weight", xlab="", type="l")
title("more weight put on more recent values\nfor 1988 forecast")
```
A *smoother* is another word for a filter, which in time series parlance means a weighted sum of sequential values in a time series:
$$w_1 x_t + w_2 x_{t-1} + w_3 x_{t-2} + \dots$$
An exponential smoother is a filter (weighted sum) where the weights decline exponentially (Figure \@ref(fig:ets.alpha)).
```{r ets.alpha, echo=FALSE, fig.cap="Weighting function for exponential smoothing filter. The shape is determined by $\alpha$."}
alpha <- 0.5
plot(-1:-24, alpha*(1-alpha)^(1:24-1), lwd=2, ylab="weight", xlab="lag", type="l")
title("weight on x at t + lag")
```
### Exponential smoothing model
A simple exponential smoothing model is like the naive model that just uses the last value to make the forecast, but instead of only using the last value it will use values farther in the past also. The weighting function falls off exponentially as shown above. Our goal when fitting an exponential smoothing model is to find the the $\alpha$, which determines the shape of the weighting function (Figure \@ref(fig:ets.alpha2)), that minimizes the forecast errors.
```{r ets.alpha2, echo=FALSE, fig.cap="The size of $\alpha$ determines how past values affect the forecast."}
alpha <- 1
wts <- alpha*(1-alpha)^(0:23)
plot(1987:1964, wts/sum(wts), lwd=2, ylab="weight", xlab="", type="l")
alpha <- 0.5
wts <- alpha*(1-alpha)^(0:23)
lines(1987:1964, wts/sum(wts), lwd=2, col="blue")
alpha <- 0.05
wts <- alpha*(1-alpha)^(0:23)
lines(1987:1964, wts/sum(wts), lwd=2, col="red")
legend("topleft", c("alpha=1 like naive","alpha=0.5","alpha=0.05 like mean"),lwd=2, col=c("black","blue","red"))
title("more weight put on more recent values\nfor 1988 forecast")
```