-
Notifications
You must be signed in to change notification settings - Fork 0
/
gp.Rmd
93 lines (76 loc) · 4.09 KB
/
gp.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
---
title: "``gp``"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(brms)
```
Set up Gaussian process terms in brms
#### Description
Function used to set up a Gaussian process term in brms. The function does not evaluate its </code></pre>
#### Arguments
– it exists purely to help set up a model with Gaussian process terms.
#### Usage
<pre><code>
gp(..., by = NA, cov = "exp_quad", gr = FALSE, scale = TRUE)
</code></pre>
#### Arguments
* ``...``: One or more predictors for the Gaussian process.
* ``by``: A numeric or factor variable of the same length as each predictor. In the numeric vector case, the elements multiply the values returned by the Gaussian process. In the factor variable case, a separate Gaussian process is fitted for each factor level.
* ``cov``: Name of the covariance kernel. By default, the exponentiated-quadratic kernel "``exp_quad``" is used.
* ``gr``: Logical; Indicates if auto-grouping should be used (defaults to FALSE). If enabled, observations sharing the same predictor values will be represented by the same latent variable in the Gaussian process. This will improve sampling efficiency drastically if the number of unique predictor combinations is small relative to the number of observations.
* ``scale``: Logical; If TRUE (the default), predictors are scaled so that the maximum Euclidean distance between two points is 1. Since the default prior on lscale expects scaled predictors, it is recommended tomanually specify priors on lscale, if scale is set to FALSE.
#### Examples
```{r}
## Not run:
# simulate data using the mgcv package
dat <- mgcv::gamSim(1, n = 30, scale = 2)
# fit a simple gaussian process model
fit1 <- brm(y ~ gp(x2), dat, chains = 2)
summary(fit1)
```
```{r}
me1 <- marginal_effects(fit1, nsamples = 200, spaghetti = TRUE)
plot(me1, ask = FALSE, points = TRUE)
# fit a more complicated gaussian process model
fit2 <- brm(y ~ gp(x0) + x1 + gp(x2) + x3, dat, chains = 2)
summary(fit2)
```
```{r}
me2 <- marginal_effects(fit2, nsamples = 200, spaghetti = TRUE)
plot(me2, ask = FALSE, points = TRUE)
# fit a multivariate gaussian process model
fit3 <- brm(y ~ gp(x1, x2), dat, chains = 2)
summary(fit3)
```
```{r}
me3 <- marginal_effects(fit3, nsamples = 200, spaghetti = TRUE)
plot(me3, ask = FALSE, points = TRUE)
# compare model fit
LOO(fit1, fit2, fit3)
# simulate data with a factor covariate
dat2 <- mgcv::gamSim(4, n = 90, scale = 2)
```
```{r}
# fit separate gaussian processes for different levels of 'fac'
fit4 <- brm(y ~ gp(x2, by = fac), dat2, chains = 2)
summary(fit4)
plot(marginal_effects(fit4), points = TRUE)
## End(Not run)
```
#### Details
A Gaussian process is a stochastic process, which describes the relation between one or more predictors $x = (x1; :::; xd)$ and a response f(x), where d is the number of predictors. A Gaussian process is the generalization of the multivariate normal distribution to an infinite number of dimensions.
Thus, it can be interpreted as a prior over functions. Any finite sample realized from this stochastic process is jointly multivariate normal, with a covariance matrix defined by the covariance kernel kp(x), where p is the vector of parameters of the Gaussian process:
f(x) MV N(0; kp(x))
The smoothness and general behavior of the function f depends only on the choice of covariance
kernel. For a more detailed introduction to Gaussian processes, see https://en.wikipedia.org/wiki/Gaussian_process.
Below, we describe the currently supported covariance kernels:
• "exp_quad": The exponentiated-quadratic kernel is defined as k(xi; xj) = sdgp2exp(jjxi
xj jj=(2lscale2)), where jj:jj is the Euclidean norm, sdgp is a standard deviation parameter, and lscale is characteristic length-scale parameter. The latter practically measures how close
two points xi and xj have to be to influence each other substantially.
In the current implementation, "exp_quad" is the only supported covariance kernel. More options will follow in the future.
#### Value
An object of class 'gpterm', which is a list of arguments to be interpreted by the formula parsing functions of brms.
See Also
brmsformula