title: "Robust gradient-based MCMC with the Barker proposal"
3
3
output: rmarkdown::html_vignette
4
4
bibliography: references.bib
5
+
link-citations: true
5
6
vignette: >
6
7
%\VignetteIndexEntry{Robust gradient-based MCMC with the Barker proposal}
7
8
%\VignetteEngine{knitr::rmarkdown}
@@ -17,10 +18,9 @@ knitr::opts_chunk$set(
17
18
18
19
The `rmcmc` package provides a general-purpose implementation of the Barker proposal [@livingstone2022barker],
a gradient-based Markov chain Monte Carlo (MCMC) algorithm inspired by the Barker accept-reject rule [@barker1965monte].
This vignette demonstrates how to use the package to sample Markov chains from a target distribution of interest,
and illustrates the robustness to tuning that is a key advantage of the Barker proposal compared to alternatives such as the Metropolis adjusted Langevin algorithm (MALA).

```{r setup}
library(rmcmc)
```

## Example target distribution
```{r}
dimension <- 10
scales <- c(0.01, rep(1, dimension - 1))
```
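
The target distribution itself is specified as a list; a minimal sketch of a zero-mean Gaussian target with the per-coordinate scales defined above is given below. The entry names `log_density` and `gradient_log_density` are assumptions made for illustration and should be checked against the package documentation.

```{r}
# Sketch of a Gaussian target with independent coordinates and the scales above;
# the entry names are assumed, so check the package documentation for the exact
# interface expected by the proposal constructors.
target_distribution <- list(
  log_density = function(x) -sum((x / scales)^2) / 2,
  gradient_log_density = function(x) -x / scales^2
)
```
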
`rmcmc` provides implementations of several different proposal distributions which can be used within a Metropolis--Hastings based MCMC method:

- `barker_proposal`: The robust gradient-based Barker proposal proposed by @livingstone2022barker.
- `langevin_proposal`: A gradient-based proposal based on a discretization of Langevin dynamics.
- `random_walk_proposal`: A Gaussian random-walk proposal.

Each function requires the first argument to specify the target distribution the proposal is to be constructed for.
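
For example, a Barker proposal for the target defined above can be constructed as follows (a minimal sketch, leaving any tuning arguments at their defaults):

```{r}
# Construct a Barker proposal targeting the distribution defined above
proposal <- barker_proposal(target_distribution)
```
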
`rmcmc` has support for adaptively tuning parameters of the proposal distribution.
This is mediated by 'adapter' objects which define methods for updating the parameters of a proposal based on the chain state and statistics recorded during a chain iteration.
Below we instantiate a list of adapters to
(i) adapt the scalar scale of the proposal distribution to coerce the average acceptance probability of the chain transitions to a target value, and
(ii) adapt the shape of the proposal distribution with per-coordinate scaling factors based on estimates of the coordinate-wise variances under the target distribution.
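
A minimal sketch of instantiating such a list of adapters is given below; `scale_adapter` and its `initial_scale` and `target_accept_prob` arguments follow the description in the surrounding text, while the shape adapter constructor name (`variance_shape_adapter`) is an assumption.

```{r}
# Scale adapter targeting a given acceptance probability plus a shape adapter
# based on coordinate-wise variance estimates (constructor name assumed).
adapters <- list(
  scale_adapter(
    initial_scale = 2.38^2 / dimension^(1 / 3),
    target_accept_prob = 0.4
  ),
  variance_shape_adapter()
)
```
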
Here we set the initial scale to $2.38^2/(\text{dimension})^{\frac{1}{3}}$ following the results for MALA in @roberts2001optimal,
and set the target acceptance probability to 0.4 following the guideline in @livingstone2022barker.
This is equivalent to the default behaviour when not specifying the `initial_scale` and `target_accept_prob` arguments, in which case proposal- and dimension-dependent values following the guidelines in @roberts2001optimal and @livingstone2022barker will be used.
Both adapters have an optional `kappa` argument which can be used to set the decay rate exponent for the adaptation learning rate. We leave this as the default value of 0.6 (following the recommendation in @livingstone2022barker) in both cases.

The adapter updates will be applied only during an initial set of 'warm-up' chain iterations,
with the proposal parameters remaining fixed to their final adapted values during a subsequent
set of main chain iterations.

The `rmcmc` package encapsulates the chain state in a list which tracks the current position
and cached values of the log density and its gradient, once computed at the current position, to avoid re-computation.
The `chain_state` function allows creation of a list in the required format,
with the first (and only required) argument specifying the position.
Alternatively we can directly pass a vector specifying just the position component of the state to the `initial_state` argument of `sample_chain`.
Here we generate an initial state with position coordinates sampled from independent normal distributions with standard deviation 10, following the example in @livingstone2022barker.
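
A sketch of constructing such a state with `chain_state`, passing only the position argument as described above:

```{r}
# Initial position sampled from independent normals with standard deviation 10
initial_state <- chain_state(10 * rnorm(dimension))
```
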
We now have everything needed to sample a Markov chain. To do this we use the `sample_chain` function from `rmcmc`.
This requires us to specify the target distribution, proposal distribution, initial chain state, numbers of adaptive warm-up and non-adaptive main chain iterations, and the list of adapters to use.

```{r}
n_warm_up_iteration <- 10000
```

We run the chain for `r n_warm_up_iteration` adaptive warm-up iterations and `r n_main_iteration` main chain iterations.
We set `trace_warm_up` to `TRUE` to record statistics during the adaptive warm-up chain iterations.
```{r}
barker_results <- sample_chain(
  target_distribution = target_distribution,
  proposal = proposal,
  initial_state = initial_state,
  # the remaining argument names are assumed to match the variable names and
  # the `trace_warm_up` option described in the surrounding text
  n_warm_up_iteration = n_warm_up_iteration,
  n_main_iteration = n_main_iteration,
  adapters = adapters,
  trace_warm_up = TRUE
)
```

If the `progress` package is installed a progress bar will show the chain progress during sampling.
The return value of `sample_chain` is a list containing fields for accessing the final chain state (which can be used to start sampling a new chain), any variables traced during the main chain iterations and any additional statistics recorded during the main chain iterations.
If the `trace_warm_up` argument to `sample_chain` is set to `TRUE` as above, then the returned list will also contain entries `warm_up_traces` and `warm_up_statistics`, corresponding respectively to the variable traces and additional statistics recorded during the warm-up iterations.

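As a quick check, we can list the top-level fields of the returned list:

```{r}
# Top-level fields of the list returned by sample_chain
names(barker_results)
```
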
One of the additional statistics recorded is the acceptance probability for each chain iteration under the name `accept_prob`.
We can therefore compute the mean acceptance probability of the main chain iterations as follows:

```{r}
# the main-chain statistics are assumed to be returned as a matrix with named
# columns, including the `accept_prob` column described above
mean_accept_prob <- mean(barker_results$statistics[, "accept_prob"])
cat(sprintf("Average acceptance probability is %.2f", mean_accept_prob))
```

This is close to the target acceptance rate of 0.4, indicating the scale adaptation worked as expected.

We can also inspect the shape parameter of the proposal to check the variance-based shape adaptation succeeded.
The snippet below extracts the (first few dimensions of the) adapted shape from the `proposal` object and compares it to the known true scales (per-coordinate standard deviations) of the target distribution.

```{r}
clipped_dimension <- min(5, dimension)
final_shape <- proposal$parameters()$shape
cat(
  # sketch: print the first few entries of the adapted shape and the true scales
  sprintf("Adapted shape: %s", toString(round(final_shape[1:clipped_dimension], 3))),
  sprintf("True scales:   %s", toString(scales[1:clipped_dimension])),
  sep = "\n"
)
```

Again adaptation appears to have been successful, with the adapted shape close to the true target scales.

## Summarizing results using `posterior` package

The output from `sample_chain` can also be easily used with external packages for analyzing MCMC outputs.
For example the [`posterior` package](https://mc-stan.org/posterior/index.html) provides implementations of various inference diagnostics and functions for manipulating, subsetting and summarizing MCMC outputs.

```{r}
library(posterior)
```

The `traces` entry in the returned (list) output from `sample_chain` is a matrix with rows corresponding to the chain iterations and (named) columns corresponding to the traced variables.
This matrix can be directly coerced to the `draws` data format the `posterior` package internally uses to represent chain outputs, and so can be passed directly to the [`summarize_draws` function](https://mc-stan.org/posterior/reference/draws_summary.html) to output a `tibble` data frame containing a set of summary statistics and diagnostic measures for each variable.

```{r}
summarize_draws(barker_results$traces)
```

We can also first explicitly convert the `traces` matrix to a `posterior` draws object using the `as_draws_matrix` function.
This can then be passed to the `summary` generic function to get an equivalent output.

```{r}
draws <- as_draws_matrix(barker_results$traces)
summary(draws)
```

The draws object can also be manipulated and subsetted with various functions provided by `posterior`.
For example the [`extract_variable` function](https://mc-stan.org/posterior/reference/extract_variable.html) can be used to extract the draws for a specific named variable.
The output from this function can then be passed to the various diagnostic functions; for example, to compute the effective sample size of the mean of the `target_log_density` variable we could do the following:

```{r}
cat(
  sprintf(
    "Effective sample size of mean(target_log_density) is %.0f",
    ess_mean(extract_variable(draws, "target_log_density"))
  )
)
```

To sample a chain using a Langevin proposal, we can simply use `langevin_proposal` in place of `barker_proposal`.
Here we create a new set of adapters using the default arguments to `scale_adapter`, which will set the target acceptance rate to the value of 0.574 specific to the Langevin proposal, following the results in @roberts2001optimal.
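
A sketch of this setup, reusing the target distribution from above (the shape adapter constructor name is again an assumption):

```{r}
# Langevin proposal with adapters using the default scale_adapter arguments;
# variance_shape_adapter is an assumed name for the variance-based shape adapter
proposal <- langevin_proposal(target_distribution)
adapters <- list(scale_adapter(), variance_shape_adapter())
```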