Commit e86f501

Expand vignette with more examples (#44)
* Expanding vignette with more examples
* Remove unused variable in vignette function
1 parent 29cedd4 commit e86f501

3 files changed: +250 −55 lines

DESCRIPTION

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@ RoxygenNote: 7.3.2
 Suggests:
     bridgestan (>= 2.5.0),
     knitr,
+    posterior,
    progress,
    ramcmc,
    rmarkdown,

vignettes/barker-proposal.Rmd

Lines changed: 238 additions & 54 deletions
@@ -2,6 +2,7 @@
 title: "Robust gradient-based MCMC with the Barker proposal"
 output: rmarkdown::html_vignette
 bibliography: references.bib
+link-citations: true
 vignette: >
   %\VignetteIndexEntry{Robust gradient-based MCMC with the Barker proposal}
   %\VignetteEngine{knitr::rmarkdown}
@@ -17,10 +18,9 @@ knitr::opts_chunk$set(
 
 The `rmcmc` package provides a general-purpose implementation of the Barker proposal [@barker1965monte],
 a gradient-based Markov chain Monte Carlo (MCMC) algorithm inspired by the Barker accept-reject rule,
-proposed by @livingstone2022barker. This vignette demonstrates how to use the package to sample Markov chains
-from a target distribution of interest, and illustrates the robustness to tuning that is a key advantage of
-the Barker proposal compared to alternatives such as the Metropolis adjusted Langevin algorithm (MALA).
-
+proposed by @livingstone2022barker.
+This vignette demonstrates how to use the package to sample Markov chains from a target distribution of interest,
+and illustrates the robustness to tuning that is a key advantage of the Barker proposal compared to alternatives such as the Metropolis-adjusted Langevin algorithm (MALA).
 
 ```{r setup}
 library(rmcmc)
@@ -29,7 +29,7 @@ library(rmcmc)
 ## Example target distribution
 
 ```{r}
-dimension <- 3
+dimension <- 10
 scales <- c(0.01, rep(1, dimension - 1))
 ```
 
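The full definition of `target_distribution` lies outside the changed lines; only its opening, `target_distribution <- list(`, appears as context in the next hunk. As a sketch only, a zero-mean Gaussian target with the per-coordinate standard deviations in `scales` could look like the following; the field names `log_density` and `gradient_log_density` are assumptions here, not confirmed by this diff.

```r
# Sketch only: zero-mean Gaussian target with per-coordinate standard
# deviations `scales`; the list field names are assumptions, not taken
# from the diff.
target_distribution <- list(
  log_density = function(x) -sum((x / scales)^2) / 2,
  gradient_log_density = function(x) -x / scales^2
)
```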
@@ -49,7 +49,7 @@ target_distribution <- list(
 `rmcmc` provides implementations of several different proposal distributions which can be used within a Metropolis--Hastings based MCMC method:
 
 - `barker_proposal`: The robust gradient-based Barker proposal proposed by @livingstone2022barker.
-- `langevin_proposal`: A gradient-based proposal based on a discretization of a Langevin dynamics.
+- `langevin_proposal`: A gradient-based proposal based on a discretization of Langevin dynamics.
 - `random_walk_proposal`: A Gaussian random-walk proposal.
 
 Each function requires the first argument to specify the target distribution the proposal is to be constructed for.
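All three constructors follow this pattern, taking the target distribution as their first argument; a brief sketch (the variable names are illustrative, and only the Barker proposal is exercised in the changed lines below):

```r
# Illustrative only: construct each of the proposals listed above,
# passing the target distribution as the first argument in each case.
proposal_barker <- barker_proposal(target_distribution)
proposal_langevin <- langevin_proposal(target_distribution)
proposal_random_walk <- random_walk_proposal(target_distribution)
```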
@@ -64,22 +64,26 @@ proposal <- barker_proposal(target_distribution)
 ## Setting up adaptation of tuning parameters
 
 `rmcmc` has support for adaptively tuning parameters of the proposal distribution.
-This is mediated by 'adapter' objects which define method for update the parameters of
-a proposal based on the chain state and statistics recorded during a chain iteration.
+This is mediated by 'adapter' objects which define methods for updating the parameters of a proposal based on the chain state and statistics recorded during a chain iteration.
 Below we instantiate a list of adapters to
-(i) adapt the scalar scale of the proposal distribution to coerce the average acceptance
-probability of the chain transitions to a target value, and
-(ii) adapt the shape of the proposal distribution with per-coordinate scaling factors
-based on estimates on the coordinate-wise variances under the target distribution.
-
+(i) adapt the scalar scale of the proposal distribution to coerce the average acceptance probability of the chain transitions to a target value, and
+(ii) adapt the shape of the proposal distribution with per-coordinate scaling factors based on estimates of the coordinate-wise variances under the target distribution.
 
 ```{r}
 adapters <- list(
-  scale_adapter(initial_scale = 1., target_accept_prob = 0.4),
+  scale_adapter(
+    initial_scale = 2.38^2 / dimension^(1 / 3),
+    target_accept_prob = 0.4
+  ),
   variance_adapter()
 )
 ```
 
+Here we set the initial scale to $2.38^2/(\text{dimension})^{\frac{1}{3}}$ following the results for MALA in @roberts2001optimal,
+and set the target acceptance probability to 0.4 following the guideline in @livingstone2022barker.
+This is equivalent to the default behaviour when the `initial_scale` and `target_accept_prob` arguments are not specified, in which case proposal- and dimension-dependent values following the guidelines in @roberts2001optimal and @livingstone2022barker will be used.
+Both adapters have an optional `kappa` argument which can be used to set the decay rate exponent for the adaptation learning rate. We leave this at the default value of 0.6 (following the recommendation in @livingstone2022barker) in both cases.
+
 The adapter updates will be applied only during an initial set of 'warm-up' chain iterations,
 with the proposal parameters remaining fixed to their final adapted values during a subsequent
 set of main chain iterations.
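As an illustration of the optional `kappa` argument described above (a sketch only; the value 0.8 is an arbitrary example, not a recommendation):

```r
# Sketch only: overriding the adaptation learning rate decay exponent
# `kappa` for both adapters; 0.8 is an arbitrary illustrative value.
adapters_slower_decay <- list(
  scale_adapter(target_accept_prob = 0.4, kappa = 0.8),
  variance_adapter(kappa = 0.8)
)
```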
@@ -91,13 +95,17 @@ The `rmcmc` package encapsulates the chain state in a list which tracks the current position
 and cached values of the log density and its gradient, once computed at the current position, to avoid re-computation.
 The `chain_state` function allows creation of a list in the required format,
 with the first (and only required) argument specifying the position.
-Here we generate an initial state with position coordinates sampled from a standard normal distribution.
+Alternatively we can directly pass a vector specifying just the position component of the state to the `initial_state` argument of `sample_chain`.
+Here we generate an initial state with position coordinates sampled from independent normal distributions with standard deviation 10, following the example in @livingstone2022barker.
+For reproducibility we also fix the random seed.
 
 ```{r}
-initial_state <- chain_state(rnorm(dimension))
+set.seed(791285301L)
+initial_state <- chain_state(10 * rnorm(dimension))
 ```
 
-We now have everything needed to sample a Markov chain. To do this we use the `sample_chain` function from `rmcmc`. This requires us to specify the target distribution, proposal distribution, initial chain state, number of adaptive warm-up iterations and non-adaptive main chain iterations and list of adapters to use.
+We now have everything needed to sample a Markov chain. To do this we use the `sample_chain` function from `rmcmc`.
+This requires us to specify the target distribution, proposal distribution, initial chain state, numbers of adaptive warm-up and non-adaptive main chain iterations, and the list of adapters to use.
 
 ```{r}
 n_warm_up_iteration <- 10000
@@ -109,7 +117,7 @@ and `r n_main_iteration` main chain iterations.
 We set `trace_warm_up` to `TRUE` to record statistics during the adaptive warm-up chain iterations.
 
 ```{r}
-results <- sample_chain(
+barker_results <- sample_chain(
   target_distribution = target_distribution,
   proposal = proposal,
   initial_state = initial_state,
@@ -120,66 +128,242 @@ results <- sample_chain(
 )
 ```
 
-If the `progress` package is installed a progress bar will show the chain progress during sampling. The return value of `sample_chains` is a list containing fields for accessing the final chain state (which can be used to start sampling a new chain), any variables traced during the main chain iterations and any transition statistics recorded.
+If the `progress` package is installed, a progress bar will show the chain progress during sampling.
+The return value of `sample_chain` is a list containing fields for accessing the final chain state (which can be used to start sampling a new chain), any variables traced during the main chain iterations and any additional statistics recorded during the main chain iterations.
+If the `trace_warm_up` argument to `sample_chain` is set to `TRUE`, as above, then the returned list will also contain entries `warm_up_traces` and `warm_up_statistics`, corresponding respectively to the variable traces and additional statistics recorded during the warm-up iterations.
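A quick way to verify these entries from the R console, using only base R (a sketch, not part of the committed vignette):

```r
# Sketch only: list the entries returned by sample_chain and the names
# of the recorded per-iteration statistics.
names(barker_results)
colnames(barker_results$statistics)
```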
+
+One of the additional statistics recorded is the acceptance probability for each chain iteration, under the name `accept_prob`.
+We can therefore compute the mean acceptance probability of the main chain iterations as follows:
 
 ```{r}
-mean_accept_prob <- mean(results$statistics[, "accept_prob"])
-adapted_shape <- proposal$parameters()$shape
+mean_accept_prob <- mean(barker_results$statistics[, "accept_prob"])
+cat(sprintf("Average acceptance probability is %.2f", mean_accept_prob))
+```
+
+This is close to the target acceptance rate of 0.4, indicating the scale adaptation worked as expected.
+
+We can also inspect the shape parameter of the proposal to check that the variance-based shape adaptation succeeded.
+The snippet below extracts the (first few dimensions of the) adapted shape from the `proposal` object and compares it to the known true scales (per-coordinate standard deviations) of the target distribution.
+
+```{r}
+clipped_dimension <- min(5, dimension)
+final_shape <- proposal$parameters()$shape
 cat(
-  sprintf("Average acceptance probability is %.2f", mean_accept_prob),
-  sprintf("True target scales: %s", toString(scales)),
-  sprintf("Adapter scale est.: %s", toString(adapted_shape)),
+  sprintf("Adapter scale est.: %s", toString(final_shape[1:clipped_dimension])),
+  sprintf("True target scales: %s", toString(scales[1:clipped_dimension])),
   sep = "\n"
 )
 ```
+Again, adaptation appears to have been successful, with the adapted shape close to the true target scales.
+
+## Summarizing results using the `posterior` package
+
+The output from `sample_chain` can also be easily used with external packages for analyzing MCMC outputs.
+For example the [`posterior` package](https://mc-stan.org/posterior/index.html) provides implementations of various inference diagnostics and functions for manipulating, subsetting and summarizing MCMC outputs.
+
+```{r}
+library(posterior)
+```
+
+The `traces` entry in the returned (list) output from `sample_chain` is a matrix with rows corresponding to the chain iterations and (named) columns to the traced variables.
+This matrix can be directly coerced to the `draws` data format the `posterior` package internally uses to represent chain outputs, and so can be passed directly to the [`summarize_draws` function](https://mc-stan.org/posterior/reference/draws_summary.html) to output a `tibble` data frame containing a set of summary statistics and diagnostic measures for each variable.
+
+```{r}
+summarize_draws(barker_results$traces)
+```
+
+We can also first explicitly convert the `traces` matrix to a `posterior` draws object using the `as_draws_matrix` function.
+This can be passed to the `summary` generic function to get an equivalent output.
+
+```{r}
+draws <- as_draws_matrix(barker_results$traces)
+summary(draws)
+```
+
+The draws object can also be manipulated and subsetted with various functions provided by `posterior`.
+For example the [`extract_variable` function](https://mc-stan.org/posterior/reference/extract_variable.html) can be used to extract the draws for a specific named variable.
+The output from this function can then be passed to the various diagnostic functions; for example, to compute the effective sample size of the mean of the `target_log_density` variable we could do the following:
+
+```{r}
+cat(
+  sprintf(
+    "Effective sample size of mean(target_log_density) is %.0f",
+    ess_mean(extract_variable(draws, "target_log_density"))
+  )
+)
+```
+
+## Sampling using a Langevin proposal
+
+To sample a chain using a Langevin proposal, we can simply use `langevin_proposal` in place of `barker_proposal`.
+
+Here we create a new set of adapters using the default arguments to `scale_adapter`, which will set the target acceptance rate to the Langevin-proposal-specific value of 0.574, following the results in @roberts2001optimal.
+
+```{r}
+mala_results <- sample_chain(
+  target_distribution = target_distribution,
+  proposal = langevin_proposal(target_distribution),
+  initial_state = initial_state,
+  n_warm_up_iteration = n_warm_up_iteration,
+  n_main_iteration = n_main_iteration,
+  adapters = list(scale_adapter(), variance_adapter()),
+  trace_warm_up = TRUE
+)
+```
 
-## Visualizing adaptation
+We can again check that the average acceptance rate of the main chain iterations is close to the specified target value:
+
+```{r}
+cat(
+  sprintf(
+    "Average acceptance probability is %.2f",
+    mean(mala_results$statistics[, "accept_prob"])
+  )
+)
+```
+
+and use the `ess_mean` function from the `posterior` package to compute the effective sample size of the mean of the `target_log_density` variable:
+
+```{r}
+cat(
+  sprintf(
+    "Effective sample size of mean(target_log_density) is %.0f",
+    ess_mean(
+      extract_variable(
+        as_draws_matrix(mala_results$traces), "target_log_density"
+      )
+    )
+  )
+)
+```
+
+## Comparing adaptation using Barker and Langevin proposals
 
 We can plot how the proposal shape and scale parameters varied during the adaptive warm-up iterations,
-by accessing the statistics recorded in the `warm_up_statistics` field with the `results` object.
+by accessing the statistics recorded in the `warm_up_statistics` entry in the list returned by `sample_chain`.
+
+```{r}
+visualize_scale_adaptation <- function(warm_up_statistics, label) {
+  n_warm_up_iteration <- nrow(warm_up_statistics)
+  par(mfrow = c(1, 2))
+  plot(
+    exp(warm_up_statistics[, "log_scale"]),
+    type = "l",
+    xlab = expression(paste("Chain iteration ", t)),
+    ylab = expression(paste("Scale ", sigma[t]))
+  )
+  plot(
+    cumsum(warm_up_statistics[, "accept_prob"]) / 1:n_warm_up_iteration,
+    type = "l",
+    xlab = expression(paste("Chain iteration ", t)),
+    ylab = expression(paste("Average acceptance rate ", alpha[t])),
+    ylim = c(0, 1)
+  )
+  mtext(
+    sprintf("Scale adaptation for %s", label),
+    side = 3, line = -2, outer = TRUE
+  )
+}
+```
 
 First considering the scalar scale parameter $\sigma_t$,
 which is controlled to achieve a target average acceptance rate,
-we see that the adaptation successfully coerces the average acceptance rate to be
+we see that for the Barker proposal the adaptation successfully coerces the average acceptance rate to be
 close to the 0.4 target value and that the scale parameter adaptation has largely stabilized within
 the first 1000 iterations.
 
 ```{r fig.width=7, fig.height=4}
-par(mfrow = c(1, 2))
-plot(
-  exp(results$warm_up_statistics[, "log_scale"]),
-  type = "l",
-  xlab = expression(paste("Chain iteration ", t)),
-  ylab = expression(paste("Scale ", sigma[t]))
+visualize_scale_adaptation(barker_results$warm_up_statistics, "Barker proposal")
+```
+
+For the Langevin proposal on the other hand,
+while the acceptance rate does eventually converge to its target value of 0.57,
+the convergence is slower and there is more evidence of unstable oscillatory behaviour in the adapted scale.
+
+```{r fig.width=7, fig.height=4}
+visualize_scale_adaptation(mala_results$warm_up_statistics, "Langevin proposal")
+```
+
+Now we consider the adaptation of the diagonal shape matrix $\Sigma_t$,
+based on estimates of the per-coordinate variances.
+
+```{r}
+visualize_shape_adaptation <- function(warm_up_statistics, dimensions, label) {
+  matplot(
+    sqrt(warm_up_statistics[, paste0("variance_estimate", dimensions)]),
+    type = "l",
+    xlab = expression(paste("Chain iteration ", t)),
+    ylab = expression(paste("Shape ", diag(Sigma[t]^(1 / 2)))),
+    log = "y"
+  )
+  legend(
+    "right",
+    paste0("coordinate ", dimensions),
+    lty = dimensions,
+    col = dimensions,
+    bty = "n"
+  )
+  mtext(
+    sprintf("Shape adaptation for %s", label),
+    side = 3, line = -2, outer = TRUE
+  )
+}
+```
+
+We see that for the Barker proposal the adaptation quickly converges towards the known heterogeneous scales along the different coordinates.
+
+```{r fig.width=7, fig.height=4}
+visualize_shape_adaptation(
+  barker_results$warm_up_statistics, 1:clipped_dimension, "Barker proposal"
 )
-plot(
-  cumsum(results$warm_up_statistics[, "accept_prob"]) / 1:n_warm_up_iteration,
-  type = "l",
-  xlab = expression(paste("Chain iteration ", t)),
-  ylab = expression(paste("Average acceptance rate ", alpha[t])),
-  ylim = c(0, 1)
+```
+
+For the Langevin proposal, the shape adaptation is again slower.
+
+```{r fig.width=7, fig.height=4}
+visualize_shape_adaptation(
+  mala_results$warm_up_statistics, 1:clipped_dimension, "Langevin proposal"
 )
 ```
 
-Now consider the adaptation of the diagonal shape matrix $\Sigma_t$,
-based on estimates of the per-coordinate variances, we see that the adaptation
-converges towards the known heterogeneous scales along the different coordinates.
+We can also visualize the chain position components during the warm-up iterations using the `warm_up_traces` entry.
+
+```{r}
+visualize_traces <- function(traces, dimensions, label) {
+  matplot(
+    traces[, paste0("position", dimensions)],
+    type = "l",
+    xlab = expression(paste("Chain iteration ", t)),
+    ylab = expression(paste("Position ", X[t]))
+  )
+  legend(
+    "topright",
+    paste0("coordinate ", dimensions),
+    lty = dimensions,
+    col = dimensions,
+    bty = "n"
+  )
+  mtext(sprintf("Traces for %s", label), side = 3, line = -2, outer = TRUE)
+}
+```
+
+For the Barker proposal we can see that the chain quickly appears to converge to a stationary regime
 
 ```{r fig.width=7, fig.height=4}
-matplot(
-  sqrt(results$warm_up_statistics[, paste0("variance_estimate", 1:dimension)]),
-  type = "l",
-  xlab = expression(paste("Chain iteration ", t)),
-  ylab = expression(paste("Shape ", diag(Sigma[t]^(1 / 2)))),
-  log = "y"
+visualize_traces(
+  barker_results$warm_up_traces, 1:clipped_dimension, "Barker proposal"
 )
-legend(
-  "right",
-  paste0("coordinate ", 1:dimension),
-  lty = 1:dimension,
-  col = 1:dimension,
-  bty = "n"
+```
+
+The Langevin chain also appears to converge to a stationary regime, but again convergence is slower
+
+```{r fig.width=7, fig.height=4}
+visualize_traces(
+  mala_results$warm_up_traces, 1:clipped_dimension, "Langevin proposal"
 )
 ```
 
+Overall we see that while the Langevin proposal is able to achieve a higher sampling efficiency when tuned with appropriate parameters,
+its performance is more sensitive to the tuning parameter values, resulting in less stable and robust adaptive tuning.
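To make this comparison concrete, the effective sample sizes of the two chains could be computed side by side, reusing the `posterior` functions already loaded in the vignette (a sketch, not part of the committed vignette):

```r
# Sketch only: compare effective sample sizes of mean(target_log_density)
# for the Barker and Langevin chains, using the same calls as above.
ess_for <- function(results) {
  ess_mean(
    extract_variable(as_draws_matrix(results$traces), "target_log_density")
  )
}
cat(sprintf(
  "ESS (Barker): %.0f, ESS (Langevin): %.0f",
  ess_for(barker_results), ess_for(mala_results)
))
```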
+
 ## References
