Add vignette on benchmarking model options #695
base: main
Conversation
As an update, here is the error when using the `laplace` method:

```r
library(EpiNow2)
library(cmdstanr)

# Set the number of cores to use
options(mc.cores = 4)

# Generation time
generation_time <- Gamma(
  shape = Normal(1.3, 0.3),
  rate = Normal(0.37, 0.09),
  max = 14
)

# Incubation period
incubation_period <- LogNormal(
  meanlog = Normal(1.6, 0.05),
  sdlog = Normal(0.5, 0.05),
  max = 14
)

# Reporting delay
reporting_delay <- LogNormal(
  meanlog = 0.5,
  sdlog = 0.5,
  max = 10
)

# Combine the incubation period and reporting delay into one delay
delay <- incubation_period + reporting_delay

# Observation model options
obs <- obs_opts(
  scale = list(mean = 0.1, sd = 0.025),
  return_likelihood = TRUE
)

# Run model
epinow(
  data = example_confirmed,
  generation_time = generation_time_opts(generation_time),
  delays = delay_opts(delay),
  obs = obs,
  horizon = 0,
  rt = NULL,
  stan = stan_opts(
    method = "laplace",
    backend = "cmdstanr"
  )
)
```
```
Error in fit_model_approximate(args, id = id) :
  Approximate inference failed due to: Error: 'jacobian' argument to optimize and laplace must match!
  laplace was called with jacobian=TRUE
  optimize was run with jacobian=TRUE
```

Not an informative error message, given that both were apparently run with `jacobian=TRUE`. After playing around with the example above with different combinations of […]. Moreover, according to the documentation of […].
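For reference, the same setup can be reused to try the other approximate methods, since only `stan_opts()` changes. A sketch (which methods are accepted depends on the installed EpiNow2/cmdstanr versions; `try()` keeps the loop going when one method errors):

```r
# Re-run the epinow() call above, swapping only the inference method
fits <- lapply(c("vb", "laplace", "pathfinder"), function(method) {
  try(epinow(
    data = example_confirmed,
    generation_time = generation_time_opts(generation_time),
    delays = delay_opts(delay),
    obs = obs,
    horizon = 0,
    rt = NULL,
    stan = stan_opts(method = method, backend = "cmdstanr")
  ))
})
```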
I can't reproduce this - do you need to update `cmdstanr`? My versions are:

```r
devtools::package_info("cmdstanr") |>
  dplyr::filter(package == "cmdstanr")
#> package  * version    date (UTC) lib source
#> cmdstanr   0.8.1.9000 2024-06-23 [1] Github (stan-dev/cmdstanr@9878dda)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library

cmdstanr::cmdstan_version()
#> [1] "2.35.0"
```

Created on 2024-07-12 with reprex v2.1.0
Thanks. It's fixed now after updating.
I have now pushed the vignette with the run of all the models using MCMC (a472439). To do: […]

Settings and errors so far: […]
Pathfinder errors: […]
I noticed something interesting about […].

@sbfnk The only thing in the way of this vignette getting merged is the struggle to get […].
From meeting with Seb today: […]
@jamesmbaazam Would it be possible to share a rendered HTML, since I don't think we've set anything up yet to post a preview of the articles as a comment (though we could)?
We usually pre-compile the vignettes from the […].
Wow, this is a lot of great work!
General comment is that it has elements of a vignette and elements of a paper, so it's not quite clear where it stands on the distinction between the two. I would say for a vignette it's a bit too complex (defining functions that call other functions, etc.); for a paper it definitely has some great new content, but the style is more like a walkthrough/vignette.
With that in mind I have focused on code correctness / implementation in my review so far. We can discuss the text later perhaps.
Anyway, very cool stuff and really interesting!
vignettes/benchmarks.Rmd.orig (outdated)
- `infections_true`: the infections by date of infection, and
- `reported_cases_true`: the reported cases by date of report.

```{r extract-true-data}
R_true <- forecast$summarised[variable == "R"]$median
Why do you do 10 samples and the median for R but not infections? I think for a fair comparison of the methods you'd just sample 1 and then take this noisy trajectory as truth data (possibly doing it 10 times overall but this requires 10 times the computation).
I will update, but just to note that this is from the synthetic evaluation scripts, so we might have to update there too?
Also, just to understand: are some of the methods/models more sensitive to the number of samples?
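For illustration, the reviewer's suggestion might look something like this (a sketch; the long-format `forecast$samples` structure and its column names are assumptions, not a confirmed package API):

```r
# Use a single posterior trajectory as the "truth" instead of the median
# (column names here are illustrative assumptions)
R_true <- forecast$samples[variable == "R" & sample == 1]$value
```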
vignettes/benchmarks.Rmd.orig (outdated)
These are the components of each model.

```{r model-components, echo = FALSE}
model_components <- dplyr::tribble(
Same as above, for constructing this.
if (!inherits(estimates, c("matrix"))) return(rep(NA_real_, length(truth)))
# Assumes that the estimates object is structured with the samples as rows
shortest_obs_length <- min(ncol(estimates), length(truth))
reduced_truth <- tail(truth, shortest_obs_length)
Shouldn't this be `head()`? I.e. your estimates correspond to the initial snapshot of the data (starting at 1), not the end (at least for now).
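A toy illustration of the alignment point:

```r
# If estimates cover an *initial* snapshot of the truth series,
# head() keeps the matching days while tail() would misalign them
truth <- 1:10   # full truth series (days 1..10)
n_est <- 7      # estimates cover the first 7 days
head(truth, n_est)  # days 1..7  -> aligned with an initial snapshot
tail(truth, n_est)  # days 4..10 -> aligned with the *end* of the series
```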
shortest_obs_length <- min(ncol(estimates), length(truth))
reduced_truth <- tail(truth, shortest_obs_length)
estimates_transposed <- t(estimates) # transpose to have samples as columns
reduced_estimates <- tail(estimates_transposed, shortest_obs_length)
Same here: `head()` vs `tail()`.
log10(reduced_truth),
log10(reduced_estimates)
This only works if the values are never zero (so perhaps fine if it works).
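One possible guard, if zeros can occur (the +1 offset is an arbitrary choice for count data, not what the vignette does):

```r
# log10() maps zeros to -Inf, which breaks downstream scoring
safe_log10 <- function(x) log10(x + 1)
safe_log10(c(0, 10, 100))
#> [1] 0.000000 1.041393 2.004321
```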
estimates_transposed <- t(estimates) # transpose to have samples as columns
reduced_estimates <- tail(estimates_transposed, shortest_obs_length)
return(
  crps_sample(
If you're only using `crps_sample()` from `scoringutils`, would it make sense to use `scoringRules` instead? As the `scoringutils` version is just a thin wrapper.
dude
(how are we going to goose downloads with that approach ;) )
More seriously, I think this indicates we should just use more of the features of `scoringutils`.
Yep, ultimately related also to #618
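For reference, calling `scoringRules` directly is a one-liner (a sketch with simulated inputs; `crps_sample()` expects observations as rows of `dat` and samples as columns):

```r
library(scoringRules)

y <- c(3, 5, 8)                             # observed values
dat <- matrix(rpois(3 * 100, 5), nrow = 3)  # 100 posterior samples per observation
crps_sample(y = y, dat = dat)               # one CRPS value per observation
```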
```{r process-crps}
# Process CRPS for Rt
rt_crps <- process_crps(results, "R", R)
I wonder if it would be good to look at the estimate on the day of the inference for a real-time measure, and then e.g. the 7-day forecast, rather than the whole "estimate from partial data" / "forecast" time series. Debatable perhaps, but if you use the whole series I think it might make more sense to use the multivariate CRPS / energy score (which is not yet implemented in `scoringutils` but is available in `scoringRules::es_sample()`).
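A sketch of what the energy-score version could look like with simulated inputs (`es_sample()` scores one multivariate observation against a d x m matrix of sampled trajectories):

```r
library(scoringRules)

d <- 14                               # length of the trajectory (e.g. days)
m <- 100                              # number of posterior samples
y <- rnorm(d)                         # "observed" trajectory
dat <- matrix(rnorm(d * m), nrow = d) # sampled trajectories, one per column
es_sample(y = y, dat = dat)           # single score for the whole trajectory
```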
I think we can push this to the next release to give us a bit more time to review content and scope - no need to rush it into 1.7.0 (which is otherwise pretty much ready to go). |
@jamesmbaazam Mostly a lot of questions, feel free to ignore any of these as I am sure you have already discussed a lot of it.
Overall this is a really nice analysis. From an outside perspective, I am really interested in how the approximate methods compare to MCMC in terms of forecast performance. Also, it wasn't clear whether these were evaluated only on the out-of-sample data; if not, separately evaluating the out-of-sample forecast performance seems important (but I may have missed this).
vignettes/benchmarks.Rmd.orig (outdated)
# Rt prior
rt_prior_default <- Normal(2, 0.1)
# Number of cores
options(mc.cores = 6)
Add a comment that you set this to 6 because your computer has 8 cores, or something along those lines.
Or do some kind of core detection, probably.
This is addressed in the considerations section but happy to repeat it here. See 655148a.
I think if we can avoid hard-coding this it would be good.
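A minimal sketch of the core-detection suggestion (leaving two cores free is an arbitrary choice):

```r
# Detect available cores instead of hard-coding; detectCores() can
# return NA on some platforms, hence the fallback
n_cores <- parallel::detectCores()
if (is.na(n_cores)) n_cores <- 2L
options(mc.cores = max(1L, n_cores - 2L))
```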
  R = R_noisy,
  samples = 10
)
```
From a very outside perspective, it strikes me as a little bit odd to do a fit to `example_confirmed`, then create a separate true R(t), and then generate new simulated data, rather than just doing a forward simulation.
But I assume there are reasons for this choice that have already been discussed.
Aha, we are basically being extremely clever and pragmatic, but I agree we should tell people this.
The idea is that, by using the posterior for everything but the Rt trajectory, you are getting a simulation that very accurately reflects a real-world setting whilst still being able to know and change the true Rt trajectory.
> The idea is that by using the posterior for everything but the Rt trajectory you are getting a simulation that very accurately reflects a real-world setting whilst still being able to know and change the true Rt trajectory.

Stealing this explanation.
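For readers, the pattern being described is roughly the following (a sketch: the `estimate_infections()`/`forecast_infections()` calls and the chosen Rt values are illustrative, reusing objects defined earlier in the thread):

```r
# 1. Fit once to real data to get a realistic posterior
fit <- estimate_infections(
  data = example_confirmed,
  generation_time = generation_time_opts(generation_time)
)

# 2. Choose a known "true" Rt trajectory (values are illustrative)
R_true <- c(rep(1.3, 30), rep(0.8, 30))

# 3. Simulate with the posterior for everything except Rt
forecast <- forecast_infections(estimates = fit, R = R_true, samples = 10)
```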
cases_traj
```

![plot of chunk plot-cases](benchmarks-plot-cases-1.png)
This comes up in the knitted Rmd under the plot as "plot of chunk plot-cases"; perhaps we just want to label them as "Figure #" with a caption?
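One way to do this with knitr chunk options (a sketch; the caption text and `fig.path` value are illustrative):

```r
# Per-chunk captions go in the chunk header, shown as a comment here
# to avoid nesting fences:
#   {r plot-cases, fig.cap = "Figure 1: simulated case trajectories"}
# A shared prefix for figure files can also be set once per document:
knitr::opts_chunk$set(fig.path = "benchmarks-")
```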
```

![plot of chunk crps-plotting-rt-total](benchmarks-crps-plotting-rt-total-1.png)
Might be easier to read if you made this into a bar chart since we were just looking at these as colored lines.
Or a point plot; I am irrationally anti-bar.
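A sketch of the point-plot alternative (the `crps_totals` data frame and its numbers are made up purely for illustration):

```r
library(ggplot2)

crps_totals <- data.frame(
  model = c("default", "non-mechanistic", "random walk"),
  crps  = c(120, 150, 130)  # illustrative values only
)

ggplot(crps_totals, aes(x = crps, y = model)) +
  geom_point(size = 3) +
  labs(x = "Total CRPS", y = NULL)
```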
```

![plot of chunk plot-rt-tv-crps-approx](benchmarks-plot-rt-tv-crps-approx-1.png)
Would be nice to have some explanation of why variational Bayes failed for nearly all of the models in the decline phase, and generally why they are failing.
It would, but I think this might be out of scope here (i.e. it's a new research/doc issue IMO).
Estimation in `{EpiNow2}` using the semi-mechanistic approaches (putting a prior on $R_t$) is often much slower than the non-mechanistic approach. The mechanistic model is slower because it models aspects of the processes and mechanisms that drive $R_t$ estimates using the renewal equation. The non-mechanistic model, on the other hand, runs much faster but does not use the renewal equation to generate infections. Because of this, none of the options defining the behaviour of the reproduction number are available in this case, limiting its flexibility.

It's worth noting that the non-mechanistic model in `{EpiNow2}` is equivalent to that used in the [`{EpiEstim}`](https://mrc-ide.github.io/EpiEstim/index.html) R package as they both estimate $R_t$ from case time series and the generation interval distribution.
Might be worth noting that it's likely that this is more sensitive to GI misspecification (I'd assume?). Also, I am guessing the difference is that EpiEstim would require specifying just the mean delay, whereas the non-mechanistic EpiNow2 is able to incorporate the full delay distribution, which is likely to be important for fitting to more delayed data.
Could you clarify a bit what you mean by this? The renewal equation?

> Also I am guessing the difference is that EpiEstim would require specifying just the mean delay, where the non-mechanistic EpiNow2 is able to incorporate the full delay distribution, which is likely to be important for fitting to more delayed data

It doesn't take any delays at all, and the approach to smoothing is different (here it is being fit). We should probably talk about this somewhere.
My understanding of EpiEstim is that it does a direct computation of the posterior estimate of R(t) based on the cases and the GI, assuming cases are just latent incident infections shifted by some mean delay.
So when it's written that the non-mechanistic one is like EpiEstim, I assume it does something similar, but since the input has the user specify a full delay distribution, I (perhaps incorrectly) assumed it is using that full distribution (rather than just the mean).
What do you mean it doesn't take any delays?
EpiEstim doesn't do anything with delays, and it uses the serial interval, not the GT. I think what the text means here is that the underlying infection-generating process EpiNow2 and EpiEstim use is the same, but not the rest of the model, and even that is in fact only partially true because of differences in the latent model and in the uncertainty. Personally I would just nuke this paragraph.
Ok yeah I don't think I understand how the non-mechanistic model works then!
  labs(caption = "Where a model is not shown, it means it failed to run")
```

![plot of chunk plot-infections-total-crps-approx](benchmarks-plot-infections-total-crps-approx-1.png)
Can you also plot a comparison of the MCMC vs approximate methods for R(t) estimation and infections? There is a lot of language in here around the approximate methods not being well-tested, but this exercise seems like an evaluation of them! Would be good to show those results directly I think
infections_crps_mcmc_plot +
  facet_wrap(~epidemic_phase, ncol = 1)
```
It isn't clear to me whether all of these are being evaluated in both the calibration period and the forecast period, or just in the 7-day forecast period?
Doesn't this mix in-sample and out-of-sample evaluation? I am most interested in the out-of-sample evaluation performance (specifically for the infections).
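A sketch of scoring the two windows separately (object names follow the helper discussed above and are assumptions; `estimates` is taken to have samples as rows and time as columns):

```r
library(scoringRules)

horizon <- 7
n <- length(truth)
in_idx  <- seq_len(n - horizon)     # calibration window
out_idx <- seq(n - horizon + 1, n)  # 7-day forecast window

est_t <- t(estimates)  # time as rows, samples as columns
crps_in  <- crps_sample(y = truth[in_idx],  dat = est_t[in_idx, ])
crps_out <- crps_sample(y = truth[out_idx], dat = est_t[out_idx, ])
```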
Co-authored-by: Sebastian Funk <[email protected]>
Agreement from today's f2f meeting with Seb:

Scope: To evaluate the retrospective and real-time performance of select models using timing and proper scoring rules.

Evaluations: […]
I'll work on this gradually this week if time permits, as I will be attending an internal event. This should be completed by early next week, if all goes well.
Description

This PR closes #629.

Initial submission checklist

- […] (using devtools::test() and devtools::check()).
- […] (using devtools::document()).
- […] (using lintr::lint_package()).

After the initial Pull Request

[…]