-
Notifications
You must be signed in to change notification settings - Fork 4
Description
In #200 @ben18785 suggested we should add an article about fitting models with seroreversion using the example wrote in the PR description:
To illustrate the current pipeline, consider a disease with constant FOI λ = 0.02 and seroreversion rate μ = 0.01 :
foi <- data.frame( year = 1951:2000, foi = rep(0.01, 50) ) seroreversion_rate <- 0.01and a serological survey with the following features:
survey_features <- data.frame( age_min = 1:50, age_max = 1:50, sample_size = runif(0,100) ) $ age_min <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,… $ age_max <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,… $ sample_size <dbl> 51, 35, 14, 86, 70, 34, 50, 98, 51, 93, 78, 25, 36, 40, 15, 43, 79, 96, 96, 64, 56, 31, 84, 45…We can use
simulate_serosurvey()
, introduced in #199, to simulate statistically consistent seropositive countsn_seropositive
for each age group, according to our set of serological models. In this case, I use the time-varying model:serosurvey <- simulate_serosurvey( model = "time", foi = foi, survey_features = survey_features, seroreversion_rate = seroreversion_rate ) %>% mutate( survey_year = max(foi$year) ) $ age_min <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, … $ age_max <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, … $ sample_size <dbl> 51, 35, 14, 86, 70, 34, 50, 98, 51, 93, 78, 25, 36, 40, 15, 43, 79, 96, 96, 64, 56, 31, 84,… $ n_seropositive <int> 4, 1, 2, 12, 14, 2, 9, 19, 12, 27, 18, 6, 14, 12, 5, 19, 28, 31, 45, 26, 25, 13, 37, 30, 42… $ survey_year <int> 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2…which we can visualise using
plot_serosurvey
:plot_serosurvey(serosurvey = serosurvey)Implementing the constant model with and without seroreversion by means of
fit_seromodel()
:# initialization function for sampling init <- function() { list(foi_vector = rep(0.1, nrow(survey_features))) } # constant model without seroreversion seromodel_constant_no_serorev <- fit_seromodel( serosurvey = serosurvey, model = "constant", foi_prior = sf_uniform(0, 10), init = init ) # constant model with seroreversion seromodel_constant_serorev <- fit_seromodel( serosurvey = serosurvey, model = "constant", foi_prior = sf_uniform(0, 10), is_seroreversion = TRUE, seroreversion_prior = sf_uniform(0, 1), init = init )Visualizing the results:
plot_no_serorev <- plot_seromodel( seromodel = seromodel_constant_no_serorev, serosurvey = serosurvey ) plot_serorev <- plot_seromodel( seromodel = seromodel_constant_serorev, serosurvey = serosurvey ) cowplot::plot_grid(plot_no_serorev, plot_serorev)Note that the model estimating the seroreversion rate (right panel in the image), no only is a better fit for this serological survey according to the elpd value, but it also accurately estimates both the FOI and seroreversion rate we used to simulate the serosurvey.
Another difference with previous versions lies in how we visualize the estimated parameters. Since the constant model is estimating a single FOI value, we only show the estimated value with its corresponding credible interval (we use to plot it instead). However, for the time and age varying models we estimate several values of the FOI in the time/age span of the survey, which we visualize graphically. Take for instance the time varying model with seroreversion:
# implementing the time-varying model with seroreversion seromodel_time_serorev <- fit_seromodel( serosurvey = serosurvey, model = "time", foi_prior = sf_uniform(0, 10), is_seroreversion = TRUE, seroreversion_prior = sf_uniform(0, 1), init = init, iter = 4000 ) # plotting plot_time_serorev <- plot_seromodel( seromodel = seromodel_time_serorev, serosurvey = serosurvey )Here we plot the estimated values of the FOI as a function of time (blue line and shadow) and the R-hat estimates for each estimated value (black dots). Note that, even though we ran the model for a large number of iterations - 4000, the model did not converged (since there are R-hat values over 1.01). We can improve the convergence of the model by estimating less values of the FOI. To specify the time intervals for which FOI values will be estimated, we index them by means of
get_foi_index()
:foi_index <- get_foi_index( serosurvey = serosurvey, group_size = 5 ) > foi_index [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 5 5 5 5 5 6 6 6 6 6 7 7 7 7 7 8 8 [38] 8 8 8 9 9 9 9 9 10 10 10 10 10This function returns a list of indexes that labels each year (or age, for age-varying models). The idea is that a single FOI value will be estimated for each index, meaning that 10 FOI values will be estimated in this particular case:
init <- function() { list(foi_vector = rep(0.1, max(foi_index))) } seromodel_time_serorev <- fit_seromodel( serosurvey = serosurvey, model = "time", foi_prior = sf_uniform(0, 10), is_seroreversion = TRUE, seroreversion_prior = sf_uniform(0, 1), foi_index = foi_index, init = init, iter = 4000 ) plot_seromodel( seromodel_time_serorev, serosurvey = serosurvey )Now the model converges. As long as
foi_index
length covers the whole time/age span of the serological survey, less regular indexationsfoi_index
can be specified.
This example shows how to use the seroreversion rate estimation features of the package using a simulated dataset as benchmark, which allows to validate the results.