Add an article about fitting models with seroreversion

In #200 @ben18785 suggested we should add an article about fitting models with seroreversion using the example wrote in the PR description: 

> To illustrate the current pipeline, consider a disease with constant FOI λ = 0.02 and seroreversion rate μ = 0.01 :
> 
> ```r
> foi <- data.frame(
>   year = 1951:2000,
>   foi = rep(0.01, 50)
> )
> seroreversion_rate <- 0.01
> ```
> 
> and a serological survey with the following features:
> 
> ```r
> survey_features <- data.frame(
>   age_min = 1:50,
>   age_max = 1:50,
>   sample_size = runif(0,100)
> )
> 
> $ age_min     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,…
> $ age_max     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,…
> $ sample_size <dbl> 51, 35, 14, 86, 70, 34, 50, 98, 51, 93, 78, 25, 36, 40, 15, 43, 79, 96, 96, 64, 56, 31, 84, 45…
> ```
> 
> We can use `simulate_serosurvey()`, introduced in #199, to simulate statistically consistent seropositive counts `n_seropositive` for each age group, according to our set of serological models. In this case, I use the time-varying model:
> 
> ```r
> serosurvey <- simulate_serosurvey(
>   model = "time",
>   foi = foi,
>   survey_features = survey_features,
>   seroreversion_rate = seroreversion_rate
> ) %>% mutate(
>   survey_year = max(foi$year)
> )
> 
> $ age_min        <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, …
> $ age_max        <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, …
> $ sample_size    <dbl> 51, 35, 14, 86, 70, 34, 50, 98, 51, 93, 78, 25, 36, 40, 15, 43, 79, 96, 96, 64, 56, 31, 84,…
> $ n_seropositive <int> 4, 1, 2, 12, 14, 2, 9, 19, 12, 27, 18, 6, 14, 12, 5, 19, 28, 31, 45, 26, 25, 13, 37, 30, 42…
> $ survey_year     <int> 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2…
> ```
> 
> which we can visualise using `plot_serosurvey`:
> 
> ```r
> plot_serosurvey(serosurvey = serosurvey)
> ```
> 
> ![image](https://github.com/user-attachments/assets/a1ee50fa-c447-4a2b-a8f7-64d7ae2212bc)
> 
> Implementing the constant model with and without seroreversion by means of `fit_seromodel()`:
> 
> ```r
> # initialization function for sampling
> init <- function() {
>   list(foi_vector = rep(0.1, nrow(survey_features)))
> }
> 
> # constant model without seroreversion
> seromodel_constant_no_serorev <- fit_seromodel(
>   serosurvey = serosurvey,
>   model = "constant",
>   foi_prior = sf_uniform(0, 10),
>   init = init
> )
> 
> # constant model with seroreversion
> seromodel_constant_serorev <- fit_seromodel(
>   serosurvey = serosurvey,
>   model = "constant",
>   foi_prior = sf_uniform(0, 10),
>   is_seroreversion = TRUE,
>   seroreversion_prior = sf_uniform(0, 1),
>   init = init
> )
> ```
> 
> Visualizing the results:
> 
> ```r
> plot_no_serorev <- plot_seromodel(
>   seromodel = seromodel_constant_no_serorev,
>   serosurvey = serosurvey
> )
> 
> plot_serorev <- plot_seromodel(
>   seromodel = seromodel_constant_serorev,
>   serosurvey = serosurvey
> )
> 
> cowplot::plot_grid(plot_no_serorev, plot_serorev)
> ```
> 
> ![image](https://github.com/user-attachments/assets/d9501d35-4525-4d89-9e6e-78c9b1ab52a1)
> 
> Note that the model estimating the seroreversion rate (right panel in the image), no only is a better fit for this serological survey according to the elpd value, but it also accurately estimates both the FOI and seroreversion rate we used to simulate the serosurvey.
> 
> Another difference with previous versions lies in how we visualize the estimated parameters. Since the constant model is estimating a single FOI value, we only show the estimated value with its corresponding credible interval (we use to plot it instead). However, for the time and age varying models we estimate several values of the FOI in the time/age span of the survey, which we visualize graphically. Take for instance the time varying model with seroreversion:
> 
> ```r
> # implementing the time-varying model with seroreversion
> seromodel_time_serorev <- fit_seromodel(
>   serosurvey = serosurvey,
>   model = "time",
>   foi_prior = sf_uniform(0, 10),
>   is_seroreversion = TRUE,
>   seroreversion_prior = sf_uniform(0, 1),
>   init = init,
>   iter = 4000
> )
> # plotting
> plot_time_serorev <- plot_seromodel(
>   seromodel = seromodel_time_serorev,
>   serosurvey = serosurvey
> )
> ```
> 
> ![image](https://github.com/user-attachments/assets/09ff7906-4fad-4347-b2e1-2509b9694ff5)
> 
> Here we plot the estimated values of the FOI as a function of time (blue line and shadow) and the R-hat estimates for each estimated value (black dots). Note that, even though we ran the model for a large number of iterations - 4000, the model did not converged (since there are R-hat values over 1.01). We can improve the convergence of the model by estimating less values of the FOI. To specify the time intervals for which FOI values will be estimated, we index them by means of `get_foi_index()`:
> 
> ```r
> foi_index <- get_foi_index(
>   serosurvey = serosurvey,
>   group_size = 5
>   )
> 
> > foi_index
>  [1]  1  1  1  1  1  2  2  2  2  2  3  3  3  3  3  4  4  4  4  4  5  5  5  5  5  6  6  6  6  6  7  7  7  7  7  8  8
> [38]  8  8  8  9  9  9  9  9 10 10 10 10 10
> ```
> 
> This function returns a list of indexes that labels each year (or age, for age-varying models). The idea is that a single FOI value will be estimated for each index, meaning that 10 FOI values will be estimated in this particular case:
> 
> ```r
> init <- function() {
>   list(foi_vector = rep(0.1, max(foi_index)))
> }
> 
> seromodel_time_serorev <- fit_seromodel(
>   serosurvey = serosurvey,
>   model = "time",
>   foi_prior = sf_uniform(0, 10),
>   is_seroreversion = TRUE,
>   seroreversion_prior = sf_uniform(0, 1),
>   foi_index = foi_index,
>   init = init,
>   iter = 4000
> )
> 
> plot_seromodel(
>   seromodel_time_serorev,
>   serosurvey = serosurvey
> )
> ```
> 
> ![image](https://github.com/user-attachments/assets/1783f2c9-3c56-4384-85b9-1e049230d531)
> 
> Now the model converges. As long as `foi_index` length covers the whole time/age span of the serological survey, less regular indexations `foi_index` can be specified.

This example shows how to use the seroreversion rate estimation features of the package using a simulated dataset as benchmark, which allows to validate the results.




Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add an article about fitting models with seroreversion #205

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Add an article about fitting models with seroreversion #205

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions