Clarification on use of `weight` argument in `as_epidist_marginal_model` #512

athowes · 2025-01-28T11:59:24Z

PR #509 adds support for users providing a column which contains counts / weights. Docs are:

A column name to use for weighting the data in the likelihood. Default is NULL. Internally this is used to define the 'n' column of the returned object.

Compatible with epidist_linelist_data (which doesn't naturally support weighted linelist / aggregate, hence I suppose #508).

Some notes:

We went back and forth on the design, with the suggestion that aggregation needed to be internal to the marginal model. I guess that isn't a blocker, because aggregation still occurs in the marginal model and there is no harm in having this additional aggregation. Essentially it's the user saying "all of this data is the same", which can never be suboptimal (unless they are wrong). The data then gets aggregated further if possible.
We likely want examples demonstrating this feature, e.g. in a vignette.

So I guess the scope of this issue could be to add docs noting that the marginal model will perform further aggregation where possible, given the formula etc. that you specify. Or that could be best left to a longer-form vignette.
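
As a rough illustration of the further aggregation described above, here is a sketch in plain Python rather than the package's R code. The column names `delay`, `group` and `site` and the helper `aggregate` are hypothetical and are not epidist internals. The idea is that if the user supplies a count per row and the model only depends on a subset of columns, rows that agree on those columns can be merged by summing their counts.

```python
from collections import defaultdict

# Hypothetical pre-aggregated rows: (delay, group, site, n),
# where n is the user-supplied count for that row.
rows = [
    (2, "a", "x", 3),
    (2, "a", "y", 4),
    (5, "b", "x", 1),
    (2, "a", "x", 2),
]


def aggregate(rows, keys):
    """Sum the count column over rows that share the same values for `keys`."""
    totals = defaultdict(int)
    for delay, group, site, n in rows:
        record = {"delay": delay, "group": group, "site": site}
        totals[tuple(record[k] for k in keys)] += n
    return dict(totals)


# Assumption: the model formula only uses delay and group,
# so rows that differ only by site can be merged.
by_formula = aggregate(rows, keys=("delay", "group"))

# Total number of observations before and after merging.
total_before = sum(n for *_, n in rows)
total_after = sum(by_formula.values())
```

The total count is unchanged by the merge. Only the number of rows the likelihood has to visit goes down.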


athowes · 2025-01-28T12:06:16Z

For the docs, rather than only referring to "weighting", I would mention that this would usually be counts of a particular linelist item (i.e. connect it to the data rather than abstractly calling it a weighted likelihood; I'm unsure under what circumstances someone would weight in a way unconnected to counts of an observation).
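
To make the link between counts and a weighted likelihood concrete, here is a minimal sketch in plain Python. It assumes an exponential delay density purely for illustration (not the primarycensored likelihood the package uses), and the names `log_density`, `linelist` and `counts` are hypothetical. With integer counts as weights, the weighted log-likelihood matches the log-likelihood of the expanded linelist.

```python
import math


def log_density(delay, rate=0.5):
    """Log density of an exponential delay (an illustrative choice only)."""
    return math.log(rate) - rate * delay


# One row per observation.
linelist = [1.0, 1.0, 1.0, 2.5, 2.5, 4.0]

# The same data with identical observations collapsed into counts.
counts = {1.0: 3, 2.5: 2, 4.0: 1}

# Unweighted log-likelihood over every row of the linelist.
ll_linelist = sum(log_density(d) for d in linelist)

# Weighted log-likelihood: each unique delay contributes n times.
ll_weighted = sum(n * log_density(d) for d, n in counts.items())
```

Here `ll_linelist` and `ll_weighted` agree up to floating-point error, which is why describing the weight as a count of identical linelist rows is accurate.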

kgostic · 2025-01-29T14:30:46Z

I agree the documentation isn't super clear, but I think it can be polished up with some edits to the details, and an example. I don't know that a full vignette is needed.

athowes · 2025-01-29T14:33:11Z

Agree that a full vignette is not needed for this alone. That said, I do think we should have a vignette / paper about the marginal model (and how it differs from the latent model), and how to use this argument would naturally be covered there.

kgostic · 2025-01-29T15:48:39Z

I'm going to take a stab at a good-enough fix now.

seabbs · 2025-02-12T14:07:31Z

See the updated documentation on main. This may or may not resolve the issue for you.

athowes · 2025-02-12T14:56:18Z

Agree this makes progress, and I don't mind closing.

IMO the docs for `weight` could still mention it here:

Linking to concepts like:

The result is a more compact representation of the same data where each row represents multiple identical observations with the count stored in the n column.

seabbs · 2025-02-12T14:58:18Z

Could be useful to include the description here which I think has the context?

athowes · 2025-02-12T14:58:44Z

FWIW I find things like the below quite visually cluttered. I think it's going to be confusing for users. I agree there are downsides to other approaches too, just to say it's a lot.

seabbs · 2025-02-12T14:59:37Z

I think that probably requires its own issue, for more structure in the pkgdown yaml, which might help.

athowes · 2025-02-12T15:03:56Z

Could be useful to include the description here which I think has the context?

The description is:

This method converts linelist data to a marginal model format by calculating delays between primary and secondary events, along with observation times and censoring windows. The likelihood used is imported from the primarycensored package which handles censoring in both primary and secondary events as well as truncation due to observation times. In principle, this method should be more accurate and more computationally efficient than the latent model (as_epidist_latent_model()) approach in most setting except when the number of unique strata approaches the number of observations.

Maybe this helps a bit. It feels like you would want to directly mention the `weight` argument and how it fits in. I might also reword "In principle, this method should be more accurate and more computationally efficient than the latent model" (this is a method to prepare data, so it feels like a type error to call it efficient; it's the model that is efficient or not).

seabbs mentioned this issue Jan 28, 2025

Add aggregate data support #412

Closed

seabbs added the documentation Improvements or additions to documentation label Jan 28, 2025

kgostic self-assigned this Jan 29, 2025

kgostic linked a pull request Jan 29, 2025 that will close this issue

update marginal model docs #515

Draft
