Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarification on use of weight argument in as_epidist_marginal_model #512

Open
athowes opened this issue Jan 28, 2025 · 10 comments · May be fixed by #515
Open

Clarification on use of weight argument in as_epidist_marginal_model #512

athowes opened this issue Jan 28, 2025 · 10 comments · May be fixed by #515
Assignees
Labels
documentation Improvements or additions to documentation

Comments

@athowes
Copy link
Collaborator

athowes commented Jan 28, 2025

PR #509 adds support for users providing a column which contains counts / weights. Docs are:

A column name to use for weighting the data in the likelihood. Default is NULL. Internally this is used to define the 'n' column of the returned object.

Compatible with epidist_linelist_data (which doesn't naturally support weighted linelist / aggregate, hence I suppose #508).

Some notes:

  • We had back and forth on design suggesting that aggregation needed to be internal to the marginal model. I guess that isn't a blocker because aggregation still occurs in the marginal model, but there is no harm in having this additional aggregatation. Essentially it's the user saying "all this data is the same" which can never be non-optimal (unless they are wrong). This then gets further aggregated up if it's possible to.
  • Likely want examples demonstrating this feature e.g. in vignette

So I guess scope of this issue could add docs to note that the marginal model will perform further aggregation if it is possible to given the formula etc. that you specify. Or that could be best left to longer form vignette.

@athowes
Copy link
Collaborator Author

athowes commented Jan 28, 2025

I think for the docs rather than only refer to "weighting" I might mention that this would usually be counts of a particular linelist item (i.e. connecting it up to the data rather than abstractly it's a weighted likelihood -- I'm unsure under what circumstance someone would be weighting in a way unconnected to counts of an observation).

@seabbs seabbs added the documentation Improvements or additions to documentation label Jan 28, 2025
@kgostic
Copy link
Collaborator

kgostic commented Jan 29, 2025

I agree the documentation isn't super clear, but I think it can be polished up with some edits to the details, and an example. I don't know that a full vignette is needed.

@athowes
Copy link
Collaborator Author

athowes commented Jan 29, 2025

Agree that a full vignette is not needed for this alone. That said, I do think we should have a vignette / paper about the marginal model (and its differences as compared with the latent model) and it'd naturally be included there about how to use this argument.

@kgostic
Copy link
Collaborator

kgostic commented Jan 29, 2025

I'm going to take a stab at a good-enough fix now.

@kgostic kgostic self-assigned this Jan 29, 2025
@kgostic kgostic linked a pull request Jan 29, 2025 that will close this issue
9 tasks
@seabbs
Copy link
Contributor

seabbs commented Feb 12, 2025

See the updated documentation on main. This may or may not resolve the issue for you.

@athowes
Copy link
Collaborator Author

athowes commented Feb 12, 2025

Agree makes progress and don't mind about closing.

IMO still could have mention of it in weight here:

Image

Linking to concepts like:

The result is a more compact representation of the same data where each row represents multiple identical observations with the count stored in the n column.

@seabbs
Copy link
Contributor

seabbs commented Feb 12, 2025

Could be useful to include the description here which I think has the context?

@athowes
Copy link
Collaborator Author

athowes commented Feb 12, 2025

FWIW find things like the below to be quite visually cluttered. I think it's going to be confusing for users. I agree with there being downsides on other approaches too, just to say it's a lot.

Image

@seabbs
Copy link
Contributor

seabbs commented Feb 12, 2025

I think that probably requires it own issue for more structure in the pkgdown yaml which might help

@athowes
Copy link
Collaborator Author

athowes commented Feb 12, 2025

Could be useful to include the description here which I think has the context?

The description is:

This method converts linelist data to a marginal model format by calculating delays between primary and secondary events, along with observation times and censoring windows. The likelihood used is imported from the primarycensored package which handles censoring in both primary and secondary events as well as truncation due to observation times. In principle, this method should be more accurate and more computationally efficient than the latent model (as_epidist_latent_model()) approach in most setting except when the number of unique strata approaches the number of observations.

Maybe this helps a bit. Feels like you would want to directly say the argument weight and how it fits in. Might reword "In principle, this method should be more accurate and more computationally efficient than the latent model" (this is a method to prepare data, feels like a type error to call it efficient, it's the model that is efficient or not).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants