
Fix prevalence functions to return DEFF when survey weights is NULL #86

Open
tomaszaba opened this issue Nov 8, 2024 · 3 comments
Labels: analysis, bug (Something isn't working), data recoding (tasks related to recoding of variables), refactor (code improvement/refactoring)


@tomaszaba
Collaborator

No description provided.

@tomaszaba added the bug (Something isn't working) label on Nov 8, 2024
@tomaszaba added this to the "3. Re-factor functions" milestone on Nov 8, 2024
@ernestguevarra
Member

@tomaszaba, in the survey package (which your preferred srvyr package wraps), the `weights` argument is `NULL` by default. If the design is PPS, you will then need to supply either the `probs` or the `fpc` argument instead. `probs` is likely what you want (or what you would have data for). The probability of selection in a two-stage cluster sampling design is:

$$ prob_{overall} = prob_{cluster} \times prob_{individual} $$

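where

$$ prob_{cluster} = \frac{n_{cluster} \times pop_{cluster}}{pop_{total}} $$

is the probability that a given cluster is selected when $n_{cluster}$ clusters are drawn with probability proportional to size (PPS), and

$$ prob_{individual} = \frac{n_{\text{sample per cluster}}}{pop_{cluster}} $$

is the probability that a given individual is selected within that cluster.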

So you can either add the cluster population sizes and the total population to the example dataset (along with a function that calculates the probabilities), or add a probability variable to the dataset directly. If your data already have weights, you can calculate the probabilities from them, since weights are the inverse of the selection probability:

$$ \text{weights} = \frac{1}{prob_{overall}} $$

$$ prob_{overall} = \frac{1}{\text{weights}} $$
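As an illustration of the formulas above, here is a minimal Python sketch (not code from the package). It assumes clusters are drawn PPS and a fixed number of individuals is drawn at random within each selected cluster; the function names and the figures in the example are hypothetical.

```python
# Sketch only: assumes clusters are drawn with probability proportional to
# size (PPS) and a fixed number of units is drawn at random within each one.


def prob_cluster(n_cluster: int, pop_cluster: float, pop_total: float) -> float:
    """First stage: n_cluster * pop_cluster / pop_total."""
    return n_cluster * pop_cluster / pop_total


def prob_individual(n_per_cluster: int, pop_cluster: float) -> float:
    """Second stage: n_per_cluster / pop_cluster."""
    return n_per_cluster / pop_cluster


def prob_overall(
    n_cluster: int, pop_cluster: float, pop_total: float, n_per_cluster: int
) -> float:
    """Overall probability of selection = first stage x second stage."""
    return prob_cluster(n_cluster, pop_cluster, pop_total) * prob_individual(
        n_per_cluster, pop_cluster
    )


def weight_from_prob(prob: float) -> float:
    """A sampling weight is the inverse of the overall selection probability."""
    return 1.0 / prob


def prob_from_weight(weight: float) -> float:
    """Recover the overall selection probability from an existing weight."""
    return 1.0 / weight


# Hypothetical figures: 30 clusters drawn from a population of 600,000,
# with 20 individuals sampled in a cluster of 2,000 people.
p = prob_overall(n_cluster=30, pop_cluster=2_000, pop_total=600_000, n_per_cluster=20)
w = weight_from_prob(p)
```

Because `pop_cluster` cancels out, every individual ends up with the same overall probability, n_cluster × n_per_cluster / pop_total (0.001 here, i.e. a weight of 1,000). This is the self-weighting property of PPS with a fixed take per cluster; if the number sampled per cluster varies, the weights will differ between clusters.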

@ernestguevarra
Member

ernestguevarra commented Dec 4, 2024

I was reviewing your weighting functions, and in your example you use the `anthro.02` dataset, which you described as:

> A household budget survey data conducted in Mozambique in 2019/2020, known as IOF (Inquérito ao Orçamento Familiar in Portuguese). IOF is a two-stage cluster-based survey, representative at province level (admin 2), with probability of the selection of the clusters proportional to the size of the population. Its data collection spans for a period of 12 months.

It seems to me that the `wtfactor` variable in that dataset is the weight for selecting the cluster, not the individuals. Since this is a two-stage cluster survey in which each row of data is probably a household, you need the probability of a household being selected within a cluster, and then the overall weight for selecting a sample unit (which is what you are trying to do with the anthro data). In your example, however, you used only `wtfactor`, which is just the weight for selecting a cluster. It might work as an example, but I think the sampling design specified in your example is not appropriate for the metric you are measuring.
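To make this point concrete: if `wtfactor` does reflect only the first-stage (cluster) selection, which is the reading suggested here rather than something confirmed from the IOF documentation, the household-level weight also needs the inverse of the within-cluster selection probability. A minimal Python sketch, assuming hypothetical fields `cluster_weight`, `hh_in_cluster` and `hh_sampled` (not columns of `anthro.02`) and simple random sampling of households within each cluster:

```python
from dataclasses import dataclass


@dataclass
class ClusterRecord:
    # Hypothetical fields, for illustration only (not columns of anthro.02).
    cluster_weight: float  # inverse of the first-stage (cluster) selection probability
    hh_in_cluster: int     # households listed in the cluster for the second stage
    hh_sampled: int        # households actually sampled in the cluster


def household_weight(rec: ClusterRecord) -> float:
    """Overall household weight = cluster weight / within-cluster probability.

    Assumes simple random sampling of households within each cluster, so the
    within-cluster probability is hh_sampled / hh_in_cluster; dividing by it is
    the same as multiplying by hh_in_cluster / hh_sampled.
    """
    return rec.cluster_weight * rec.hh_in_cluster / rec.hh_sampled


records = [
    ClusterRecord(cluster_weight=10.0, hh_in_cluster=200, hh_sampled=20),
    ClusterRecord(cluster_weight=25.0, hh_in_cluster=120, hh_sampled=20),
]
weights = [household_weight(r) for r in records]
```

Using only the cluster weight here would treat both clusters' households as if they had weights of 10 and 25, instead of 100 and 150.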

You are getting into very murky waters here by including complex survey design analysis in your package. Not all two-stage cluster surveys use the same sampling design, and the calculation of weights can vary, so be careful here.

@ernestguevarra added the analysis, data recoding (tasks related to recoding of variables), and refactor (code improvement/refactoring) labels on Dec 4, 2024
@tomaszaba
Collaborator Author

Hi Ernest, on this point, and related to the message I posted a few minutes ago in issue #87: Douglas will also ask the ENA developer to explain how ENA calculates the DEFF in the absence of survey weights. I asked him to ask whether the code for this task could be shared, so that we can translate it into R and ensure consistency.
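How ENA computes the DEFF is not stated in this thread. For comparison only, one common textbook approach for an equal-probability cluster sample is to divide the variance of the prevalence estimated from cluster totals (a linearised ratio-estimator variance, treating clusters as sampled with replacement and ignoring any finite population correction) by the variance expected under simple random sampling, p(1 - p)/n. The Python sketch below is not a translation of ENA's or the package's code, and its input format (one pair of case count and number of children measured per cluster) is an assumption:

```python
from typing import Sequence, Tuple


def prevalence_deff(clusters: Sequence[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (prevalence, DEFF) for equal-probability cluster-sample data.

    clusters: one (cases, children_measured) pair per sampled cluster.
    Assumes equal weights, clusters treated as sampled with replacement
    (no FPC), and at least two clusters. Textbook approach, not ENA's method.
    """
    k = len(clusters)
    if k < 2:
        raise ValueError("need at least two clusters to estimate the variance")
    total_cases = sum(y for y, _ in clusters)
    total_n = sum(m for _, m in clusters)
    p = total_cases / total_n  # ratio estimator of the prevalence

    # Linearised (Taylor series) variance of the ratio estimator,
    # built from each cluster's residual y_i - p * m_i.
    ss = sum((y - p * m) ** 2 for y, m in clusters)
    var_cluster = (k / (k - 1)) * ss / total_n**2

    # Variance the same prevalence would have under simple random sampling.
    var_srs = p * (1 - p) / total_n
    if var_srs == 0:
        raise ValueError("DEFF is undefined when the prevalence is 0 or 1")
    return p, var_cluster / var_srs


# Hypothetical data: 4 clusters of 10 children each.
p, deff = prevalence_deff([(2, 10), (5, 10), (1, 10), (4, 10)])
```

With these hypothetical data (12 cases among 40 children) the prevalence is 0.3 and the DEFF is about 1.59. No weights enter the calculation, which is why this kind of estimate is available when `weights` is `NULL`; a common approximation of the same quantity is DEFF ≈ 1 + (m̄ - 1)ρ, with m̄ the mean cluster size and ρ the intra-cluster correlation.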

Projects: None yet
Development: No branches or pull requests
2 participants