Fix prevalence functions to return DEFF when survey weights is NULL #86

tomaszaba · 2024-11-08T21:08:13Z

No description provided.

ernestguevarra · 2024-12-04T05:52:35Z

@tomaszaba, in the survey package (which is wrapped by your preferred srvyr package), the weights argument is NULL by default, which would indicate that the design is PPS. In that case, you need to supply either the probs or the fpc argument. probs is likely what you would want (or would have data for). The probability of selection in a two-stage cluster sampling design is:

$$ prob_{overall} = prob_{cluster} \times prob_{individual} $$

where

$$ prob_{cluster} = \frac{n_{cluster} \times pop_{cluster}}{pop_{total}} $$

and

$$ prob_{individual} = \frac{n_{\text{sample per cluster}}}{pop_{cluster}} $$

So you can either add the cluster population size and the total population to your example dataset (plus a function that calculates the probabilities), or add the probability variable to the dataset directly. If your sample already has weights, then, since weights are the inverse of the selection probability, you can calculate the probability from them:

$$ \text{weights} = \frac{1}{prob_{overall}} $$

$$ prob_{overall} = \frac{1}{\text{weights}} $$
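As a rough illustration of the arithmetic above, here is a minimal sketch in Python. It assumes the usual PPS first-stage probability (number of clusters sampled times cluster population, divided by total population) and equal-probability selection within each cluster. The function name, argument names, and numbers are hypothetical and are not taken from the package or the anthro.02 dataset.

```python
def pps_two_stage(pop_cluster, pop_total, n_clusters, n_sampled_in_cluster):
    """Return (overall selection probability, sampling weight) for one cluster.

    Assumes stage 1 is PPS and stage 2 is simple random sampling
    within the selected cluster.
    """
    # Stage 1: probability that this cluster is selected under PPS
    prob_cluster = n_clusters * pop_cluster / pop_total
    # Stage 2: probability that a unit is selected within the cluster
    prob_individual = n_sampled_in_cluster / pop_cluster
    # Overall probability is the product of the two stages
    prob_overall = prob_cluster * prob_individual
    # The sampling weight is the inverse of the overall probability
    return prob_overall, 1 / prob_overall


# Hypothetical numbers, for illustration only
prob, weight = pps_two_stage(
    pop_cluster=500,
    pop_total=100_000,
    n_clusters=30,
    n_sampled_in_cluster=20,
)
print(prob, weight)
```

Note that under these assumptions the cluster population cancels out of the product, so every unit gets the same weight whenever the same number of units is sampled in each cluster. Designs that deviate from this (unequal takes per cluster, stratification, non-response adjustments) would need their own calculation.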

ernestguevarra · 2024-12-04T06:08:59Z

I was reviewing your weighting functions and in your example, you use the anthro.02 dataset which you described as:

> A household budget survey data conducted in Mozambique in 2019/2020, known as IOF (Inquérito ao Orçamento Familiar in Portuguese). IOF is a two-stage cluster-based survey, representative at province level (admin 2), with probability of the selection of the clusters proportional to the size of the population. Its data collection spans for a period of 12 months.

It seems to me that wtfactor in that dataset is the weight for a cluster being selected, not for individuals. But since this is a two-stage cluster survey in which each row of data is probably a household, you also need the probability of a household being selected within a cluster, and from that the overall weight for selecting a sample (which is what you are trying to do with the anthro data). In your example, however, you used only wtfactor, which is just the weight for selecting a cluster. It might work as an example, but I think the specification of the sampling design in your example is not appropriate for the metric you are measuring.

You are getting into very murky waters here by including complex survey design analysis in your package. Not all two-stage cluster-based surveys share the same sampling design, and the calculation of weights can vary, so be careful here.

tomaszaba · 2024-12-13T22:25:58Z

Hi Ernest, on this, and related to the message I posted a few minutes ago in issue #87: Douglas will also ask the ENA developer to explain how DEFF is calculated in ENA in the absence of survey weights. I asked him to check whether the code for this task could be shared, so that we can translate it into R and ensure consistency.

tomaszaba added the bug (Something isn't working) label Nov 8, 2024

tomaszaba added this to the 3. Re-factor functions milestone Nov 8, 2024

tomaszaba assigned ernestguevarra and tomaszaba Nov 8, 2024

ernestguevarra added the analysis, data recoding (tasks related to recoding of variables), and refactor (code improvement/refactoring) labels Dec 4, 2024

ernestguevarra modified the milestones: 3. Re-factor functions, 5. GitHub release Feb 6, 2025
