Hi @mjskay,
I noticed that the weighted quantile method used in ggdist behaves strangely, as do similar weighted quantile functions in other R packages: the quantile estimate depends on the arbitrary order in which records in the data are sorted. Here's a reprex showing the results from ggdist as well as from the packages 'survey' and 'collapse' (the latter of which I think is trying to do the same thing as ggdist here). Note that the second half of the reprex only reorders rows that share the same `x` value, yet all three estimates change.
# Create example data, estimate quantile
data <- data.frame(
x = c(2 , 2 , 3 , 3 ),
w = c(0.25, 0.15, 0.35, 0.25)
)
ggdist::weighted_quantile(x = data$x, weights = data$w, probs = 0.5)
#> 50%
#> 2.640845
survey:::qrule_hf7(x = data$x, w = data$w, p = 0.5)
#> [1] 2.833333
collapse::fquantile(x = data$x, w = data$w, p = 0.5, type = 7)
#> 50%
#> 2.785714
# Sort the data differently, then estimate the quantile again
data2 <- data |> dplyr::arrange(x, w)
ggdist::weighted_quantile(x = data2$x, weights = data2$w, probs = 0.5)
#> 50%
#> 2.9
survey:::qrule_hf7(x = data2$x, w = data2$w, p = 0.5)
#> [1] 2.7
collapse::fquantile(x = data2$x, w = data2$w, p = 0.5, type = 7)
#> 50%
#> 2.9
Created on 2025-11-05 with reprex v2.1.1
I put together a blog post that describes the underlying problem with these implementations and suggests a way to resolve it, building on some ideas from an earlier blog post of yours. I'd be curious to hear your thoughts on the issue and on how best to address the behavior shown in the reprex.
https://www.practicalsignificance.com/posts/weighted-quantile-weirdness/
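
For illustration only, here is a minimal base R sketch of one way to make the input order-invariant before it reaches any quantile function: sum the weights of tied `x` values so that each distinct value appears exactly once. The helper `collapse_ties()` is hypothetical (it is not part of ggdist, survey, or collapse), and it is not necessarily the fix proposed in the blog post. Collapsing ties can also change the estimate itself, so treat this as a sketch of the idea rather than a recommendation.

```r
# Hypothetical helper, not from any package: collapse tied x values by
# summing their weights, returning one row per distinct x in ascending order.
collapse_ties <- function(x, w) {
  ux <- sort(unique(x))
  # Sort the weights within each tie group before summing, so that the
  # floating-point sum does not depend on the original row order either.
  w_sum <- vapply(ux, function(v) sum(sort(w[x == v])), numeric(1))
  data.frame(x = ux, w = w_sum)
}

# Same data as in the reprex, in both row orders (base R instead of dplyr)
data <- data.frame(
  x = c(2, 2, 3, 3),
  w = c(0.25, 0.15, 0.35, 0.25)
)
data2 <- data[order(data$x, data$w), ]

a <- collapse_ties(data$x, data$w)
b <- collapse_ties(data2$x, data2$w)

# Both orderings collapse to the same two rows (x = 2 and x = 3)
identical(a, b)
#> [1] TRUE
```

The collapsed data frame can then be passed to any of the functions above, e.g. `ggdist::weighted_quantile(x = a$x, weights = a$w, probs = 0.5)`, and the result can no longer depend on how the tied rows were originally ordered.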