Inconsistencies with the weighted quantile function #267

@bschneidr

Hi @mjskay,

I noticed that the weighted quantile method used in the package behaves strangely, as do similar weighted quantile functions in other R packages: the quantile estimate depends on the arbitrary order in which the records in the data are sorted. Here's a reprex showing the results from ggdist as well as from the packages 'survey' and 'collapse' (the latter of which I think is trying to do the same thing as ggdist here).

# Create example data, estimate quantile
data <- data.frame(
  x = c(2,    2,    3,    3),
  w = c(0.25, 0.15, 0.35, 0.25)
)

ggdist::weighted_quantile(x = data$x, weights = data$w, probs = 0.5)
#>      50% 
#> 2.640845
survey:::qrule_hf7(x = data$x, w = data$w, p = 0.5)
#> [1] 2.833333
collapse::fquantile(x = data$x, w = data$w, p = 0.5, type = 7)
#>      50% 
#> 2.785714

# Sort the data differently, then estimate the quantile again
data2 <- data |> dplyr::arrange(x, w)

ggdist::weighted_quantile(x = data2$x, weights = data2$w, probs = 0.5)
#> 50% 
#> 2.9
survey:::qrule_hf7(x = data2$x, w = data2$w, p = 0.5)
#> [1] 2.7
collapse::fquantile(x = data2$x, w = data2$w, p = 0.5, type = 7)
#> 50% 
#> 2.9

Created on 2025-11-05 with reprex v2.1.1
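
Note that the two orderings in the reprex differ only in how the tied x values are arranged. As a minimal base-R sketch of that point (not a proposed fix, and not necessarily the approach described in the blog post), the hypothetical helper `collapse_ties()` below sums the weights of tied values so that both orderings reduce to the same input. The helper name and the choice to sum tied weights are assumptions made for illustration only; nothing like it is claimed to exist in ggdist, survey, or collapse.

```r
# Hypothetical helper, for illustration only:
# sum the weights of tied x values and return one row per distinct x,
# sorted by x, so the result does not depend on the input row order.
collapse_ties <- function(x, w) {
  ux <- sort(unique(x))
  uw <- vapply(ux, function(v) sum(w[x == v]), numeric(1))
  data.frame(x = ux, w = uw)
}

# Same data as in the reprex, in both row orders
data  <- data.frame(x = c(2, 2, 3, 3), w = c(0.25, 0.15, 0.35, 0.25))
data2 <- data[order(data$x, data$w), ]

a <- collapse_ties(data$x, data$w)
b <- collapse_ties(data2$x, data2$w)

# Both orderings collapse to the same two rows (x = 2 and x = 3)
isTRUE(all.equal(a, b))
```

The collapsed `x` and `w` columns could then be passed to any of the three functions above. Whether the estimate obtained that way is the right one is a separate question that this sketch does not address.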

I put together a blog post that describes the underlying problem with these implementations and suggests a way to resolve it, based on some ideas from an earlier blog post of yours. I'd be curious to hear your thoughts on this and on what you think would be the best way to address the behavior shown in the reprex.

https://www.practicalsignificance.com/posts/weighted-quantile-weirdness/
