Support simultaneous stacking and dodging by different variables in geom_col #6324

Open
bakaburg1 opened this issue Feb 6, 2025 · 4 comments
Labels
feature a feature request or enhancement positions 🥇

Comments

@bakaburg1

I'd like to propose adding support for simultaneous stacking and dodging controlled by different variables in geom_col. Currently, this common visualization need requires workarounds that are both verbose and hard to maintain.

Current Limitation

When using geom_col, we can either stack or dodge bars based on a grouping variable, but not both at the same time using different variables. This makes it difficult to create visualizations where we want to:

  1. Stack bars by one categorical variable
  2. Dodge the resulting stacks by another categorical variable

Here's a reprex with counts from surveillance data stratified by year, country, and surveillance protocol:

Minimal Reproducible Example

library(ggplot2)
library(dplyr)

# Sample data
df <- bind_rows(
    data.frame(
        year = rep(2016, 5),
        protocol = rep("M", 5),
        country = c("A", "B", "C", "D", "E"),
        freq = c(100, 50, 30, 40, 11)
    ),
    data.frame(
        year = rep(2016, 4),
        protocol = rep("L", 4),
        country = c("A", "B", "C", "D"),
        freq = c(23, 60, 200, 100)
    )
)

# Current workaround requires multiple geom_col calls
ggplot() +
    geom_col(
        data = df %>% filter(protocol == "M"),
        aes(x = year - .5, y = freq,
            fill = protocol, group = country),
        position = "stack",
        width = 0.4
    ) +
    geom_col(
        data = df %>% filter(protocol == "L"),
        aes(x = year + .5, y = freq,
            fill = protocol, group = country),
        position = "stack",
        width = 0.4
    )
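
A different workaround, given here only as a sketch and not something proposed in this thread, is to put the dodging variable on the x axis and facet by the original x variable. This avoids the manual x offsets and the second geom_col call, at the cost of turning each year into its own panel. The theme settings are just one possible styling choice.

```r
library(ggplot2)

# Same sample data as above, built in one call
df <- data.frame(
    year = 2016,
    protocol = rep(c("M", "L"), times = c(5, 4)),
    country = c("A", "B", "C", "D", "E", "A", "B", "C", "D"),
    freq = c(100, 50, 30, 40, 11, 23, 60, 200, 100)
)

# protocol takes the place of the dodge; countries are stacked within it
p <- ggplot(df, aes(x = protocol, y = freq, fill = protocol, group = country)) +
    geom_col(position = "stack", colour = "black", width = 0.8) +
    # One panel per year, with the year label below the bars
    facet_wrap(vars(year), strip.position = "bottom") +
    theme(strip.placement = "outside")

p
```

With more years in the data, each year gets its own panel containing one stacked bar per protocol.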

Desired Behavior

Ideally, we would be able to specify both stacking and dodging variables in a single geom_col call, something like:

# Conceptual syntax (not working)
ggplot(df, aes(x = year, y = freq)) +
    geom_col(
        aes(fill = protocol, group = country),
        position = position_stackdodge(
            stack_by = "country",
            dodge_by = "protocol"
        )
    )

Use Cases

This functionality would be particularly useful for:

  • Comparing distributions across multiple categories
  • Visualizing nested hierarchical data
  • Creating more complex compositional charts without resorting to hacky solutions
  • Maintaining consistent spacing and positioning without manual x-axis adjustments

Benefits

  1. More intuitive API for common visualization needs
  2. Reduced code complexity
  3. Better maintainability
  4. Consistent positioning and spacing handled by ggplot2
  5. Easier integration with scales and themes
@teunbrand
Collaborator

teunbrand commented Feb 6, 2025

Thanks for the report! This request is similar to #2267, which was closed as unplanned.
I think one reason we've been reluctant to implement this is because it would break the API as position adjustments do not have the right authority to include variables (like stack_by and dodge_by) from the data.
However, because we implemented #6100, I think this limitation no longer holds and this suggestion no longer would break the API.
For these reasons, I think this should be possible, but I'm not yet convinced that it belongs to ggplot2 and not an extension package.

teunbrand added the positions 🥇 and feature (a feature request or enhancement) labels on Feb 6, 2025
@bakaburg1
Author

Thank you!

In the meantime (with great help from various AIs), I developed an ad hoc geom. I still think that a position_ function would be more appropriate, since it could accommodate other geoms too (and I don't like the idea of a geom just for positioning), but I wasn't able to make one. Regarding whether to put it ggplot or not I would advise for the first solution. I was very surprised in the first place this was not possible already, it's something one would expect out of the box!

GeomStackDodgeCol <- ggproto(
    "GeomStackDodgeCol", GeomRect,
    required_aes = c("x", "y", "fill", "group"),
    default_aes = aes(
        colour = "black",
        linewidth = 0.5,
        linetype = 1,
        alpha = NA
    ),
    
    setup_data = function(data, params) {
        # Reset stacking for each x value and fill group
        data <- data |>
            group_by(x, fill) |>
            mutate(
                ymin = c(0, head(cumsum(y), -1)),
                ymax = cumsum(y)
            ) |>
            ungroup()
        
        # Compute dodging offsets with width and padding
        fill_groups <- unique(data$fill)
        n_groups <- length(fill_groups)
        width <- params$width %||% 0.9     # width of the bars
        padding <- params$padding %||% 0.1  # padding between bars
        
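        # Total width spanned by all dodged bars, including the gaps between them
        total_width <- n_groups * width + (n_groups - 1) * padding * width
        
        # Bar centres span the total width minus one bar width, centred on x,
        # so a single group sits exactly on x and gaps equal padding * width
        half_span <- (total_width - width) / 2
        positions <- seq(-half_span, half_span, length.out = n_groups)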
        
        # Create rectangle coordinates
        data$xmin <- data$x + positions[match(data$fill, fill_groups)] - width/2
        data$xmax <- data$x + positions[match(data$fill, fill_groups)] + width/2
        
        data
    },
    
    draw_panel = function(data, panel_params, coord, width = 0.9, ...) {
        coords <- coord$transform(data, panel_params)
        
        grid::rectGrob(
            x = (coords$xmin + coords$xmax)/2,
            y = (coords$ymin + coords$ymax)/2,
            width = coords$xmax - coords$xmin,
            height = coords$ymax - coords$ymin,
            default.units = "native",
            just = c("center", "center"),
            gp = grid::gpar(
                col = coords$colour,
                fill = alpha(coords$fill, coords$alpha),
                lwd = coords$linewidth * .pt,
                lty = coords$linetype
            )
        )
    },
    
    parameters = function(complete = FALSE) {
        c("na.rm", "width", "padding")
    }
)

geom_stackdodge_col <- function(mapping = NULL, data = NULL,
                            position = "identity", 
                            width = 0.9,
                            padding = 0.1,
                            na.rm = FALSE,
                            show.legend = NA,
                            inherit.aes = TRUE, ...) {
    layer(
        geom = GeomStackDodgeCol,
        mapping = mapping,
        data = data,
        stat = "identity",
        position = position,
        show.legend = show.legend,
        inherit.aes = inherit.aes,
        params = list(
            na.rm = na.rm,
            width = width,
            padding = padding,
            ...  # forward fixed aesthetics such as colour instead of dropping them
        )
    )
}

Of course, this still needs proper testing.

Here's some testing code:

local({
    df <- bind_rows(
        data.frame(
            year = rep(2016, 5),
            protocol = rep("M", 5),
            country = c("A", "B", "C", "D", "E"),
            freq = c(100, 50, 30, 40, 11) # sum is 231
        ),
        data.frame(
            year = rep(2016, 4),
            protocol = rep("L", 4),
            country = c("A", "B", "C", "D"),
            freq = c(23, 60, 200, 100) # sum is 383
        )
    )
    
    # Add another year; shuffle freq within each protocol so the per-protocol totals stay the same
    df <- bind_rows(
        df,
        df |> mutate(year = 2017, freq = sample(freq), .by = protocol)
    )
    
    # Create summary data
    df_sum <- df |>
        summarise(
            label = paste(country, collapse = "\n"),
            freq = sum(freq),
            .by = c(year, protocol)
        )
    ggplot() +
        geom_stackdodge_col(
            data = df,
            aes(x = factor(year), y = freq, group = country,
                fill = protocol),
            width = 0.1, padding = 0.5
        ) +
        # Reference lines showing that the bars sum to the expected values (231 and 383)
        geom_hline(yintercept = c(sum(c(100, 50, 30, 40, 11)), sum(c(23, 60, 200, 100))))
})
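
The stacking and dodging arithmetic can also be checked without drawing anything. The following base R sketch uses two hypothetical helpers, stack_bounds() and dodge_centres(), which are not part of ggplot2 or of the geom above. It assumes that neighbouring bar centres should be width * (1 + padding) apart, so that the gap between bars is padding * width.

```r
# Hypothetical helper: lower and upper bounds of each segment in one stack
stack_bounds <- function(y) {
    ymax <- cumsum(y)
    data.frame(ymin = c(0, head(ymax, -1)), ymax = ymax)
}

# Hypothetical helper: x offsets of the bar centres, symmetric around 0
dodge_centres <- function(n_groups, width = 0.9, padding = 0.1) {
    step <- width * (1 + padding)  # distance between neighbouring centres
    (seq_len(n_groups) - (n_groups + 1) / 2) * step
}

m <- stack_bounds(c(100, 50, 30, 40, 11))  # protocol M
l <- stack_bounds(c(23, 60, 200, 100))     # protocol L

max(m$ymax)  # 231
max(l$ymax)  # 383

offsets <- dodge_centres(2, width = 0.1, padding = 0.5)
offsets      # centres at about -0.075 and 0.075
```

Each segment starts where the previous one ends, so the top of each stack equals the sum of its frequencies, and a single group gets an offset of 0.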


@clauswilke
Member

Regarding whether to put it ggplot or not I would advise for the first solution. I was very surprised in the first place this was not possible already, it's something one would expect out of the box!

We have for many years now followed the philosophy that only the absolute core features are in ggplot2 itself and other, less commonly used features should go into extension packages. Maybe this would be a good fit for ggforce for example.

Also, while I'm of the opinion that everybody should be allowed and empowered to make any visualization they want, I find it difficult to think of a valid use case for this geom. I've never in my life thought "hm, I want to stack and dodge at the same time." This is definitely an obscure corner case, and I feel reasonably confident that any figure you make with this feature can be improved by removing one of the two position adjustments.

@bakaburg1
Author

Uhm, it's a pretty common scenario in epidemiology!

Should I cross-post it to ggforce? Do they also work on position functions, or only on geoms?

3 participants