ggplot2 AsIs sometimes not preserved when computing geom parameters

At least in GeomBar (and possibly elsewhere), when an <AsIs> aesthetic for bar height is parameterized, the <AsIs> class is sometimes dropped.

library(ggplot2)

p <- data.frame(x = 1:2, y = c(0.5, 2)) |>
  ggplot(aes(x, I(y))) +
  geom_col()
p

Here, ymax is stripped of <AsIs>:

tibble::as_tibble(
  layer_data(p)[, c("ymin", "y", "ymax")]
)
#> # A tibble: 2 × 3
#>       ymin        y  ymax
#>   <I<dbl>> <I<dbl>> <dbl>
#> 1        0      0.5   0.5
#> 2        0      2     2

This also has consequences for training of the scales.[^1] In the plot, the height of the shortest bar is determined only by the scale expansion:

ggplot_build(p)$layout$get_scales(1)$y
#> <ScaleContinuousPosition>
#>  Range:   0.5 --    2
#>  Limits:  0.5 --    2

Expected plot of bar heights rendered in npc:

[^1]: I'm actually torn on whether this part is also a bug. Technically, the bars are following the baseline-at-0 constraint (just that 0 is now interpreted in npc, which is meaningless). But maybe in this case GeomBar should override the baseline of the bars to always be on the data scale (probably hard)? Or if users really want that for whatever reason, they could just add ylim(0, NA), I suppose.

Apr 25 '24 04:04 yjunechoe

Thanks June!

I think the following things are happening.

position_stack() is doing a computation on ymax that drops the <AsIs> class. The data come out the correct way from GeomBar$setup_data and position = "identity" preserves the <AsIs> class.
The scales are supposed to ignore <AsIs> class vectors, as that is the main mechanism through which this works. The y scale gets trained to c(0.5, 2) because it only observes the plain ymax returned from the position adjustment. I don't think the scale range should include 0 in this case, as y variables are <AsIs>. Theoretically, the y scale shouldn't even be populated in this case as it is supposed to ignore all y variables.
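To make the first point concrete, here is a minimal sketch that compares the two position adjustments on the reprex data. The names `p_stack`, `p_identity`, `cols`, `asis_stack` and `asis_identity` are made up for illustration, and the result may well differ between ggplot2 versions.

```r
library(ggplot2)

df <- data.frame(x = 1:2, y = c(0.5, 2))

# Same plot twice: default position (stack) versus position = "identity"
p_stack    <- ggplot(df, aes(x, I(y))) + geom_col()
p_identity <- ggplot(df, aes(x, I(y))) + geom_col(position = "identity")

# For each y-related column of the built layer data, check for the <AsIs> class
cols <- c("ymin", "y", "ymax")
asis_stack    <- sapply(layer_data(p_stack)[, cols], inherits, what = "AsIs")
asis_identity <- sapply(layer_data(p_identity)[, cols], inherits, what = "AsIs")

asis_stack
asis_identity
```

On the version discussed in this thread, I'd expect `ymax` to be the only column that is not `<AsIs>` in the stacked case, and all three to be `<AsIs>` with `position = "identity"`.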

The following is what I think the plot should yield. The clipping/margins are added for clarity.

library(ggplot2)

data.frame(x = 1:2, y = c(0.5, 2)) |>
  ggplot(aes(x, I(y))) +
  geom_col(position = "identity") +
  coord_cartesian(clip = "off") +
  theme(plot.margin = margin(200, 5, 5, 5))

Note that the y-scale is unpopulated because it hasn't observed any of the <AsIs> variables:

layer_scales()$y$is_empty()
#> [1] TRUE

Created on 2024-04-25 with reprex v2.1.0

I think all of this brings us to the following question: should position adjustments attempt to preserve any <AsIs> variables? While I'm still undecided, I'm leaning towards 'no' as these are designed to operate in data-space and mixing data-space and panel-space in these computations is prone to unexpected results (better to not promise anything, than promising and not delivering).

Apr 25 '24 07:04 teunbrand

> better to not promise anything, than promising and not delivering

Well put - that's the conclusion that I'm circling back to as well. Maybe once I() gets more widely used people will start developing stronger intuitions about what they expect here, but as it stands I'm now less sure about the "expected output" I posed originally.

Maybe a better way to frame the issue is whether ggplot should emit any messages or warnings if the user accidentally mixes data-space and panel-space? My surprise with the reprex is mostly that it's not obvious from the user's side that they're mixing data-space and panel-space. The code reads like it should plot y only in panel-space (setting aside the issue of whatever that should mean for GeomBar), but Position introduces data-space positioning internally and causes the mixing.

Apr 25 '24 13:04 yjunechoe

Yeah, I agree that such warnings would be nice, but by my estimation there are a lot of places where aesthetics are combined into new ones, which would mean a lot of checks scattered around the codebase. In the case of position adjustments specifically, it isn't standardised anywhere which aesthetics they read or write, so it'd be hard to do systematically. Perhaps the least intrusive way out is simply to document the use of I() as 'at your own risk' and point out potential interactions with stats and position adjustments that may behave unexpectedly.
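In the meantime, a check like this can be done from the user's side after the plot is built. The sketch below is only an illustration: `warn_mixed_asis()` is a hypothetical helper, not part of ggplot2, and it only looks at the y-related columns that `layer_data()` returns.

```r
library(ggplot2)

# Hypothetical helper (not a ggplot2 function): warn when a built layer has
# some y position columns with class <AsIs> and others without it.
warn_mixed_asis <- function(plot, layer = 1L, cols = c("ymin", "y", "ymax")) {
  data <- layer_data(plot, i = layer)
  cols <- intersect(cols, names(data))
  is_asis <- vapply(data[cols], inherits, logical(1), what = "AsIs")
  mixed <- any(is_asis) && !all(is_asis)
  if (mixed) {
    warning(
      "Layer ", layer, " mixes <AsIs> and plain columns; not <AsIs>: ",
      paste(cols[!is_asis], collapse = ", "),
      call. = FALSE
    )
  }
  invisible(mixed)
}

# Usage with the plot from the original report
p <- ggplot(data.frame(x = 1:2, y = c(0.5, 2)), aes(x, I(y))) + geom_col()
warn_mixed_asis(p)
```

This only inspects the final layer data, so it cannot tell which step (stat, geom setup or position adjustment) dropped the class.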

Apr 25 '24 19:04 teunbrand

Got it - that sounds completely reasonable! I'm content with just the fact that this is clarified for me - I'll let you make the call for whether this also warrants an entry in the docs (and feel free to close this as complete).

Apr 25 '24 19:04 yjunechoe