
scale_shape_manual mislabels if label argument is given and names in values are not alphabetical

Open Istalan opened this issue 2 years ago • 4 comments

I found a problem in scale_shape_manual (and presumably the other manual scales as well): when the values are automatically sorted alphabetically, the labels argument does not get sorted with them. See my example below, where A, B and C have the shapes I want, but in the legend the As are labeled as Cs and the Cs are labeled as As.

I expected that the values argument would let me define which shape I want for each of my string values, that the labels argument would let me give more human-readable versions of these strings, and that any change in the order of the values would therefore be applied to the labels as well. I know I can fix this either by respecting alphabetical order or by defining the offending variables as factors/ordered, but I'd prefer not to have to when I don't care what order the legend is in.

Most importantly: this should certainly not happen without a warning, as it makes it very easy to create wrongly labelled plots.

Here is the code to reproduce the bug:

library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 4.2.2
#> Warning: package 'ggplot2' was built under R version 4.2.2
#> Warning: package 'tibble' was built under R version 4.2.2
#> Warning: package 'tidyr' was built under R version 4.2.2
#> Warning: package 'readr' was built under R version 4.2.2
#> Warning: package 'purrr' was built under R version 4.2.2
#> Warning: package 'dplyr' was built under R version 4.2.2
#> Warning: package 'stringr' was built under R version 4.2.2
#> Warning: package 'forcats' was built under R version 4.2.2
n <- 10
df <- data.frame(x = rnorm(n), y = rnorm(n), z = sample(c("A", "B", "C"), replace = TRUE, size = n))

# Named values are given in the order C, B, A; labels are unnamed and no breaks are set
ggplot(data = df, aes(x = x, y = y, shape = z)) +
  geom_point(size = 3) +
  geom_text(aes(label = z), hjust = 1, vjust = 1) +
  scale_shape_manual(values = c("C" = 1, "B" = 2, "A" = 3),
                     labels = c("long Text about C", "long Text about  B", "long Text about  A")
                     )

Created on 2023-02-27 with reprex v2.0.2
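
The mismatch in the reprex can be modelled in a few lines of base R. This is only a simplified sketch of the behaviour described above, not ggplot2's actual code; it assumes that the legend follows the sorted unique data values, that named values are looked up by name, and that unnamed labels are assigned by position.

```r
# Simplified model of the reported behaviour; not ggplot2 internals.
values <- c("C" = 1, "B" = 2, "A" = 3)
labels <- c("long Text about C", "long Text about B", "long Text about A")

# Assumption: with character data and no explicit breaks, the legend entries
# follow the sorted unique data values.
limits <- sort(unique(c("A", "B", "C")))

# Named values are looked up by name, so each level keeps its intended shape.
shapes <- values[limits]

# Unnamed labels are assigned by position, so they follow the sorted order
# of the limits rather than the order in which the values were written.
legend <- data.frame(
  level = limits,
  shape = unname(shapes),
  label = labels,
  stringsAsFactors = FALSE
)
print(legend)
```

In this model, level A keeps shape 3 but is paired with the label written for C, which matches the swapped legend in the plot.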

Istalan avatar Feb 27 '23 16:02 Istalan

I think setting labels without specifying breaks is always a risky move. Perhaps the confusing bit is that you gave the values argument in a particular order, and the breaks/labels don't follow that order. Therefore, I don't think this is a bug.

However, we might be able to do better by deriving the 'breaks' argument from names(values) if appropriate.
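
Until something like that exists, the same effect can be had by passing the breaks explicitly. The following is a sketch of that user-side workaround, not of any change inside ggplot2; the names shape_values, shape_labels and shape_scale, as well as the seed, are made up for the example.

```r
library(ggplot2)

shape_values <- c("C" = 1, "B" = 2, "A" = 3)
shape_labels <- c("long Text about C", "long Text about B", "long Text about A")

set.seed(1)  # arbitrary seed, only to make the example repeatable
df <- data.frame(
  x = rnorm(10),
  y = rnorm(10),
  z = sample(c("A", "B", "C"), size = 10, replace = TRUE)
)

shape_scale <- scale_shape_manual(
  values = shape_values,
  breaks = names(shape_values),  # pins the break order to C, B, A
  labels = shape_labels          # unnamed labels now line up with those breaks
)

p <- ggplot(df, aes(x = x, y = y, shape = z)) +
  geom_point(size = 3) +
  shape_scale
```

With the breaks pinned this way, the legend should list C, B, A in the order the values were written, each with its own label.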

teunbrand avatar Feb 27 '23 17:02 teunbrand

@teunbrand I think it's risky to add more heuristics that try to infer break order. It'll lead to even more confusion, and possibly breaking plots when you change the ordering of breaks.

A better approach might be to simply issue a warning when people specify labels but not breaks. I have long operated under the rule of thumb that one should never specify labels without specifying breaks. We could just formalize that rule. Maybe a more nuanced way of stating it is that one should not specify labels via a vector of specific values without specifying breaks. Providing a formatting function is fine, of course.
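
To illustrate the distinction, here is a sketch of two ways to give labels that do not depend on the order of the breaks. The function label_fun and the other variable names are invented for the example. The second option assumes that discrete scales match a named labels vector to the breaks by name, which is worth checking against the documentation of the ggplot2 version in use.

```r
library(ggplot2)

shape_values <- c("C" = 1, "B" = 2, "A" = 3)

# Option 1: a labelling function. It is applied to whatever breaks the scale
# ends up with, so the order of the breaks cannot cause a mismatch.
label_fun <- function(breaks) paste("long Text about", breaks)
scale_fun <- scale_shape_manual(values = shape_values, labels = label_fun)

# Option 2: a named labels vector.
# Assumption: the scale pairs these with the breaks by name, not by position.
named_labels <- c(
  "C" = "long Text about C",
  "B" = "long Text about B",
  "A" = "long Text about A"
)
scale_named <- scale_shape_manual(values = shape_values, labels = named_labels)
```

Either scale can be added to a plot in place of the one in the original reprex.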

clauswilke avatar Feb 27 '23 19:02 clauswilke

Agreed, that might be a wiser decision. It should be relatively straightforward to adjust ggplot2:::check_breaks_labels() I presume.

teunbrand avatar Feb 27 '23 19:02 teunbrand

I've tried throwing a warning when labels are atomic and breaks is missing, but it'll throw quite a lot of warnings in test code that is fine in principle. The most common example is that you're using a discrete scale for a factor where you know the level order, which makes a warning about not specifying breaks seem a little bit befuddling. We might just encourage specifying breaks in the documentation though.
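
For reference, the kind of check discussed here could look roughly like the stand-alone sketch below. The helper warn_if_labels_without_breaks is hypothetical and written in base R only; it is not the code of ggplot2:::check_breaks_labels(), and it uses missing() where ggplot2 would have its own way of detecting an unset argument.

```r
# Hypothetical helper, for illustration only; not part of ggplot2.
warn_if_labels_without_breaks <- function(labels, breaks) {
  # A labelling function is fine; only a plain vector of labels is risky.
  labels_are_vector <- !is.null(labels) && is.atomic(labels)

  if (missing(breaks) && labels_are_vector) {
    warning(
      "`labels` was given as a vector without `breaks`; ",
      "labels are matched by position and may not line up with the data.",
      call. = FALSE
    )
    return(invisible(FALSE))
  }
  invisible(TRUE)
}

# Passes silently: labels given as a function.
warn_if_labels_without_breaks(labels = toupper)

# Passes silently: labels given together with breaks.
warn_if_labels_without_breaks(labels = c("one", "two"), breaks = c("a", "b"))
```

A call such as warn_if_labels_without_breaks(labels = c("one", "two")) would warn, including in the factor case mentioned above where the level order is already known, which is the kind of noise that made the warning unattractive.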

teunbrand avatar Apr 15 '24 08:04 teunbrand