Wishlist: Support factors as x or y
Thanks:
Grant @grantmcdermott, the project looks really nice and useful, thanks!
Wishlist:
I just wanted to put on the wishlist that in addition to numeric x and y, it would be great if the factor-based plot() flavors were also supported in plot2(). Especially the plots where y is a factor are in my opinion underappreciated in base R. (It's not surprising that I would say that because I implemented these...)
Plots:
In plot(x, y, ...) and plot(y ~ x, ...) we have the following:
| | x Numeric | x Factor |
|---|---|---|
| y Numeric | Scatterplot | Parallel boxplots |
| y Factor | Spinogram (Spineplot with histogram-style breaks in x) | Spineplot (Flavor of mosaic plot) |
Examples:
data("SwissLabor", package = "AER")
plot(income ~ age, data = SwissLabor) ## Numeric ~ Numeric
plot(income ~ foreign, data = SwissLabor) ## Numeric ~ Factor
plot(participation ~ age, data = SwissLabor) ## Factor ~ Numeric
plot(participation ~ foreign, data = SwissLabor) ## Factor ~ Factor
It would be great if the plots above would work with plot2() as well - and if the coloring and grouping etc. would also be supported.
GM edit: adding checklist to track
- [x] Numeric ~ Numeric
- [x] Numeric ~ Factor (#154)
- [ ] Factor ~ Numeric
- [ ] Factor ~ Factor
Thanks Achim.
I can't promise I'll get to this soon, but I just invited you to the project. No pressure (I'm well aware of how many other plates you have in the air...), but think of it as an invitation to take a stab at these if you get a chance ;-)
Thanks! I'll try...
Idea:
I've taken another look at the current state of the code and my idea to tackle support of factor variables and also of facets is the following:
- We set up internal functions that draw some type/class of `x` against some type/class of `y`, potentially also supporting a `by` grouping variable.
- Multiple functions might be available for the same combination of `x` and `y`, e.g., `cdplot` vs. `spineplot` in case of a factor `y` and a numeric `x`.
- The `y` variable might also be missing so that we can get density or histogram functions.
- These functions can be called just once, corresponding to what is currently done in `plot2.default`, or several times in case of mfrow/facets.
- The functions must have an argument `axes = TRUE` that can also be set to `FALSE` so that either axes are drawn or suppressed. But additionally specifications like `axes = c(1, 2)` or `axes = 2` etc. should be possible so that only certain axes are drawn. The latter is needed if we don't want to repeat certain axes in facet displays.
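To make the `axes` idea concrete, here is a minimal sketch of how such a specification could be normalized into a set of sides. Both `normalize_axes()` and `draw_axes()` are hypothetical names invented for illustration, not existing plot2 functions, and mapping `TRUE` to sides 1 and 2 is an assumption about the desired default.

```r
## Hypothetical helper (not part of plot2): turn an 'axes' specification into
## the set of sides to draw (1 = bottom, 2 = left, 3 = top, 4 = right).
normalize_axes <- function(axes = TRUE) {
  if (is.logical(axes)) {
    ## Assumed default: TRUE -> bottom and left, anything else -> no axes
    return(if (isTRUE(axes)) c(1L, 2L) else integer(0))
  }
  axes <- as.integer(axes)
  stopifnot(all(axes %in% 1:4))
  sort(unique(axes))
}

## Hypothetical drawing step for an "inner" function: add only the
## requested axes to a plot that was set up with axes = FALSE.
draw_axes <- function(axes = TRUE) {
  sides <- normalize_axes(axes)
  for (side in sides) axis(side)
  invisible(sides)
}
```

In a facet display, a panel in the interior of the grid could then call `draw_axes(FALSE)`, while a panel in the left column could call `draw_axes(2)` after `plot(..., axes = FALSE)`.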
Examples:
Functions would include:
plot2_numeric_numeric_scatter(x, y, by = NULL, axes = TRUE, ...) ## current core of plot2.default
plot2_factor_numeric_boxplot(x, y, axes = TRUE, ...) ## no support for 'by' within the plot (only via facets)
plot2_factor_factor_spineplot(x, y, axes = TRUE, ...)
plot2_numeric_factor_spineplot(x, y, axes = TRUE, ...)
plot2_numeric_factor_cdplot(x, y, axes = TRUE, ...)
But also functions for a single numeric x variable (plus optional grouping):
plot2_numeric_none_histogram(x, y = NULL, by = NULL, axes = TRUE, ...) ## check that y is actually empty
plot2_numeric_none_density(x, y = NULL, by = NULL, axes = TRUE, ...)
Outer vs. inner function:
The plot2.default() function would then provide the "outer" skeleton which decides whether facets are needed/desired or not and then calls the plot2_x_y_type() functions as appropriate. It needs to know where the axes go and how the margins need to be set (which probably depends on the type of plot).
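One way the "outer" function could pick an "inner" function is to build its name from the classes of `x` and `y`. The sketch below is only an illustration under assumptions: `var_type()` and `inner_name()` are hypothetical helpers, treating character and logical vectors as factors is an assumption, and so is the table of default types per combination.

```r
## Hypothetical helper: classify a variable as "numeric", "factor", or "none".
var_type <- function(v) {
  if (is.null(v)) return("none")
  ## Assumption: character and logical vectors are treated like factors
  if (is.factor(v) || is.character(v) || is.logical(v)) return("factor")
  if (is.numeric(v)) return("numeric")
  stop("unsupported variable class: ", class(v)[1L])
}

## Hypothetical lookup of the inner function name, following the
## plot2_x_y_type() naming pattern. The default types are assumed.
inner_name <- function(x, y = NULL, type = NULL) {
  xy <- paste(var_type(x), var_type(y), sep = "_")
  defaults <- c(
    numeric_numeric = "scatter",
    factor_numeric  = "boxplot",
    numeric_factor  = "spineplot",
    factor_factor   = "spineplot",
    numeric_none    = "histogram"
  )
  if (is.null(type)) {
    if (!xy %in% names(defaults)) stop("no default plot for ", xy)
    type <- defaults[[xy]]
  }
  paste("plot2", xy, type, sep = "_")
}
```

The outer function could then retrieve the inner function with something like `match.fun(inner_name(x, y))` and call it once, or once per facet.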
Open questions:
- Who draws the legend? The "outer" `plot2.default()` function or the "inner" `plot2_x_y_type()` function?
- Are there further common arguments to the "inner" functions?
- Does the "outer" `plot2.default()` function know about other arguments or does it just pass these on to the "inner" functions?
This sounds great @zeileis.
(As an aside, I've thought before that we may need to offer more flexible axes control... Although, at least in the non-faceted cases that I had in mind, this could be done through some global par(las=<value>, col.axis=<value>, ...) options.)
Let me finish up a PR or two that address #19. Hopefully I'll have time after work today. Once those are merged, I'll hold off on making any other changes to the codebase until we have resolved the internal changes you have proposed above.
OK, thanks, sounds good!
One special case that we might want to think about separate support for is point-range plots (following @vincentarelbundock's PR in #35). In particular, we probably want to handle the x-axis carefully if we are passing a vector of characters or factors, e.g. coefficient names.
At present, we have to manually convert the x axis to a numeric first...
library(plot2)
par(pch = 19)
mod = lm(mpg ~ hp + factor(cyl), mtcars)
coefs = data.frame(names(coef(mod)), coef(mod), confint(mod))
coefs = setNames(coefs, c("x", "y", "ymin", "ymax"))
with(
coefs,
plot2(
x = 1:4, # <<-- Problem: has to be numeric ATM
y = y,
ymin = ymin,
ymax = ymax,
type = "pointrange"
)
)

... whereas, we'd ideally just be able to pass it the x variable directly and it would handle labels appropriately.
# aspirational code example that doesn't currently work
with(
coefs,
plot2(
x = x,
y = y,
ymin = ymin,
ymax = ymax,
type = "pointrange"
)
)

Created on 2023-06-19 with reprex v2.0.2
Should be easy to do. But just flagging so that we don't inadvertently impose/override with unexpected behaviour upstream.
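For illustration only, here is a base-graphics sketch of the label handling described above. It does not reflect how plot2 implements (or will implement) this: `x_positions()` and `draw_pointrange()` are hypothetical helpers, and keeping character values in order of appearance is an assumption.

```r
## Hypothetical helper: map character/factor x values to integer positions
## and keep the labels for a later axis() call.
x_positions <- function(x) {
  if (is.numeric(x)) return(list(at = x, labels = NULL))
  ## Assumption: character values keep their order of appearance
  if (!is.factor(x)) x <- factor(x, levels = unique(x))
  list(at = as.integer(x), labels = levels(x))
}

## Hypothetical point-range drawing in base graphics: points at the
## estimates, vertical segments for the intervals, labelled x-axis.
draw_pointrange <- function(x, y, ymin, ymax, ...) {
  pos <- x_positions(x)
  plot(pos$at, y, ylim = range(ymin, ymax), xaxt = "n", xlab = "", ...)
  segments(pos$at, ymin, pos$at, ymax)
  if (is.null(pos$labels)) {
    axis(1)
  } else {
    axis(1, at = seq_along(pos$labels), labels = pos$labels)
  }
  invisible(pos)
}
```

With the `coefs` data frame from the example above, `with(coefs, draw_pointrange(x, y, ymin, ymax))` would place the coefficient names on the x-axis without a manual conversion to `1:4`.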