
Wishlist: Support factors as x or y

Open zeileis opened this issue 2 years ago • 6 comments

Thanks:

Grant @grantmcdermott, the project looks really nice and useful, thanks!

Wishlist:

I just wanted to put on the wishlist that in addition to numeric x and y, it would be great if the factor-based plot() flavors were also supported in plot2(). Especially the plots where y is a factor are in my opinion underappreciated in base R. (It's not surprising that I would say that because I implemented these...)

Plots:

In plot(x, y, ...) and plot(y ~ x, ...) we have the following:

                 x: Numeric                                      x: Factor
y: Numeric       Scatterplot                                     Parallel boxplots
y: Factor        Spinogram                                       Spineplot
                 (spineplot with histogram-style breaks in x)    (flavor of mosaic plot)

Examples:

data("SwissLabor", package = "AER")
plot(income ~ age, data = SwissLabor)            ## Numeric ~ Numeric
plot(income ~ foreign, data = SwissLabor)        ## Numeric ~ Factor
plot(participation ~ age, data = SwissLabor)     ## Factor ~ Numeric
plot(participation ~ foreign, data = SwissLabor) ## Factor ~ Factor

It would be great if the plots above would work with plot2() as well - and if the coloring and grouping etc. would also be supported.

GM edit: adding checklist to track

  • [x] Numeric ~ Numeric
  • [x] Numeric ~ Factor (#154)
  • [ ] Factor ~ Numeric
  • [ ] Factor ~ Factor

zeileis avatar Apr 07 '23 23:04 zeileis

Thanks Achim.

I can't promise I'll get to this soon, but I just invited you to the project. No pressure (I'm well aware of how many other plates you have in the air...), but think of it as an invitation to take a stab at these if you get a chance ;-)

grantmcdermott avatar Apr 08 '23 00:04 grantmcdermott

Thanks! I'll try...

zeileis avatar Apr 08 '23 00:04 zeileis

Idea:

I've taken another look at the current state of the code and my idea to tackle support of factor variables and also of facets is the following:

  • We set up internal functions that draw some type/class of x against some type/class of y, potentially also supporting a by grouping variable.
  • Multiple functions might be available for the same combination of x and y, e.g., cdplot vs. spineplot in case of a factor y and a numeric x.
  • The y variable might also be missing so that we can get density or histogram functions.
  • These functions can be called just once, corresponding to what is currently done in plot2.default, or several times in case of mfrow/facets.
  • The functions must have an argument axes = TRUE that can also be set to FALSE so that either axes are drawn or suppressed. But additionally specifications like axes = c(1, 2) or axes = 2 etc. should be possible so that only certain axes are drawn. The latter is needed if we don't want to repeat certain axes in facet displays.
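
The `axes` specification in the last bullet could be normalized once and then reused by every inner function. The following is only a rough sketch under assumptions: `normalize_axes()` and `draw_axes()` are hypothetical helper names, not existing plot2 functions, and `axes = TRUE` is assumed to mean the usual bottom and left axes.

```r
## Hypothetical helper (not part of plot2): turn an `axes` argument into the
## integer sides to draw (1 = bottom, 2 = left, 3 = top, 4 = right).
normalize_axes = function(axes = TRUE) {
  if (is.logical(axes)) {
    stopifnot(length(axes) == 1L, !is.na(axes))
    ## assumption: TRUE means the usual bottom and left axes
    return(if (axes) c(1L, 2L) else integer(0))
  }
  axes = as.integer(axes)
  stopifnot(all(axes %in% 1:4))
  sort(unique(axes))
}

## An inner function could then draw only the requested sides.
draw_axes = function(axes = TRUE) {
  for (side in normalize_axes(axes)) axis(side)
  invisible(NULL)
}

plot(mpg ~ hp, data = mtcars, axes = FALSE)
draw_axes(2)  # left axis only, e.g. for a facet that is not in the bottom row
box()
```

With this shape, the outer function would only need to pass a different `axes` value to each facet panel.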

Examples:

Functions would include:

plot2_numeric_numeric_scatter(x, y, by = NULL, axes = TRUE, ...) ## current core of plot2.default
plot2_factor_numeric_boxplot(x, y, axes = TRUE, ...) ## no support for 'by' within the plot (only via facets)
plot2_factor_factor_spineplot(x, y, axes = TRUE, ...)
plot2_numeric_factor_spineplot(x, y, axes = TRUE, ...)
plot2_numeric_factor_cdplot(x, y, axes = TRUE, ...)

But also functions for a single numeric x variable (plus optional grouping):

plot2_numeric_none_histogram(x, y = NULL, by = NULL, axes = TRUE, ...) ## check that y is actually empty
plot2_numeric_none_density(x, y = NULL, by = NULL, axes = TRUE, ...)

Outer vs. inner function:

The plot2.default() function would then provide the "outer" skeleton which decides whether facets are needed/desired or not and then calls the plot2_x_y_type() functions as appropriate. It needs to know where the axes go and how the margins need to be set (which probably depends on the type of plot).
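
One way the outer function might pick an inner function is to build its name from the classes of x and y. This is a sketch under assumptions, not the actual implementation: `var_kind()`, `default_type`, and `inner_name()` are hypothetical, the default type per combination is a guess, and none of the `plot2_x_y_type()` functions are assumed to exist yet.

```r
## Hypothetical sketch: classify a variable for dispatch purposes.
var_kind = function(v) {
  if (is.null(v)) "none"
  else if (is.factor(v)) "factor"
  else if (is.numeric(v)) "numeric"
  else stop("unsupported variable class: ", class(v)[1L])
}

## Assumed default plot type for each x/y combination.
default_type = c(
  numeric_numeric = "scatter",
  factor_numeric  = "boxplot",
  factor_factor   = "spineplot",
  numeric_factor  = "spineplot",
  numeric_none    = "histogram"
)

## Build the inner function name following the plot2_<x>_<y>_<type> pattern.
inner_name = function(x, y = NULL, type = NULL) {
  key = paste(var_kind(x), var_kind(y), sep = "_")
  if (is.null(type)) {
    if (!key %in% names(default_type)) stop("no default plot for ", key)
    type = default_type[[key]]
  }
  paste("plot2", key, type, sep = "_")
}

inner_name(mtcars$hp, mtcars$mpg)                   # "plot2_numeric_numeric_scatter"
inner_name(factor(mtcars$cyl), mtcars$mpg)          # "plot2_factor_numeric_boxplot"
inner_name(mtcars$hp, factor(mtcars$am), "cdplot")  # "plot2_numeric_factor_cdplot"
inner_name(mtcars$hp)                               # "plot2_numeric_none_histogram"
```

The outer function could then look the function up with `match.fun()` and call it once, or once per facet.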

Open questions:

  • Who draws the legend? The "outer" plot2.default() function or the "inner" plot2_x_y_type() function?
  • Are there further common arguments to the "inner" functions?
  • Does the "outer" plot2.default() function know about other arguments or does it just pass these on to the "inner" functions?

zeileis avatar Apr 17 '23 00:04 zeileis

This sounds great @zeileis.

(As an aside, I've thought before that we may need to offer more flexible axes control... Although, at least in the non-faceted cases that I had in mind, this could be done through some global par(las=<value>, col.axis=<value>, ...) options.)

Let me finish up a PR or two that address #19. Hopefully I'll have time after work today. Once those are merged, I'll hold off making any other changes to the codebase until we have resolved the internal changes you have proposed above.

grantmcdermott avatar Apr 17 '23 19:04 grantmcdermott

OK, thanks, sounds good!

zeileis avatar Apr 17 '23 20:04 zeileis

One special case that we might want to think about separate support for is point-range plots (following @vincentarelbundock's PR in #35). In particular, we probably want to handle the x-axis carefully if we are passing a vector of characters or factors, e.g. coefficient names.

At present, we have to manually convert the x axis to a numeric first...

library(plot2)
par(pch = 19)

mod = lm(mpg ~ hp + factor(cyl), mtcars)
coefs = data.frame(names(coef(mod)), coef(mod), confint(mod))
coefs = setNames(coefs, c("x", "y", "ymin", "ymax"))

with(
    coefs,
    plot2(
        x = 1:4, # <<-- Problem: has to be numeric ATM
        y = y,
        ymin = ymin,
        ymax = ymax,
        type = "pointrange"
    )
)

... whereas, we'd ideally just be able to pass it the x variable directly and it would handle labels appropriately.

# aspirational code example that doesn't currently work
with(
    coefs,
    plot2(
        x = x,
        y = y,
        ymin = ymin,
        ymax = ymax,
        type = "pointrange"
    )
)

Created on 2023-06-19 with reprex v2.0.2

Should be easy to do. But just flagging so that we don't inadvertently impose/override with unexpected behaviour upstream.
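
For illustration only, here is one way the label handling could look in plain base graphics. `x_positions()` is a hypothetical helper, not plot2 API, and the `plot()`/`segments()` calls are just a stand-in for the point-range type.

```r
## Hypothetical helper (not plot2 API): map a character/factor x to integer
## positions and keep the labels for the axis.
x_positions = function(x) {
  if (is.character(x)) x = factor(x, levels = unique(x))  # keep order of appearance
  if (is.factor(x)) {
    list(at = as.integer(x), labels = levels(x))
  } else {
    list(at = x, labels = NULL)
  }
}

mod = lm(mpg ~ hp + factor(cyl), mtcars)
coefs = data.frame(names(coef(mod)), coef(mod), confint(mod))
coefs = setNames(coefs, c("x", "y", "ymin", "ymax"))

xp = x_positions(coefs$x)

## Base-graphics stand-in for a point-range plot with labelled x positions.
plot(xp$at, coefs$y, ylim = range(coefs$ymin, coefs$ymax), pch = 19,
     xaxt = "n", xlab = "", ylab = "estimate")
segments(xp$at, coefs$ymin, xp$at, coefs$ymax)
axis(1, at = seq_along(xp$labels), labels = xp$labels)
```

Numeric x values pass through unchanged, so the existing numeric behaviour would not be affected by a conversion of this kind.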

grantmcdermott avatar Jun 19 '23 22:06 grantmcdermott