Changing defaults for `estimate_relation()`? (and for `data = "grid"`)

Open strengejacke opened this issue 3 years ago • 7 comments

I think we should change the behaviour of data = "grid" in estimate_relation() (and related functions), and add an option like data = "fullgrid". With this, we could:

# estimate_expectation(data = "fullgrid"), previous behaviour
m <- lm(Sepal.Width ~ Species * Sepal.Length, data = iris)
insight::get_datagrid(m, "all")
#>       Species Sepal.Length
#> 1      setosa          4.3
#> 2      setosa          4.7
#> 3      setosa          5.1
#> 4      setosa          5.5
#> 5  versicolor          5.1
#> 6  versicolor          5.5
#> 7  versicolor          5.9
#> 8  versicolor          6.3
#> 9  versicolor          6.7
#> 10  virginica          5.1
#> 11  virginica          5.5
#> 12  virginica          5.9
#> 13  virginica          6.3
#> 14  virginica          6.7
#> 15  virginica          7.1
#> 16  virginica          7.5
#> 17  virginica          7.9

# estimate_expectation(data = "grid") - estimate_relation() should default to this
m <- lm(Sepal.Width ~ Species * Sepal.Length, data = iris)
insight::get_datagrid(m, "all", range = "grid")
#>      Species Sepal.Length
#> 1     setosa     5.015267
#> 2 versicolor     5.015267
#> 3 versicolor     5.843333
#> 4 versicolor     6.671399
#> 5  virginica     5.015267
#> 6  virginica     5.843333
#> 7  virginica     6.671399

I think this would be helpful with #201 / #189 and #199 / #145.

However this requires the GitHub version of insight to be on CRAN.

strengejacke • Aug 15 '22 06:08

But what about preserve_range?

m <- lm(Sepal.Width ~ Species * Sepal.Length, data = iris)
insight::get_datagrid(m)
#>    Sepal.Length    Species
#> 1           4.3     setosa
#> 2           4.7     setosa
#> 3           5.1     setosa
#> 4           5.5     setosa
#> 5           5.1 versicolor
#> 6           5.5 versicolor
#> 7           5.9 versicolor
#> 8           6.3 versicolor
#> 9           6.7 versicolor
#> 10          5.1  virginica
#> 11          5.5  virginica
#> 12          5.9  virginica
#> 13          6.3  virginica
#> 14          6.7  virginica
#> 15          7.1  virginica
#> 16          7.5  virginica
#> 17          7.9  virginica


insight::get_datagrid(m, preserve_range = FALSE)
#>    Sepal.Length    Species
#> 1           4.3     setosa
#> 2           4.7     setosa
#> 3           5.1     setosa
#> 4           5.5     setosa
#> 5           5.9     setosa
#> 6           6.3     setosa
#> 7           6.7     setosa
#> 8           7.1     setosa
#> 9           7.5     setosa
#> 10          7.9     setosa
#> 11          4.3 versicolor
#> 12          4.7 versicolor
#> 13          5.1 versicolor
#> 14          5.5 versicolor
#> 15          5.9 versicolor
#> 16          6.3 versicolor
#> 17          6.7 versicolor
#> 18          7.1 versicolor
#> 19          7.5 versicolor
#> 20          7.9 versicolor
#> 21          4.3  virginica
#> 22          4.7  virginica
#> 23          5.1  virginica
#> 24          5.5  virginica
#> 25          5.9  virginica
#> 26          6.3  virginica
#> 27          6.7  virginica
#> 28          7.1  virginica
#> 29          7.5  virginica
#> 30          7.9  virginica

Created on 2022-08-15 by the reprex package (v2.0.1)
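
For readers comparing the two outputs above: the default result looks like the full 10-point grid with, for each Species, the Sepal.Length values outside that level's observed range dropped. A rough base-R sketch of that filtering, assuming this is what preserve_range does (the names `full`, `lo`, `hi`, and `preserved` are illustrative, not part of insight):

```r
# Full 10-point grid over the overall range of Sepal.Length, crossed with Species.
values <- seq(min(iris$Sepal.Length), max(iris$Sepal.Length), length.out = 10)
full <- expand.grid(Sepal.Length = values, Species = levels(iris$Species))

# Observed minimum and maximum of Sepal.Length within each Species.
lo <- sapply(split(iris$Sepal.Length, iris$Species), min)
hi <- sapply(split(iris$Sepal.Length, iris$Species), max)

# Assumption: keep only grid points inside their own level's observed range.
sp <- as.character(full$Species)
in_range <- unname(full$Sepal.Length >= lo[sp] & full$Sepal.Length <= hi[sp])
preserved <- full[in_range, ]

nrow(full)       # 30 rows, like preserve_range = FALSE
nrow(preserved)  # 17 rows, like the default output
```

The row counts match the two printed grids, which is consistent with this reading but does not prove that insight computes it the same way.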

DominiqueMakowski • Aug 15 '22 07:08

> But what about preserve_range?

What do you mean? That argument still works... I was just thinking about having two options of "grids", and therefore changing the default behaviour.

strengejacke • Aug 15 '22 07:08

> I think we should change the behaviour of data = "grid" in estimate_relation() (and related functions), and add an option like data = "fullgrid"

I'm not quite sure from the reprex what the new behavior would be.

DominiqueMakowski • Aug 15 '22 07:08

"fullgrid" will become the old "grid", and "grid" will use fewer values for numeric variables that are not in the first position. This should address #189.

strengejacke • Aug 15 '22 07:08
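
To make the "fewer values" point concrete: the three Sepal.Length values in the `range = "grid"` output in the first post appear to be the mean of the observed variable and the mean plus or minus one standard deviation. A minimal base-R sketch, assuming that is how the reduced grid picks its values (the name `representative` is only illustrative):

```r
# Assumption: numeric predictors that are not in the first position are
# reduced to mean - SD, mean, and mean + SD of the observed data.
x <- iris$Sepal.Length
representative <- c(mean(x) - sd(x), mean(x), mean(x) + sd(x))
round(representative, 6)
#> [1] 5.015267 5.843333 6.671399
```

These match the values printed in the first post, so three values per factor level would replace the ten-point sequence of the current default.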

Is this something that could be done by visualization_recipe? Given a grid or data frame, when a variable is in the second position and gets mapped to color, the data would be subset to 3-5 representative values?

(Not sure that's the best idea, but throwing it out there)
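
As a rough illustration of what such a subsetting step might look like, here is a base-R sketch. `pick_representative()` is a hypothetical helper written for this example; it is not part of modelbased, insight, or see:

```r
# Hypothetical helper: keep only `k` evenly spaced unique values of one
# numeric column in an existing grid, dropping all other rows.
pick_representative <- function(grid, column, k = 3) {
  values <- sort(unique(grid[[column]]))
  if (length(values) <= k) {
    return(grid)
  }
  # Positions of k roughly evenly spaced values, including both extremes.
  idx <- unique(round(seq(1, length(values), length.out = k)))
  grid[grid[[column]] %in% values[idx], , drop = FALSE]
}

# A plain 10 x 3 grid built by hand, standing in for a data grid.
grid <- expand.grid(
  Sepal.Length = seq(4.3, 7.9, length.out = 10),
  Species = levels(iris$Species)
)
reduced <- pick_representative(grid, "Sepal.Length", k = 3)
nrow(reduced)
```

Here the reduction happens after the grid exists, which is the extra transformation layer being proposed; building the smaller grid up front would avoid that step.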

bwiernik • Aug 15 '22 09:08

I think adding another layer of transformation for visualizations would do more harm than good. The plot method should make do with what it has, and users should eventually learn how to get the grid they want to make their plots clearer.

DominiqueMakowski • Aug 15 '22 09:08

> visualization_recipe?

I think this is something for visualization_matrix() (or rather get_datagrid()), and it's already implemented in insight (see the very first post at the top).

strengejacke • Aug 15 '22 09:08