visualisation_recipe with transformed responses
Currently, the default plot produced by estimate_expected() with a transformed response variable isn't correct. The data points are in the raw metric, but the predicted line is in the transformed metric.
We should either:
- apply the same transformation to the data when plotting as used in the model specification
- include a
transforminestimate_expectation()to allow back-transforming of predictions
library(modelbased)
library(ggplot2)
m_inline_trans <- lm(log(dist) ~ speed, data = cars)
estimate_expectation(m_inline_trans) |>
plot()

estimate_expectation(m_inline_trans) |>
dplyr::mutate(
Predicted = exp(Predicted),
SE = SE * exp(Predicted),
CI_low = exp(CI_low),
CI_high = exp(CI_high)
) |>
plot()

cars$log_dist <- log(cars$dist)
m_pre_trans <- lm(log_dist ~ speed, data = cars)
estimate_expectation(m_pre_trans) |>
plot()

estimate_expectation(m_pre_trans) |>
plot() +
scale_y_continuous(trans = "exp")
#> Warning in self$trans$inverse(limits): NaNs produced
#> Error in if (zero_range(as.numeric(limits))) {: missing value where TRUE/FALSE needed
estimate_expectation(m_pre_trans) |>
ggplot() +
aes(x = speed) +
geom_point(
aes(y = log_dist), data = insight::get_data(m_pre_trans)
) +
geom_ribbon(
aes(ymin = CI_low, ymax = CI_high),
alpha = .5
) +
geom_line(
aes(y = Predicted)
)
Created on 2021-10-08 by the reprex package (v2.0.1)
We should add a transformation argument to estimate_*() functions, eg.:
library(modelbased)
library(ggplot2)
m_inline_trans <- lm(log(dist) ~ speed, data = cars)
estimate_expectation(m_inline_trans, transformation = exp) |>
plot()
Then internally, if transformation is given, we do, e.g., for .estimate_predicted():
out$Predicted <- transformation(out$Predicted)
out$CI_low <- transformation(out$CI_low)
out$CI_high <- transformation(out$CI_high)
out$Residuals <- transformation(out$Residuals)
trans_env <- as.environment(out)
trans_env$transformation <- transformation
jacob <- as.numeric(stats::numericDeriv(quote(transformation(Predicted)), "Predicted", rho = trans_env))
out$SE <- out$SE * jacob
@strengejacke Can you write a find_transformation() function for insight that extracts the functions used to transform a response variable in a formula (basically the opposite of insight:::.remove_pattern_from_names())? Ideally, it would be really cool if it would extract as an executable function and return identity if there is no transformation.
Should this function only refer to the response variable, or also to a possible link-function?
(i.e. should log(y) and family = xy("log") both return the same thing, namely "log"?)
I think we already handle link/inverse link plotting fine. So we only need to handle in-formula transformations with such a function
And the return value should be a function, right? Should that function do the same transformation, or the inverse-transformation?
i.e. if formula = log(y), should find_transformation() return log() or exp()?
Furthermore, I suggest using get_transformation() to return a function, and find_transformation() to return the string representation? Does that make sense?
Get/find makes sense.
How about get returns a list with $tranformation and $inverse slots
I'm not that familiar with the code, maybe @DominiqueMakowski or @bwiernik can work in this? Would be great to have this resolved before an update is submitted to CRAN.
I don't know how to "fix" that really, shouldn't all this transfo stuff be done at insight's level?
We could probably do this:
if (insight::find_transformation(model) != "identity") {
transformation <- insight::get_transformation(model)$inverse
out$Predicted <- transformation(out$Predicted)
out$CI_low <- transformation(out$CI_low)
out$CI_high <- transformation(out$CI_high)
out$Residuals <- transformation(out$Residuals)
trans_env <- as.environment(out)
trans_env$transformation <- transformation
jacob <- as.numeric(stats::numericDeriv(quote(transformation(Predicted)), "Predicted", rho = trans_env))
out$SE <- out$SE * jacob
}
However, I'm not sure about the numericDeriv-stuff, looks like the code above can be directly used.
And I'm, not sure where (i.e. to which function) to put this code?
We should add the transformation argument to get_predicted() and add the call to find a model's inverse transformation inside .estimate_predicted(), which we pass to get_predicted()
Ok, so get_predicted() gets a transform argument, which could be NULL or a function, and then in insight:::.get_predicted_out() we would apply that transformation?
Yep