modelbased visualisation_recipe with transformed responses

Currently, the default plot produced by estimate_expected() with a transformed response variable isn't correct. The data points are in the raw metric, but the predicted line is in the transformed metric.

We should either:

apply the same transformation to the data when plotting as used in the model specification
include a transform in estimate_expectation() to allow back-transforming of predictions

library(modelbased)
library(ggplot2)

m_inline_trans <- lm(log(dist) ~ speed, data = cars)

estimate_expectation(m_inline_trans) |> 
  plot()

estimate_expectation(m_inline_trans) |> 
  dplyr::mutate(
    Predicted = exp(Predicted),
    SE = SE * exp(Predicted),
    CI_low = exp(CI_low),
    CI_high = exp(CI_high)
  ) |> 
  plot()

cars$log_dist <- log(cars$dist)
m_pre_trans <- lm(log_dist ~ speed, data = cars)

estimate_expectation(m_pre_trans) |>
  plot()

estimate_expectation(m_pre_trans) |>
  plot() +
  scale_y_continuous(trans = "exp")
#> Warning in self$trans$inverse(limits): NaNs produced
#> Error in if (zero_range(as.numeric(limits))) {: missing value where TRUE/FALSE needed

estimate_expectation(m_pre_trans) |>
  ggplot() +
  aes(x = speed) +
  geom_point(
    aes(y = log_dist), data = insight::get_data(m_pre_trans)
  ) +
  geom_ribbon(
    aes(ymin = CI_low, ymax = CI_high),
    alpha = .5
  ) +
  geom_line(
    aes(y = Predicted)
  )

^{Created on 2021-10-08 by the reprex package (v2.0.1)}

Oct 08 '21 18:10 bwiernik

We should add a transformation argument to estimate_*() functions, eg.:

library(modelbased)
library(ggplot2)

m_inline_trans <- lm(log(dist) ~ speed, data = cars)

estimate_expectation(m_inline_trans, transformation = exp) |> 
  plot()

Then internally, if transformation is given, we do, e.g., for .estimate_predicted():

out$Predicted <- transformation(out$Predicted)
out$CI_low <- transformation(out$CI_low)
out$CI_high <- transformation(out$CI_high)
out$Residuals <- transformation(out$Residuals)

trans_env <- as.environment(out)
trans_env$transformation <- transformation 
jacob <- as.numeric(stats::numericDeriv(quote(transformation(Predicted)), "Predicted", rho = trans_env))
out$SE <- out$SE * jacob

Oct 09 '21 20:10 bwiernik

@strengejacke Can you write a find_transformation() function for insight that extracts the functions used to transform a response variable in a formula (basically the opposite of insight:::.remove_pattern_from_names())? Ideally, it would be really cool if it would extract as an executable function and return identity if there is no transformation.

Oct 10 '21 20:10 bwiernik

Should this function only refer to the response variable, or also to a possible link-function? (i.e. should log(y) and family = xy("log") both return the same thing, namely "log"?)

Oct 12 '21 14:10 strengejacke

I think we already handle link/inverse link plotting fine. So we only need to handle in-formula transformations with such a function

Oct 12 '21 16:10 bwiernik

And the return value should be a function, right? Should that function do the same transformation, or the inverse-transformation?

i.e. if formula = log(y), should find_transformation() return log() or exp()?

Furthermore, I suggest using get_transformation() to return a function, and find_transformation() to return the string representation? Does that make sense?

Oct 13 '21 12:10 strengejacke

Get/find makes sense.

How about get returns a list with $tranformation and $inverse slots

Oct 13 '21 13:10 bwiernik

I'm not that familiar with the code, maybe @DominiqueMakowski or @bwiernik can work in this? Would be great to have this resolved before an update is submitted to CRAN.

Nov 07 '21 14:11 strengejacke

I don't know how to "fix" that really, shouldn't all this transfo stuff be done at insight's level?

Nov 08 '21 08:11 DominiqueMakowski

We could probably do this:

if (insight::find_transformation(model) != "identity") {
  transformation <- insight::get_transformation(model)$inverse
  out$Predicted <- transformation(out$Predicted)
  out$CI_low <- transformation(out$CI_low)
  out$CI_high <- transformation(out$CI_high)
  out$Residuals <- transformation(out$Residuals)

  trans_env <- as.environment(out)
  trans_env$transformation <- transformation 
  jacob <- as.numeric(stats::numericDeriv(quote(transformation(Predicted)), "Predicted", rho = trans_env))
  out$SE <- out$SE * jacob
}

However, I'm not sure about the numericDeriv-stuff, looks like the code above can be directly used.

Nov 08 '21 09:11 strengejacke

And I'm, not sure where (i.e. to which function) to put this code?

Nov 08 '21 09:11 strengejacke

We should add the transformation argument to get_predicted() and add the call to find a model's inverse transformation inside .estimate_predicted(), which we pass to get_predicted()

Nov 08 '21 15:11 bwiernik

Ok, so get_predicted() gets a transform argument, which could be NULL or a function, and then in insight:::.get_predicted_out() we would apply that transformation?

Nov 08 '21 15:11 strengejacke

Yep

Nov 08 '21 17:11 bwiernik