scoringutils Feature: Add data transformation functions to work with other packages

It would be really nice to get the output of other packages into scoringutils easily. For example, given a fitted Stan model, a helper function could convert its output so that it can easily be scored. Odin would be another example.
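A rough sketch of what such a converter could look like, assuming the model output is already available as a plain matrix of posterior draws (one row per draw, one column per observation). `draws_to_sample_format` is a hypothetical name, and the column names `true_value`, `prediction` and `sample` are assumed to match the sample-based format that `score()` expects in the scoringutils version discussed here; other versions may use different names. It uses base R only, so it would not add dependencies.

```r
# Hypothetical helper, not part of scoringutils: reshape a draws matrix
# (rows = posterior draws, columns = observations) into a long data frame.
draws_to_sample_format <- function(draws, true_values, model = "my_model") {
  stopifnot(is.matrix(draws), ncol(draws) == length(true_values))
  n_draws <- nrow(draws)
  n_obs <- ncol(draws)
  data.frame(
    model = model,
    id = rep(seq_len(n_obs), each = n_draws),       # observation index
    true_value = rep(true_values, each = n_draws),  # assumed column name
    sample = rep(seq_len(n_draws), times = n_obs),  # draw index
    # as.vector() is column-major: all draws for obs 1, then obs 2, ...
    prediction = as.vector(draws)
  )
}

# Simulated draws standing in for real model output
set.seed(1)
draws <- matrix(rnorm(100 * 3, mean = 10), nrow = 100, ncol = 3)
long <- draws_to_sample_format(draws, true_values = c(9.5, 10.2, 11))
head(long)
```

The extra `id` column identifies the unit of a single forecast; in a real data set this would be replaced by whatever columns (date, location, ...) define one forecast.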

Related: #356

Nov 03 '23 15:11 nikosbosse

@jhellewell14

Nov 03 '23 15:11 nikosbosse

As long as we don't introduce new dependencies when doing so

Nov 07 '23 14:11 seabbs

There is some code flying around for a conversion to / from yardstick that we could use. Not sure how suitable it is, but leaving it here in case it's useful:

#' Convert to yardstick format for class predictions
#'
#' @description
#' A function to convert from the format for binary forecasts that
#' `scoringutils` uses to the one used by the `yardstick` package for class
#' predictions.
#'
#' For class predictions, `yardstick` doesn't use probabilities, but takes the
#' predicted class (0 or 1) as the prediction. The function therefore converts
#' the predictions into either 0 or 1 and makes both "prediction" and
#' "true_value" a factor.
#'
#' See [example_binary] and
#' \url{https://yardstick.tidymodels.org/articles/metric-types.html} for
#' more information.
#'
#' @param binary_predictions A data frame of binary predictions following the
#' same format used for [score()]. `to_yardstick_binary_class` can also be
#' called on the output of [score()].
#' @param fun A function used to convert predictions into 0s and 1s. The default
#' is [round()].
#' @param ... Additional arguments to be passed to `fun`.
#' @return A data.table that conforms to the formatting requirements of
#' `yardstick`.
#' @export
#' @examples
#'
#'
#' \dontrun{
#' library(yardstick)
#' library(dplyr)
#' ex <- to_yardstick_binary_class(example_binary)
#'
#' ex |>
#'   group_by(model) |>
#'   accuracy(truth = true_value, estimate = prediction)
#'
#' }
#' @keywords data-handling

to_yardstick_binary_class <- function(binary_predictions, fun = round, ...) {
  dt <- as.data.table(binary_predictions)
  dt[, true_value := as.factor(true_value)]
  dt[, prediction := as.factor(
    fun(prediction, ...)
  )]
  return(dt[])
}


#' Convert to yardstick format for class probability predictions
#'
#' @description
#' A function to convert from the format for binary forecasts that
#' `scoringutils` uses to the one used by `yardstick` for class
#' probability predictions.
#'
#' The format `yardstick` uses for (binary) class probability predictions is
#' very similar to the one used by `scoringutils`. The only difference is that
#' `yardstick` expects the outcome to be a factor. The function merely
#' converts "true_value" to a factor.
#'
#' See [example_binary] and
#' \url{https://yardstick.tidymodels.org/articles/metric-types.html} for
#' more details on the formats used.
#'
#' @param binary_predictions A data frame of binary predictions following the
#' same format used for [score()]. `to_yardstick_binary_class_prob` can also
#' be called on the output of [score()].
#' @return A data.table that conforms to the formatting requirements of
#' `yardstick`.
#' @export
#' @examples
#'
#'
#' \dontrun{
#' library(yardstick)
#' library(dplyr)
#' ex <- to_yardstick_binary_class_prob(example_binary)
#'
#' ex |>
#'   group_by(model) |>
#'   filter(!is.na(prediction)) |>
#'   average_precision(truth = true_value, prediction, event_level = "first")
#'
#' }
#' @keywords data-handling

# Binary class probability predictions
to_yardstick_binary_class_prob <- function(binary_predictions) {
  binary_predictions <- as.data.table(binary_predictions)
  binary_predictions[, true_value := factor(true_value, levels = c(1, 0))]
  return(binary_predictions[])
}

Jan 03 '24 10:01 nikosbosse

I tried the mgcv package, and its predict function provides point estimates and standard errors for each time point ($fit and $se.fit). To apply scoringutils, we have to convert the result to a quantile data frame. Do you think it would be better to support this functionality in this package, similar to sample_to_quantile? This format conversion function would not require further dependencies (it just requires the point estimates and sd estimates as input).
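A minimal sketch of the conversion, assuming a normal approximation around each point estimate. `normal_to_quantile` is a hypothetical name, and the output columns `quantile` and `prediction` are assumed to match the quantile-based format of the scoringutils version discussed here. Note that the standard errors describe uncertainty in the fitted mean only, so the resulting quantiles do not include observation-level noise.

```r
# Hypothetical helper, not part of scoringutils: turn point estimates and
# standard errors into quantiles under a normal approximation.
normal_to_quantile <- function(fit, se,
                               quantiles = c(0.05, 0.25, 0.5, 0.75, 0.95)) {
  stopifnot(length(fit) == length(se), all(se >= 0))
  n <- length(fit)
  out <- data.frame(
    id = rep(seq_len(n), each = length(quantiles)),  # time point index
    quantile = rep(quantiles, times = n)
  )
  # qnorm() is vectorised over the quantile level, mean and sd
  out$prediction <- qnorm(out$quantile, mean = fit[out$id], sd = se[out$id])
  out
}

# Stand-in for a predict() result with $fit and $se.fit
pred <- list(fit = c(10, 12), se.fit = c(1, 2))
res <- normal_to_quantile(pred$fit, pred$se.fit)
res
```

The observed values and any columns identifying the forecast (date, model, ...) would still need to be merged in before scoring.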

Feb 22 '24 09:02 toshiakiasakura

My instinct is that in this particular case it's a user-side problem, as the conversion (to posterior samples) is quite complicated and package specific.

Feb 22 '24 23:02 seabbs