Silent model column requirement
It appears that there is a new requirement for a model column to be present in the data. This is not mentioned in the documentation for score().
When I run example_quantile |> filter(model == "epiforecasts-EpiNow2") |> select(-model) |> score(), I only get a message, not an error. A model column isn't technically required; it's just easier to have one. But true, we could probably add a sentence to the function documentation.
Hmm, I saw this as an error from check_forecasts(). I would have thought the desired behaviour would be to require a model column only where it is actually needed, and to mention it in the docs along with what it is used for.
Oh. Not ideal. Could you please post a short reprex?
Because when I run example_quantile |> filter(model == "epiforecasts-EpiNow2") |> select(-model) |> check_forecasts(), I don't get an error either, just a message.
Yeah, will do. I think I may have conflated this with an error from another source. Will investigate.
@seabbs is this still relevant?
Yes, I think so, at least when running score().
Setting up a reprex for this with the latest version, it does look like a soft rather than a hard requirement: score() runs, but it emits a message and labels every forecast as "Unspecified model".
library(scoringutils)
#> Note: scoringutils 1.0.0 introduces a lot of breaking changes and we apologise for any inconvenience. If you prefer the old interface, please download version 0.1.8 using remotes::install_github("epiforecasts/[email protected]")
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
example_quantile |>
  filter(model == "epiforecasts-EpiNow2") |>
  select(-model) |>
  score() |>
  head() |>
  print()
#> The following messages were produced when checking inputs:
#> 1. There is no column called `model` in the data. scoringutils therefore thinks that all forecasts come from the same model
#> Key: <location, target_end_date, target_type, location_name, forecast_date, horizon, model, range>
#> location target_end_date target_type location_name forecast_date horizon
#> <char> <Date> <char> <char> <Date> <num>
#> 1: DE 2021-05-08 Cases Germany 2021-05-03 1
#> 2: DE 2021-05-08 Cases Germany 2021-05-03 1
#> 3: DE 2021-05-08 Cases Germany 2021-05-03 1
#> 4: DE 2021-05-08 Cases Germany 2021-05-03 1
#> 5: DE 2021-05-08 Cases Germany 2021-05-03 1
#> 6: DE 2021-05-08 Cases Germany 2021-05-03 1
#> model range interval_score dispersion underprediction
#> <char> <num> <num> <num> <num>
#> 1: Unspecified model 0 44192.0 0.0 0
#> 2: Unspecified model 10 43937.3 4713.3 0
#> 3: Unspecified model 10 43937.3 4713.3 0
#> 4: Unspecified model 20 42223.8 8256.8 0
#> 5: Unspecified model 20 42223.8 8256.8 0
#> 6: Unspecified model 30 39955.8 10824.8 0
#> overprediction coverage coverage_deviation bias quantile ae_median
#> <num> <num> <num> <num> <num> <num>
#> 1: 44192 0 0.0 0.9 0.50 44192
#> 2: 39224 0 -0.1 0.9 0.45 44192
#> 3: 39224 0 -0.1 0.9 0.55 44192
#> 4: 33967 0 -0.2 0.9 0.40 44192
#> 5: 33967 0 -0.2 0.9 0.60 44192
#> 6: 29131 0 -0.3 0.9 0.35 44192
#> quantile_coverage
#> <lgcl>
#> 1: TRUE
#> 2: TRUE
#> 3: TRUE
#> 4: TRUE
#> 5: TRUE
#> 6: TRUE
Created on 2023-01-12 with reprex v2.0.2
Given that, I think all we need to close this out is a pass through the documentation to make this clear.
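Until the docs are updated, one possible way to avoid the message is to add the model column back explicitly before scoring. This is a sketch that assumes the scoringutils 1.0.0 interface used in the reprex, where score() accepts a data.frame directly (later versions may require converting the data to a forecast object first). The label "epiforecasts-EpiNow2" is simply the model that was filtered on; any string would do.

```r
library(scoringutils)
library(dplyr)

# Assumes the scoringutils 1.0.0 interface shown in the reprex:
# score() takes a data.frame directly.
scores <- example_quantile |>
  filter(model == "epiforecasts-EpiNow2") |>
  select(-model) |>
  # Re-adding an explicit `model` column avoids the
  # "There is no column called `model`" message and keeps a
  # meaningful label instead of "Unspecified model" in the output.
  mutate(model = "epiforecasts-EpiNow2") |>
  score()

head(scores)
```

With the column present, the model name in the output is the one supplied rather than the "Unspecified model" fallback.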