Silent model column requirement

Open seabbs opened this issue 3 years ago • 6 comments

It appears that there is a new requirement for a model column to be present in the data. This is not documented in score().

seabbs avatar May 17 '22 13:05 seabbs

When running example_quantile |> filter(model == "epiforecasts-EpiNow2") |> select(-model) |> score() I get a message. A model column is not technically required, it is just easier to have one. But true, we could probably add a sentence to the function documentation.
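For anyone who wants to avoid the message, one option is to label the forecasts explicitly before scoring. This is only a sketch, assuming the scoringutils 1.x interface used in this thread (where score() accepts the data directly); the label "EpiNow2" is an arbitrary placeholder.

```r
library(scoringutils)
library(dplyr)

# Same single-model subset as above, but with an explicit model column
# added back in. "EpiNow2" is an arbitrary label chosen for illustration.
single_model <- example_quantile |>
  filter(model == "epiforecasts-EpiNow2") |>
  select(-model) |>
  mutate(model = "EpiNow2")

# With a model column present, the input check should have nothing to
# report about a missing `model` column.
scores <- score(single_model)
head(scores)
```

The scores should then carry the chosen label rather than a default one.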

nikosbosse avatar May 17 '22 16:05 nikosbosse

Hmm, I saw this as an error from check_forecasts. I would have thought the desired behaviour would be to require a model column only where it is needed, and to mention it in the docs along with what it is used for.

seabbs avatar May 17 '22 17:05 seabbs

Oh. Not ideal. Could you please post a short reprex? When I run example_quantile |> filter(model == "epiforecasts-EpiNow2") |> select(-model) |> check_forecasts() I don't get an error either, just a message.

nikosbosse avatar May 17 '22 20:05 nikosbosse

Yeah, will do. I think I may have conflated this with an error from another source. Will investigate.

seabbs avatar May 19 '22 11:05 seabbs

@seabbs is this still relevant?

nikosbosse avatar Aug 16 '22 22:08 nikosbosse

Yes, I think so, when running score().

seabbs avatar Aug 17 '22 10:08 seabbs

Setting up a reprex for this with the latest version, it does look like a soft rather than a hard requirement.

library(scoringutils)
#> Note: scoringutils 1.0.0 introduces a lot of breaking changes and we apologise for any inconvenience. If you prefer the old interface, please download version 0.1.8 using remotes::install_github("epiforecasts/scoringutils@v0.1.8")
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

example_quantile |>
  filter(model == "epiforecasts-EpiNow2") |>
  select(-model) |>
  score() |>
  head() |>
  print() 
#> The following messages were produced when checking inputs:
#> 1.  There is no column called `model` in the data. scoringutils therefore thinks that all forecasts come from the same model
#> Key: <location, target_end_date, target_type, location_name, forecast_date, horizon, model, range>
#>    location target_end_date target_type location_name forecast_date horizon
#>      <char>          <Date>      <char>        <char>        <Date>   <num>
#> 1:       DE      2021-05-08       Cases       Germany    2021-05-03       1
#> 2:       DE      2021-05-08       Cases       Germany    2021-05-03       1
#> 3:       DE      2021-05-08       Cases       Germany    2021-05-03       1
#> 4:       DE      2021-05-08       Cases       Germany    2021-05-03       1
#> 5:       DE      2021-05-08       Cases       Germany    2021-05-03       1
#> 6:       DE      2021-05-08       Cases       Germany    2021-05-03       1
#>                model range interval_score dispersion underprediction
#>               <char> <num>          <num>      <num>           <num>
#> 1: Unspecified model     0        44192.0        0.0               0
#> 2: Unspecified model    10        43937.3     4713.3               0
#> 3: Unspecified model    10        43937.3     4713.3               0
#> 4: Unspecified model    20        42223.8     8256.8               0
#> 5: Unspecified model    20        42223.8     8256.8               0
#> 6: Unspecified model    30        39955.8    10824.8               0
#>    overprediction coverage coverage_deviation  bias quantile ae_median
#>             <num>    <num>              <num> <num>    <num>     <num>
#> 1:          44192        0                0.0   0.9     0.50     44192
#> 2:          39224        0               -0.1   0.9     0.45     44192
#> 3:          39224        0               -0.1   0.9     0.55     44192
#> 4:          33967        0               -0.2   0.9     0.40     44192
#> 5:          33967        0               -0.2   0.9     0.60     44192
#> 6:          29131        0               -0.3   0.9     0.35     44192
#>    quantile_coverage
#>               <lgcl>
#> 1:              TRUE
#> 2:              TRUE
#> 3:              TRUE
#> 4:              TRUE
#> 5:              TRUE
#> 6:              TRUE

Created on 2023-01-12 with reprex v2.0.2

Given that, I think all we need to close this out is a pass through the documentation to make this clear.
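To make the soft requirement concrete for the docs, here is a minimal base R sketch of the behaviour seen in the reprex: if there is no model column, emit a message and fill in a default label. The helper ensure_model_column() is hypothetical and is not how scoringutils implements this; only the default label "Unspecified model" is taken from the output above.

```r
# Hypothetical helper for illustration only, not part of scoringutils.
ensure_model_column <- function(data, default = "Unspecified model") {
  if (!"model" %in% names(data)) {
    # Soft requirement: inform the user instead of stopping with an error.
    message(
      "There is no column called `model` in the data. ",
      "Assuming all forecasts come from the same model."
    )
    data[["model"]] <- rep(default, nrow(data))
  }
  data
}

# Toy forecasts without a model column (made-up values).
forecasts <- data.frame(
  location = c("DE", "DE"),
  quantile = c(0.25, 0.75),
  prediction = c(100, 140)
)

with_model <- ensure_model_column(forecasts)
unique(with_model$model)
```

Calling the helper again on data that already has a model column leaves it unchanged and prints nothing, which is the distinction between a soft and a hard requirement that the documentation would need to spell out.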

seabbs avatar Jan 12 '23 16:01 seabbs