scoringutils

Support for "Mixture of Parameters" scoring

Open · damonbayer opened this issue 1 year ago · 10 comments

I am borrowing the phrasing of this concept from Krüger 2021. Forecasts are often issued as mixtures of closed-form forecasts (e.g. each MCMC sample corresponds to a predictive distribution for the data). In practice, we often draw from each of these closed-form forecasts to produce a sample-based forecast, but we could instead use the mixture of distributions directly. This is particularly useful for scoring forecasts where the realized value is so unlikely that it does not appear among the sampled values, yet still lies within the support of the mixture forecast.
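To make the point concrete, here is a minimal Python sketch (not the scoringutils or scoringRules API; all names are hypothetical) of the log score of an equally weighted normal mixture with one component per MCMC draw of a hypothetical `(mu, sigma)` posterior. It works in log space via log-sum-exp, so an observation far in the tail still gets a finite score even when no sampled value comes near it.

```python
import math
import random


def normal_logpdf(x, mu, sigma):
    """Log density of a Normal(mu, sigma) distribution at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def mixture_log_score(y, mus, sigmas):
    """Negative log predictive density of y under an equally weighted
    normal mixture with one component per (mu, sigma) draw."""
    n = len(mus)
    # log(w_i * f_i(y)) with w_i = 1/n for every component
    terms = [normal_logpdf(y, m, s) - math.log(n) for m, s in zip(mus, sigmas)]
    # log-sum-exp: subtract the largest term so exp() cannot underflow to 0
    top = max(terms)
    return -(top + math.log(sum(math.exp(t - top) for t in terms)))


if __name__ == "__main__":
    random.seed(1)
    # Hypothetical posterior draws of the predictive mean and sd
    mus = [random.gauss(10, 1) for _ in range(1000)]
    sigmas = [abs(random.gauss(2, 0.2)) for _ in range(1000)]
    # Sample-based forecast: one predictive draw per posterior draw
    samples = [random.gauss(m, s) for m, s in zip(mus, sigmas)]

    y = 40.0  # an observation far in the upper tail
    print("sampled values >= y:", sum(v >= y for v in samples))
    print("mixture log score:", mixture_log_score(y, mus, sigmas))
```

A naive `math.log(sum(densities))` would fail here because each density underflows to zero; working in log space is what keeps the mixture score usable for very unlikely observations.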

This feature is alluded to on pg 23 of the manuscript:

> forecasts represented in a closed-form distribution (as can be scored for example using scoringRules are not supported.

(Note the missing closing parenthesis)

Could this feature be supported?

damonbayer, Sep 06 '24 17:09

Hi Damon, you mean scoring closed-form distributions? In principle it should be possible. I think the major part of the work would be designing appropriate input and output formats.

I.e. something like this: [image not preserved]

or this: [image not preserved]

Would you be interested in helping with this?

nikosbosse, Sep 06 '24 17:09

> Hi Damon, you mean scoring closed-form distributions? In principle it should be possible. I think the major part of the work would be designing appropriate input and output formats.

Yes, I think the input format would be something like:

| column | type |
|---|---|
| observed | numeric |
| distribution | character (but has to be something supported by scoringRules) |
| parameters | named numeric vector of arguments to the distribution (or a named list if the arguments to the distribution are not all numeric) |
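As a rough illustration, the table above could map onto a data structure like the following minimal Python sketch. It assumes (this is not part of the proposal) that a mixture is represented by storing one value per component in each parameter vector, e.g. one entry per MCMC draw. The class name, the `components()` helper, and the family name `"norm"` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class DistributionForecast:
    """Hypothetical Python analogue of one row of the proposed format.

    observed:     the realized value
    distribution: name of a closed-form family (hypothetical names)
    parameters:   argument name -> one value per mixture component
    """
    observed: float
    distribution: str
    parameters: Dict[str, List[float]] = field(default_factory=dict)

    def components(self) -> Iterator[Dict[str, float]]:
        """Yield one {argument: value} dict per mixture component."""
        names = list(self.parameters)
        lengths = {len(self.parameters[n]) for n in names}
        # Refuse ragged or empty parameter vectors instead of silently truncating
        if len(lengths) != 1:
            raise ValueError("all parameter vectors must have the same length")
        for values in zip(*(self.parameters[n] for n in names)):
            yield dict(zip(names, values))
```

For example, `DistributionForecast(12.0, "norm", {"mean": [10.1, 9.8], "sd": [2.0, 1.9]})` describes a two-component equally weighted mixture, and `components()` yields `{"mean": 10.1, "sd": 2.0}` and `{"mean": 9.8, "sd": 1.9}`.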

> Would you be interested in helping with this?

Sure.

damonbayer, Sep 06 '24 18:09

The simple version would be to check that the distribution is one of the supported ones, but not check the parameters (i.e. leave that to scoringRules). I think it shouldn't be too complicated. But then again, it's always more complicated than I initially think :) What is your timeline/level of urgency with this? I.e. do you have a specific project in mind that you would like to support?
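The "simple version" of input validation might look like this minimal Python sketch: it checks only that each row names an allowed distribution and passes the parameters through untouched, leaving their validation to the scoring backend. The allow-list contents, the row layout, and the function name are all hypothetical.

```python
from typing import Dict, Iterable, List

# Hypothetical allow-list; a real one would mirror the families the
# scoring backend can handle in closed form.
SUPPORTED_DISTRIBUTIONS = {"norm", "lnorm", "gamma", "pois", "nbinom"}


def validate_distributions(rows: Iterable[Dict], supported=SUPPORTED_DISTRIBUTIONS) -> List[Dict]:
    """Check that every row names a supported distribution.

    Parameters are deliberately not inspected.
    """
    rows = list(rows)
    bad = sorted({r["distribution"] for r in rows if r["distribution"] not in supported})
    if bad:
        raise ValueError(f"unsupported distribution(s): {', '.join(bad)}")
    return rows
```

Collecting every offending name before raising gives a single, complete error message rather than failing on the first bad row.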

nikosbosse, Sep 09 '24 06:09

No particular urgency. I hacked this together with the old (non-dev) version of scoringutils for a previous project and would prefer for it to be more supported. Would you be able to put together an outline/checklist of requirements for this contribution?

damonbayer, Sep 09 '24 14:09

This sounds like a good idea, and I'm happy to support it.


@damonbayer, do you have some version of this scratch code you can share, to give some hints about any gotchas here?

@nikosbosse, are you happy to lead on building out the actions that would need to be taken here to go from idea to full implementation? I am also wondering whether there is any reason to wait for changes to the requirements for a new forecast type.

seabbs, Sep 09 '24 17:09

Nice! My current plan was to address https://github.com/epiforecasts/scoringutils/issues/832 next; this should make it a bit easier to create a new forecast class.

Then I think it would be good to reorganise the files so that all functions related to a forecast type are in one file. This should give us more clarity on what we actually have to do to create a new forecast type.

We have another request for a new forecast type here: https://github.com/epiforecasts/scoringutils/issues/846. I think this could be a nice test bed for the workflow of creating a new forecast class. Ideally, we should document the steps more clearly so that others know what they have to do.

My personal preference would be to implement the ordinal forecast one first, but it's of course also possible to do it the other way round and use the distribution scoring as a test bed.

@seabbs, help on any of these would be appreciated if you like. Otherwise, that is the rough order in which I would tackle things.

nikosbosse, Sep 09 '24 21:09

This all sounds like a sensible plan to me, and I agree that ordinal might be easier. If @damonbayer is up for it, that PR could serve as a guide for implementing this new class.

seabbs, Sep 10 '24 09:09

Flagging some potentially related PRs: #888 and #889. TL;DR: none of this is very hard, but we still have some work to do to make everything truly modular.

nikosbosse, Sep 11 '24 19:09

If I'm understanding the issue correctly, some of the ideas outlined in this preprint may be relevant to a solution here. The key idea of the paper is that closed-form mixture distributions could be used to approximate complex posterior/predictive distributions that may exist only as samples. It also has some specific ideas about how to represent such a mixture, which could be adapted to work with scoringutils.

For context: we have run into related issues around this trade-off between file size and scoreability in the SARS-CoV-2 Variant Nowcasting Hub.
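The simplest way to turn a sample-based forecast into a closed-form mixture is a Gaussian kernel density estimate, which is exactly an equally weighted normal mixture with one component per sample. The minimal Python sketch below uses that standard, swapped-in technique (not the preprint's method) with a normal-reference rule of thumb for the shared bandwidth; the function name is hypothetical.

```python
import statistics


def samples_to_normal_mixture(samples):
    """Approximate a sample-based forecast by an equally weighted normal
    mixture: one component per sample, all sharing the bandwidth
    h = 1.06 * sd * n^(-1/5) (normal-reference rule of thumb).

    Returns (means, sds, weights).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    h = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    if h <= 0:
        raise ValueError("samples have zero spread")
    means = list(samples)
    sds = [h] * n
    weights = [1.0 / n] * n
    return means, sds, weights
```

This gives every observation non-zero density, but it keeps one component per sample, so it does nothing for the file-size side of the trade-off; representing the forecast with far fewer fitted components is where ideas like those in the preprint would come in.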

nickreich, Sep 30 '24 20:09

I think we have cleared all the blockers, so this can be worked on now.

seabbs, Dec 10 '24 16:12