Add support for linear-time MMD estimator.
Do we think users are ever going to have equal reference and test batch sizes in practice? I'd guess almost always the reference set is going to be much larger. I wonder if we'd be better off using the B-stat estimator by default for the linear case rather than Gretton's estimator for equal sample sizes. This additionally has the advantage of a tunable parameter that allows for interpolation between a linear and quadratic time estimator.
@arnaudvl @Srceh
Edit: It's actually not quite this simple. However I think we should put some thought into how best to address the n!=m case.
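For concreteness, here is a rough self-contained NumPy sketch of the block ("B-statistic") estimator idea being discussed, with a tunable block size that interpolates between the (near-)linear-time regime (small blocks) and the quadratic-time regime (a single block spanning all samples). Function names, the RBF kernel, and the fixed bandwidth are illustrative assumptions, not the alibi-detect API:

```python
import numpy as np

def rbf(x, y, sigma=1.0):
    # Gaussian RBF kernel matrix between two sample batches.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2_quadratic(x, y, sigma=1.0):
    # Unbiased quadratic-time MMD^2 estimate (diagonal terms excluded).
    n, m = len(x), len(y)
    kxx, kyy, kxy = rbf(x, x, sigma), rbf(y, y, sigma), rbf(x, y, sigma)
    np.fill_diagonal(kxx, 0.0)
    np.fill_diagonal(kyy, 0.0)
    return kxx.sum() / (n * (n - 1)) + kyy.sum() / (m * (m - 1)) - 2.0 * kxy.mean()

def mmd2_b_stat(x, y, block_size=2, sigma=1.0):
    # B-statistic: average the unbiased quadratic estimate over disjoint
    # blocks. block_size = len(x) recovers the full quadratic estimator,
    # while small blocks give (near-)linear cost in the number of samples.
    n = min(len(x), len(y))
    n_blocks = n // block_size
    estimates = [
        mmd2_quadratic(x[i * block_size:(i + 1) * block_size],
                       y[i * block_size:(i + 1) * block_size], sigma)
        for i in range(n_blocks)
    ]
    return float(np.mean(estimates))
```

Note this still assumes the reference and test sets are paired up block by block, so it does not by itself resolve the n != m case, but the block size gives the power/cost trade-off knob mentioned above.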
Agreed. Maybe we can leave the current PR as it is for the linear-time estimator, and do a separate one for the additional B-statistic implementation.
@Srceh @ojcobb Wondering if it wouldn't be cleaner to have a separate LinearMMDDrift and BMMDDrift (for lack of a better name) instead of grouping everything in the existing MMD implementation. It would be a bit easier to debug as well, and both can just share the MMD base class.
The linear-time estimator now also uses the Gaussian distribution under the null for the test threshold, so no permutations are required. It should be the fastest option, at the cost of lower test power and some unused samples.
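For reference, a minimal NumPy sketch of that idea (a Gretton-style linear-time estimator with a Gaussian approximation under the null); the function name, RBF kernel, and fixed bandwidth are illustrative assumptions rather than the actual alibi-detect implementation:

```python
import math
import numpy as np

def linear_mmd_test(x, y, sigma=1.0):
    # Pair up consecutive samples and evaluate, for each pair,
    # h = k(x1, x2) + k(y1, y2) - k(x1, y2) - k(x2, y1).
    k = lambda a, b: np.exp(-((a - b) ** 2).sum(-1) / (2 * sigma ** 2))
    n = min(len(x), len(y)) // 2 * 2  # leftover samples go unused
    x1, x2 = x[0:n:2], x[1:n:2]
    y1, y2 = y[0:n:2], y[1:n:2]
    h = k(x1, x2) + k(y1, y2) - k(x1, y2) - k(x2, y1)
    mmd2 = h.mean()
    # Under H0 the standardised statistic is asymptotically N(0, 1),
    # so the p-value comes from a Gaussian tail, not a permutation test.
    z = mmd2 / (h.std(ddof=1) / math.sqrt(len(h)))
    p_val = 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))
    return float(mmd2), float(p_val)
```

Only one kernel evaluation per pair is used, which is where the "unused samples" (and the lower power relative to the quadratic estimator) come from.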
@Srceh final thing, were you planning on adding tests and updating the docs?
Good question, haven't worked on anything specific, but happy to follow any suggestion.
We'd usually aim to ensure any new functionality added to the library is documented and tested. Since these new detectors come under `MMDDrift`, I think it's just a very simple case of making sure the new `estimator` kwarg is documented in `doc/source/methods/mmddrift.ipynb` 🙂
Could also think about tweaking an existing example to show off (and explain) the linear-time MMD if you think it's sufficiently exciting! (this is probs one for @arnaudvl and @ojcobb to weigh in on). I think we mentioned looking at adding some discussion of complexity to the docs at some point, so this could be something to leave until then...
Regarding tests, IMO it might be worth testing the estimators inside `alibi_detect/cd/tensorflow/tests/test_mmd_tf.py` and `alibi_detect/cd/pytorch/tests/test_mmd_pt.py`.
Great, thanks! Will follow up on these.
- Yes, tests should always be included for new methods. We can hopefully copy a lot over from the existing (e.g. MMD) tests and add tests for the linear-time specific changes.
- Same for docs. The method doc can again be quite similar to the MMD one, similar to the online version of MMD.
- For the example, I think it is sufficient to just update an existing example with e.g. 100 runs of the usual MMD detector vs. the linear-time one, and show the timing differences between those runs (printed in a cell), but also show that power is likely to go down.
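To illustrate the kind of timing comparison such an example could show, here is a standalone NumPy sketch (deliberately not using the alibi-detect detectors themselves, and with an assumed fixed-bandwidth RBF kernel) that times a quadratic-time vs. a linear-time MMD^2 estimate on the same drifted data:

```python
import time
import numpy as np

def rbf(x, y, sigma=1.0):
    # Gaussian RBF kernel matrix between two sample batches.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2_quadratic(x, y):
    # Unbiased quadratic-time MMD^2 estimate: O(n^2) kernel evaluations.
    n, m = len(x), len(y)
    kxx, kyy, kxy = rbf(x, x), rbf(y, y), rbf(x, y)
    np.fill_diagonal(kxx, 0.0)
    np.fill_diagonal(kyy, 0.0)
    return kxx.sum() / (n * (n - 1)) + kyy.sum() / (m * (m - 1)) - 2.0 * kxy.mean()

def mmd2_linear(x, y):
    # Linear-time MMD^2 estimate over consecutive sample pairs: O(n).
    k = lambda a, b: np.exp(-((a - b) ** 2).sum(-1) / 2.0)
    n = min(len(x), len(y)) // 2 * 2
    x1, x2, y1, y2 = x[0:n:2], x[1:n:2], y[0:n:2], y[1:n:2]
    return (k(x1, x2) + k(y1, y2) - k(x1, y2) - k(x2, y1)).mean()

rng = np.random.default_rng(0)
x = rng.standard_normal((2000, 2))
y = rng.standard_normal((2000, 2)) + 0.5  # mean-shifted: drift present

t0 = time.perf_counter(); q = mmd2_quadratic(x, y); t_quad = time.perf_counter() - t0
t0 = time.perf_counter(); l = mmd2_linear(x, y); t_lin = time.perf_counter() - t0
print(f"quadratic: MMD^2={q:.4f} in {t_quad:.4f}s; linear: MMD^2={l:.4f} in {t_lin:.4f}s")
```

The linear estimator runs far faster but has much higher variance, which is exactly the power trade-off the example would want to visualise over the repeated runs.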
- The method doc can again be quite similar to the MMD one. Similar to online version of MMD.
Not actually sure we need a new method docs page, since the new detectors are dispatched via the existing `MMDDrift`? We just need to document the `estimator` kwarg in the existing `MMDDrift` doc, I think? Could perhaps also add a sentence/paragraph detailing the algorithmic differences and the considerations in choosing one or the other?
Ok yes makes sense.
Left a number of comments related to the PyTorch implementation. Let's work through those first and then we can apply the desired changes to TensorFlow as well.
The TF version is also fixed for the comments above that were replied to with "fixed".
@Srceh I've just merged the `score`/`predict` refactoring (#489). This will have introduced a few conflicts you need to resolve. It should simplify your life wrt the implementation in this PR though!
Nice! will start working on that!
@arnaudvl @Srceh I will resolve the conflicts for you once #537 has been merged.
@Srceh I have now merged in the v0.10.0 related changes from master. This primarily involved changes to the kwargs related to preprocessing, tweaking some tests, and adding your `estimator` kwarg to the `MMDDrift` pydantic models (see `saving/schemas.py`).