
Ensure diversity sampling instead of random (proportionate) sampling

Open mlopatka opened this issue 7 years ago • 2 comments

https://github.com/mozilla/python_mozetl/blob/32d78c34dbb3c9ff5542f1ebc110f5aeb7fce340/mozetl/taar/taar_similarity.py#L131

The diversity of the donor pool currently rests only on the assumption that the higher-level clusters are substantially diverse. This could be improved by verifying cross-cluster diversity in the add-on space.
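To make the verification idea concrete, here is a minimal sketch of one possible check. It is not taken from `taar_similarity.py`: it assumes each donor can be represented as a `(cluster_label, set_of_addon_ids)` tuple, and it uses mean Jaccard distance between donors from different clusters as the diversity score. Both the data layout and the function names are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cross_cluster_diversity(donors):
    """Mean Jaccard distance over donor pairs drawn from different clusters.

    donors: list of (cluster_label, set_of_addon_ids) tuples.
            This layout is an assumption made for illustration.
    Returns None when no cross-cluster pair exists.
    """
    distances = [
        jaccard_distance(addons_a, addons_b)
        for (label_a, addons_a), (label_b, addons_b) in combinations(donors, 2)
        if label_a != label_b
    ]
    return sum(distances) / len(distances) if distances else None
```

A score close to 0 would suggest that donors in different clusters carry nearly identical add-on sets, i.e. the clustering alone is not buying much add-on diversity. The pairwise loop is quadratic in the number of donors, so this would only be practical on the sampled donor pool, not on the full population.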

mlopatka, Mar 08 '18 13:03

The same concern comes up where we specify a proportionate sampling strategy: https://github.com/mozilla/python_mozetl/blob/491fbda515f985f3156ff0c70859624fd4961ea8/mozetl/taar/taar_similarity.py#L168

One solution would be to specify per-cluster weights that boost the representation of specific (niche) clusters in the final sample, without compromising the non-addon diversity that comes from sampling the "large" clusters.

Even the inverse of the current strategy (i.e. sampling inversely to cluster size rather than proportionately) could be evaluated.
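A rough sketch of what such a weighting could look like, for discussion only. It does not reflect the existing code: the function names and the `alpha` exponent are assumptions introduced here, and the input is assumed to be a flat list of cluster labels, one per candidate donor. With `alpha = 1.0` it reproduces proportionate sampling, `alpha = 0.0` weights every cluster equally, and `alpha = -1.0` is the inverse strategy.

```python
from collections import Counter


def cluster_sampling_weights(labels, alpha=1.0):
    """Return normalized per-cluster sampling weights.

    alpha =  1.0 -> proportionate to cluster size
    alpha =  0.0 -> every cluster weighted equally
    alpha = -1.0 -> inversely proportionate to cluster size
    (alpha is a hypothetical tuning knob, not an existing parameter.)
    """
    sizes = Counter(labels)
    raw = {cluster: float(n) ** alpha for cluster, n in sizes.items()}
    total = sum(raw.values())
    return {cluster: w / total for cluster, w in raw.items()}


def allocate_donors(labels, num_donors, alpha=1.0):
    """Split num_donors across clusters, capped by each cluster's size."""
    sizes = Counter(labels)
    weights = cluster_sampling_weights(labels, alpha)
    return {
        cluster: min(sizes[cluster], int(round(weights[cluster] * num_donors)))
        for cluster in sizes
    }
```

For example, with 90 candidates in cluster `"a"`, 10 in cluster `"b"` and `num_donors=20`, `alpha=1.0` allocates 18 and 2 donors respectively, while `alpha=-1.0` allocates 2 and 10. The cap matters in the second case: the niche cluster cannot supply more donors than it contains, so the total can fall short of `num_donors` and the remainder would need to be redistributed. Intermediate values such as `alpha=0.5` would be a way to emphasize niche clusters without fully abandoning the large ones.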

mlopatka, Jan 28 '19 16:01

@Dexterp37 can you assign this issue to me please? I have insufficient privileges to grab it :|

mlopatka, Jan 28 '19 16:01