python-libmf

Integers cast to float32 can lose precision

Open david-cortes opened this issue 3 years ago • 7 comments

When IDs for users and items are received, they are cast to float32. For values above 2^24 (around 16.8 million), the conversion can be imprecise, because not every integer above that threshold can be represented exactly in float32. As a result, some users and items might get mixed up with others or have their IDs reassigned, which is problematic when predicting.
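A minimal sketch of the effect, using only the standard library rather than the package's own conversion code: float32 has a 24-bit significand, so every integer up to 2^24 survives a round trip, while larger ones can land on a neighbouring value. The helper name `to_float32` is made up for this illustration.

```python
import struct


def to_float32(value: int) -> float:
    """Round-trip a number through a 4-byte IEEE 754 float (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]


limit = 2 ** 24  # 16,777,216: up to here every integer is exact in float32

for user_id in (limit - 1, limit, limit + 1, limit + 2, limit + 3):
    stored = int(to_float32(user_id))
    status = "exact" if stored == user_id else "collides with another ID"
    print(user_id, "->", stored, status)
```

In this range only every second integer is representable, so odd IDs get merged with an even neighbour; the gap doubles with each further power of two.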

david-cortes avatar Aug 09 '22 19:08 david-cortes

How do you recommend we solve this?

PorkShoulderHolder avatar Aug 02 '23 17:08 PorkShoulderHolder

Easiest way would be to change the interfaces towards passing three arrays instead (user, item, rating).
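As a rough illustration of the idea (not the package's actual interface), the sketch below contrasts packing each interaction into a single float32 row with keeping three separately typed arrays. It uses the standard-library `array` module to stay dependency-free; the variable names and sample data are invented for the example.

```python
from array import array

# Hypothetical interactions: (user ID, item ID, rating).
interactions = [
    (16_777_217, 3, 4.5),
    (16_777_216, 7, 2.0),
]

# One float32 container per row: the IDs share the rating's type and get rounded.
packed = [array("f", row) for row in interactions]

# Three arrays: IDs keep an integer type, only the ratings are float32.
users = array("q", (u for u, _, _ in interactions))  # signed 64-bit integers
items = array("q", (i for _, i, _ in interactions))
ratings = array("f", (r for _, _, r in interactions))

print([int(row[0]) for row in packed])  # the two users now share one ID
print(list(users))                      # the IDs are preserved
```

With separate arrays, the ID type can be chosen independently of the rating type, so no ID ever passes through a floating-point representation.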

david-cortes avatar Aug 02 '23 17:08 david-cortes

Can you link to the particular conversion that you think is concerning?

PorkShoulderHolder avatar Aug 02 '23 17:08 PorkShoulderHolder

image

david-cortes avatar Aug 02 '23 17:08 david-cortes

Also if you write a fix for this I will merge. Sorry about the delay on your last one, I'm going to try to maintain this repo better from now on.

PorkShoulderHolder avatar Aug 02 '23 17:08 PorkShoulderHolder

I think let's keep the Python interface the same, but yeah, we could change what's passed to the C++ bindings under the hood.

PorkShoulderHolder avatar Aug 02 '23 17:08 PorkShoulderHolder

@david-cortes the better solution might be to do arr.astype(np.int32). I don't see why that won't work, but I still need to look at this; it's been a while.
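A small NumPy sketch of what that cast would and would not preserve, assuming the IDs start out as an integer array (the sample values are invented). int32 holds every integer up to 2,147,483,647 exactly, so the cast only helps if it is applied to the original integers and not to data that has already passed through float32.

```python
import numpy as np

# Hypothetical IDs, both above 2**24 but within the int32 range.
ids = np.array([16_777_217, 2_000_000_001], dtype=np.int64)

as_float = ids.astype(np.float32)  # rounds to the nearest representable float
as_int = ids.astype(np.int32)      # exact while every ID fits in int32

print(as_float.astype(np.int64))   # IDs after a float32 round trip
print(as_int)                      # IDs after the int32 cast
print(np.iinfo(np.int32).max)      # largest ID that int32 can hold
```

IDs above the int32 maximum would need a wider integer type such as int64.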

PorkShoulderHolder avatar Aug 02 '23 17:08 PorkShoulderHolder