python-libmf

PorkShoulderHolder

Integers cast to float32 can lose precision

Open david-cortes opened this issue 3 years ago • 7 comments

When receiving IDs for users and items, they get cast to float32. For values above 2^24 (about 16.8 million), the conversion can be imprecise, since float32 has a 24-bit significand and cannot represent every integer beyond that point exactly. As a result, some users and items might get mixed up with others or have their IDs reassigned, which is problematic when predicting.
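A minimal sketch of the effect, using plain NumPy rather than anything from python-libmf. The ID values are made up for illustration and sit just around 2^24, the last point up to which float32 represents every integer exactly:

```python
import numpy as np

# float32 has a 24-bit significand, so every integer up to 2**24 is exact.
limit = 2**24  # 16_777_216

# Illustrative IDs straddling the limit (not taken from any real dataset).
ids = np.array([limit - 1, limit, limit + 1, limit + 2, limit + 3], dtype=np.int64)

# Round-trip through float32, as happens when IDs share a float32 array with ratings.
round_trip = ids.astype(np.float32).astype(np.int64)

for original, recovered in zip(ids, round_trip):
    print(original, "->", recovered)

# Distinct IDs that now map to the same value.
collisions = len(ids) - len(np.unique(round_trip))
print("colliding IDs:", collisions)
```

Here `limit + 1` comes back as `limit`, so two different IDs end up identical after the conversion.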

Aug 09 '22 19:08 david-cortes

How do you recommend we solve this?

Aug 02 '23 17:08 PorkShoulderHolder

The easiest way would be to change the interface to take three separate arrays instead (user, item, rating).
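Roughly what that could look like on the Python side. This is only a sketch: `to_typed_columns` is a hypothetical helper, not an existing python-libmf function, and the choice of int32 for IDs is an assumption about what the C++ side would accept:

```python
import numpy as np

def to_typed_columns(users, items, ratings):
    """Hypothetical helper: keep IDs as integers, convert only ratings to float32."""
    # Assumption: the bindings would take 32-bit integer IDs.
    users = np.ascontiguousarray(users, dtype=np.int32)
    items = np.ascontiguousarray(items, dtype=np.int32)
    ratings = np.ascontiguousarray(ratings, dtype=np.float32)
    if not (users.shape == items.shape == ratings.shape):
        raise ValueError("users, items and ratings must have the same shape")
    return users, items, ratings

# Illustrative values: user IDs above 2**24 survive because they never touch float32.
u, i, r = to_typed_columns([20_000_001, 20_000_003], [5, 7], [4.5, 3.0])
print(u, i, r)
```

Because the ID columns never pass through a floating-point type, they stay exact up to the int32 limit of 2^31 - 1.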

Aug 02 '23 17:08 david-cortes

Can you link to the particular conversion that you think is concerning?

Aug 02 '23 17:08 PorkShoulderHolder

Aug 02 '23 17:08 david-cortes

Also, if you write a fix for this, I will merge it. Sorry about the delay on your last one; I'm going to try to maintain this repo better from now on.

Aug 02 '23 17:08 PorkShoulderHolder

I think let's keep the Python interface the same, but yeah, we could change what's passed to the C++ bindings under the hood.

Aug 02 '23 17:08 PorkShoulderHolder

@david-cortes the better solution might be to do arr.astype(np.int32). I don't see why that wouldn't work, but I still need to look at this; it's been a moment.
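One thing worth checking with that approach is where the cast happens. The sketch below uses a made-up (user, item, rating) array, not the library's actual code path, and assumes the input starts out as float64. It compares casting the ID column directly with casting it after the array has already been converted to float32:

```python
import numpy as np

# Illustrative (user, item, rating) rows with user IDs above 2**24.
raw = np.array([[20_000_001, 7, 4.5],
                [20_000_003, 9, 3.0]], dtype=np.float64)

# Cast the ID column straight from the original data: values stay exact.
ids_direct = raw[:, 0].astype(np.int32)

# Cast after the whole array went through float32: the IDs are already rounded.
ids_via_f32 = raw.astype(np.float32)[:, 0].astype(np.int32)

print("direct:     ", ids_direct)
print("via float32:", ids_via_f32)
```

So `astype(np.int32)` only helps if it is applied before any float32 conversion, and int32 itself tops out at 2^31 - 1.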

Aug 02 '23 17:08 PorkShoulderHolder