java-LSH

Dealing with missing data

Open LarsOL opened this issue 9 years ago • 2 comments

I am contemplating using LSH in my application, but I am unsure how to deal with absent/missing data in a vector. The nearest neighbor imputation implies that this type of algorithm deals with this scenario, but how would I go about implementing it?

LarsOL, Dec 16 '16 01:12

Hi!

I have never had to test this, but my guess would be to provide default values for the missing entries...
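A minimal sketch of the default-value idea, assuming missing entries are marked with `Double.NaN` and that the per-dimension mean of the observed values is an acceptable default. The class and method names are hypothetical and are not part of java-LSH; the filled vector would then be passed to whatever LSH hashing is in use.

```java
// Hypothetical helper, not part of java-LSH.
// Assumption: a missing entry is encoded as Double.NaN.
final class MeanImputation {

    // Per-dimension mean over the observed (non-NaN) values.
    // A dimension with no observed value at all falls back to 0.0.
    static double[] columnMeans(double[][] data) {
        int dims = data[0].length;
        double[] sums = new double[dims];
        int[] counts = new int[dims];
        for (double[] row : data) {
            for (int i = 0; i < dims; i++) {
                if (!Double.isNaN(row[i])) {
                    sums[i] += row[i];
                    counts[i]++;
                }
            }
        }
        double[] means = new double[dims];
        for (int i = 0; i < dims; i++) {
            means[i] = counts[i] == 0 ? 0.0 : sums[i] / counts[i];
        }
        return means;
    }

    // Returns a copy of the vector with every NaN replaced by the default
    // for that dimension; the input array is left untouched.
    static double[] impute(double[] vector, double[] defaults) {
        double[] out = vector.clone();
        for (int i = 0; i < out.length; i++) {
            if (Double.isNaN(out[i])) {
                out[i] = defaults[i];
            }
        }
        return out;
    }
}
```

Using the mean rather than a fixed constant keeps the filled value inside the range of the real data, but it does not remove the clustering effect for vectors with many missing entries.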

On Fri, Dec 16, 2016 at 02:34, Lars Lawoko wrote:

I am contemplating using LSH in my application, but I am unsure how to deal with absent/missing data in a vector. The nearest neighbor imputation implies that this type of algorithm deals with this scenario, but how would I go about implementing it?


tdebatty, Dec 21 '16 07:12

The main issue I see with providing a default value is this: wouldn't vectors with missing entries be artificially clustered around those "default" values, which the algorithm would treat as valid data? Random values might work, but then the result is no longer deterministic.
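One way to keep random fill-in values reproducible is to seed the generator from the vector itself, so the same input always gets the same filled output. The following is only a sketch under stated assumptions: missing entries are `Double.NaN`, the data is roughly standardized (so a standard Gaussian draw is a plausible value), and `SeededFill` and `baseSeed` are hypothetical names, not java-LSH API.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper, not part of java-LSH.
// Assumption: a missing entry is encoded as Double.NaN.
final class SeededFill {

    // Fills NaN entries with pseudo-random Gaussian values. The seed is
    // derived from the vector content, so identical inputs always produce
    // identical outputs.
    static double[] fill(double[] vector, long baseSeed) {
        Random rnd = new Random(baseSeed ^ Arrays.hashCode(vector));
        double[] out = vector.clone();
        for (int i = 0; i < out.length; i++) {
            if (Double.isNaN(out[i])) {
                out[i] = rnd.nextGaussian();
            }
        }
        return out;
    }
}
```

This makes the output deterministic, but the filled values are still noise: two otherwise similar vectors with different missing entries may receive unrelated fill-ins.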

Ideally, a dimension would simply be ignored when it has no value.
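For random-hyperplane (sign-of-dot-product) hashing, ignoring a dimension can be sketched by skipping missing entries when computing each dot product. This is an illustration only: `MaskedHyperplaneHash` is a hypothetical class written from scratch, it is not how java-LSH is implemented, and it assumes missing entries are encoded as `Double.NaN`.

```java
import java.util.Random;

// Hypothetical illustration, not part of java-LSH.
// Assumption: a missing entry is encoded as Double.NaN.
final class MaskedHyperplaneHash {

    private final double[][] planes;

    // Draws `bits` random hyperplanes in `dims` dimensions from a fixed seed,
    // so signatures are reproducible across runs.
    MaskedHyperplaneHash(int bits, int dims, long seed) {
        Random rnd = new Random(seed);
        planes = new double[bits][dims];
        for (int b = 0; b < bits; b++) {
            for (int d = 0; d < dims; d++) {
                planes[b][d] = rnd.nextGaussian();
            }
        }
    }

    // One bit per hyperplane: the sign of the dot product, computed only
    // over the dimensions that actually have a value.
    boolean[] signature(double[] vector) {
        boolean[] sig = new boolean[planes.length];
        for (int b = 0; b < planes.length; b++) {
            double dot = 0.0;
            for (int d = 0; d < vector.length; d++) {
                if (!Double.isNaN(vector[d])) {
                    dot += planes[b][d] * vector[d];
                }
            }
            sig[b] = dot >= 0.0;
        }
        return sig;
    }
}
```

Note that skipping a dimension in the dot product gives the same signature as filling it with 0, so for this hashing scheme "ignore the dimension" and "default to zero" coincide. Whether that is acceptable depends on whether 0 is a neutral value for the data, for example after centering each dimension.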

LarsOL avatar Dec 21 '16 09:12 LarsOL
