pynndescent is changing the input data; is this expected?
It seems that if the metric is dot, the input data gets normalized. Is this expected? https://github.com/lmcinnes/pynndescent/blob/master/pynndescent/pynndescent_.py#L743
Can confirm: this happens if the metric is dot and the input data is np.float32. If the input has a different dtype, the check_array call on L684 has already made a copy of it, so the caller's array is left untouched. This is definitely a bug.
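The dtype dependence comes from NumPy's conversion semantics: converting an array that already has the target dtype returns the same object, while any other dtype forces a new copy. A minimal sketch of the pattern, assuming the data is converted without copying and then normalized in place; `prepare_for_dot` and `prepare_for_dot_safe` are hypothetical names, not pynndescent's code:

```python
import numpy as np

def prepare_for_dot(X):
    """Hypothetical stand-in for the buggy pattern; not pynndescent's code."""
    # np.asarray returns the *same* array when X is already float32,
    # and a new float32 copy otherwise (similar to check_array with copy=False).
    data = np.asarray(X, dtype=np.float32)
    norms = np.linalg.norm(data, axis=1)
    norms[norms == 0.0] = 1.0        # leave all-zero rows unchanged
    data /= norms[:, np.newaxis]     # in-place division: mutates X if no copy was made
    return data

def prepare_for_dot_safe(X):
    """Hypothetical fixed version: always copy before normalizing."""
    data = np.array(X, dtype=np.float32, copy=True)
    norms = np.linalg.norm(data, axis=1)
    norms[norms == 0.0] = 1.0
    data /= norms[:, np.newaxis]
    return data

x32 = np.array([[3.0, 4.0]], dtype=np.float32)
x64 = np.array([[3.0, 4.0]], dtype=np.float64)

prepare_for_dot(x32)
print(x32)  # [[0.6 0.8]]  the caller's array was modified
prepare_for_dot(x64)
print(x64)  # [[3. 4.]]    only the internal copy was normalized
```

Copying explicitly costs one extra array allocation, which is the usual price for guaranteeing that a library never mutates user input.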
I have fixed the aspect that @jamestwebber pointed out: we shouldn't alter the user's data, that's definitely a bug.
With regard to "dot" as a metric: it essentially assumes you really want something like an angular or cosine distance, and, indeed, the actual cosine distance exploits this and uses dot under the hood. Is there a use case for dot product distance measures with non-normalized data? We can always add a different name to handle that.
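The reason cosine distance can reuse dot is that, once both vectors have unit length, the dot product already equals the cosine similarity. A small check, assuming a distance of the form `1 - x·y`; pynndescent's actual kernels may differ in details such as clamping, so these two functions are illustrative only:

```python
import numpy as np

# Assumed forms for illustration, not pynndescent's exact implementations.
def dot_distance(x, y):
    return 1.0 - np.dot(x, y)

def cosine_distance(x, y):
    return 1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

rng = np.random.default_rng(0)
x, y = rng.normal(size=5), rng.normal(size=5)

# Normalize once up front ...
x_unit = x / np.linalg.norm(x)
y_unit = y / np.linalg.norm(y)

# ... and the cheap dot-based distance matches the cosine distance of the raw vectors.
print(np.isclose(dot_distance(x_unit, y_unit), cosine_distance(x, y)))  # True
```

Normalizing the whole dataset once avoids recomputing two norms on every distance evaluation, which is presumably why cosine is routed through dot.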
I guess there are use cases for the dot product with non-normalized data, and the name "dot" by itself does not imply that there is a normalization step.
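Normalizing is not a neutral preprocessing step: it can change which neighbor ranks first, because the raw dot product rewards vector length as well as direction. A toy example with made-up vectors:

```python
import numpy as np

query = np.array([1.0, 0.0])
items = np.array([
    [1.0, 0.0],   # item 0: same direction as the query, norm 1
    [3.0, 1.0],   # item 1: slightly different direction, much larger norm
])

raw_scores = items @ query  # plain dot products: [1.0, 3.0]
cos_scores = raw_scores / (np.linalg.norm(items, axis=1) * np.linalg.norm(query))

print(np.argmax(raw_scores))  # 1: the raw dot product favors the longer vector
print(np.argmax(cos_scores))  # 0: after normalization only direction matters
```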
Is there a use case for dot product distance measures with non-normalized data?
Personally I have no idea, but this is probably a separate issue from the bug that you fixed.
@troilus-canva: That makes sense, I guess; could you suggest a naming convention you would like?
normalize_dot?
Was a true un-normalized dot product metric ever added? This would be useful in cases where the original embedding was created with this in mind (for example, a matrix factorization).
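Independent of whether a raw dot product metric exists, one well-known workaround for maximum inner product search (a different technique, not something this thread says pynndescent implements) is to augment the data so that Euclidean nearest neighbors coincide with the largest inner products. Each item gets an extra coordinate `sqrt(M^2 - ||x||^2)`, where `M` is the largest item norm, and each query gets a 0. Then `||q' - x'||^2 = ||q||^2 + M^2 - 2 q·x`, so the closest item is the one with the largest `q·x`. A sketch with random, hypothetical factor matrices:

```python
import numpy as np

def augment_items(items):
    # Append sqrt(M^2 - ||x||^2) so every augmented item has the same norm M.
    norms_sq = np.sum(items ** 2, axis=1)
    extra = np.sqrt(np.maximum(norms_sq.max() - norms_sq, 0.0))
    return np.hstack([items, extra[:, np.newaxis]])

def augment_query(q):
    # Queries get 0 in the extra coordinate, so q.x is unchanged.
    return np.append(q, 0.0)

# Hypothetical matrix-factorization output: item factors with varying norms.
rng = np.random.default_rng(42)
item_factors = rng.normal(size=(100, 8)) * rng.uniform(0.5, 3.0, size=(100, 1))
user_factor = rng.normal(size=8)

best_by_dot = np.argmax(item_factors @ user_factor)

aug_items = augment_items(item_factors)
aug_query = augment_query(user_factor)
best_by_euclid = np.argmin(np.linalg.norm(aug_items - aug_query, axis=1))

print(best_by_dot == best_by_euclid)  # True
```

The augmented items could then be indexed with an ordinary Euclidean index (for example `NNDescent(aug_items, metric="euclidean")`) and queried with augmented query vectors; the brute-force search above is only there to show that the two rankings agree.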