Spherical K-means support (unit norm centroids and input)
Describe the workflow you want to enable
Hi, I was wondering if there is—or has been—any initiative to support cosine similarity in the KMeans implementation (i.e., spherical KMeans). I find the algorithm quite useful and would be happy to propose an implementation. The addition should be relatively straightforward.
Describe your proposed solution
Enable the use of cosine similarity with KMeans or implement a separate SphericalKMeans class.
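For context, here is a minimal NumPy sketch of what spherical k-means does, under stated assumptions. It is not scikit-learn code and not a proposed API: the function name `spherical_kmeans` and its parameters are hypothetical, initialisation is plain random sampling, and the input is assumed to contain no all-zero rows. Both the samples and the centroids are kept at unit L2 norm, so cosine similarity reduces to a dot product.

```python
import numpy as np


def spherical_kmeans(X, n_clusters, n_iter=50, seed=0):
    """Illustrative spherical k-means (hypothetical helper, not a scikit-learn API)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Project the input onto the unit sphere (assumes no all-zero rows).
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Initialise centroids from randomly chosen samples (fancy indexing copies).
    centroids = X[rng.choice(X.shape[0], size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # For unit vectors, cosine similarity is just the dot product.
        labels = np.argmax(X @ centroids.T, axis=1)
        for k in range(n_clusters):
            members = X[labels == k]
            if len(members) == 0:
                continue  # keep the previous centroid for an empty cluster
            total = members.sum(axis=0)
            norm = np.linalg.norm(total)
            if norm > 0:
                # Re-normalise so the centroid stays on the unit sphere.
                centroids[k] = total / norm
    # Final assignment against the last centroid update.
    labels = np.argmax(X @ centroids.T, axis=1)
    return labels, centroids
```

Samples that point in roughly the same direction end up in the same cluster regardless of their magnitude, which is the behaviour that differs from standard Euclidean KMeans.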
Describe alternatives you've considered, if relevant
No response
Additional context
No response
Hey, I'd like to take this up, @Radu1999.
No, it's ok, I was planning to implement it once I confirm there is interest for it.
Fair, go for it then.
@scikit-learn/core-devs any interest in this?
@Radu1999 to help evaluate this, could you provide some references and context that helps answer the questions from https://scikit-learn.org/stable/faq.html#what-are-the-inclusion-criteria-for-new-algorithms
+1
@betatim The paper has more than 200 citations and was published in 2012. IMO it excels at clustering normalized, directional data (like text), where vector direction matters more than magnitude.
@betatim I'd like to take this!
I'm not sure that I would consider this a priority.
I'd say with a small maintainable implementation, I'd be happy to have it.
I'd suggest adding a configurable distance metric, with 'euclidean' as the default, to the existing KMeans rather than implementing a separate class. Just like here: https://spark.apache.org/docs/latest/api/python/_modules/pyspark/ml/clustering.html#KMeans
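To illustrate the idea of a switchable metric, a rough sketch of the assignment step alone could look like the following. The helper `assign_labels` and its `metric` argument are hypothetical names chosen for this example; scikit-learn's `KMeans` does not currently expose such a parameter, and a full implementation would also need to re-normalise the centroids in the update step for the cosine case.

```python
import numpy as np


def assign_labels(X, centers, metric="euclidean"):
    """Assign each row of X to its nearest center (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    if metric == "euclidean":
        # Squared Euclidean distance between every sample and every center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    elif metric == "cosine":
        # Cosine distance = 1 - cosine similarity (assumes no all-zero rows).
        Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
        Cn = centers / np.linalg.norm(centers, axis=1, keepdims=True)
        dist = 1.0 - Xn @ Cn.T
    else:
        raise ValueError(f"Unsupported metric: {metric!r}")
    return np.argmin(dist, axis=1)


# The same point can land in different clusters depending on the metric.
point = [[10.0, 1.0]]
centers = [[1.0, 0.0], [8.0, 8.0]]
euclidean_label = assign_labels(point, centers, metric="euclidean")[0]
cosine_label = assign_labels(point, centers, metric="cosine")[0]
```

With the Euclidean metric the point is closer to the second center, while with the cosine metric it is matched to the first center because their directions nearly coincide.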