Passing number of centroids unused
In update koan there is
pub fn compute_centroids(
n_centroids: usize,
// (n_observations, n_features)
observations: &ArrayBase<impl Data<Elem = f64>, Ix2>,
// (n_observations,)
cluster_memberships: &ArrayBase<impl Data<Elem = usize>, Ix1>,
) -> Array2<f64> {
but here the n_centroids is not being used for anything in the solution, and thus it doesn't seem necessary. Maybe there is a bug here?
There is indeed a bug... But in the solution!
As I found out after having written the solutions branch, there is a corner case you can hit if you try the K-means on many different datasets/over multiple runs - a cluster can become empty during an assignment/update cycle.
Hence inferring the number of clusters from the number of elements in the HashMap can lead to errors down the line.
Using the provided n_clusters argument sidesteps this issue.
I'll update the solutions branch when I get a chance (probably this weekend).