ndarray-koans icon indicating copy to clipboard operation
ndarray-koans copied to clipboard

Passing number of centroids unused

Open CGMossa opened this issue 6 years ago • 1 comments

In update koan there is

    pub fn compute_centroids(
        n_centroids: usize,
        // (n_observations, n_features)
        observations: &ArrayBase<impl Data<Elem = f64>, Ix2>,
        // (n_observations,)
        cluster_memberships: &ArrayBase<impl Data<Elem = usize>, Ix1>,
    ) -> Array2<f64> {

but here the n_centroids is not being used for anything in the solution, and thus it doesn't seem necessary. Maybe there is a bug here?

CGMossa avatar Nov 13 '19 16:11 CGMossa

There is indeed a bug... But in the solution!

As I found out after having written the solutions branch, there is a corner case you can hit if you try the K-means on many different datasets/over multiple runs - a cluster can become empty during an assignment/update cycle. Hence inferring the number of clusters from the number of elements in the HashMap can lead to errors down the line.

Using the provided n_clusters argument sidesteps this issue.

I'll update the solutions branch when I get a chance (probably this weekend).

LukeMathWalker avatar Nov 14 '19 16:11 LukeMathWalker