
Mini-batching time cost

Open ASharmaML opened this issue 3 years ago • 0 comments

Firstly, this is a great library for clustering unit-normed feature spaces fast and coherently!

I have a question about mini-batching. I expected each mini-batch to take roughly the same amount of time (or even less on subsequent batches) when calling partial_fit. Instead, each mini-batch takes longer than the last, with the time growing almost linearly in the number of batches already processed. This is on actual structured data with hierarchies and clusters to be found, not on randomly generated matrices.

Pseudo-behaviour:

First 10,000 data points: 10 seconds to run
Second 10,000 data points: 20 seconds to run
Third 10,000 data points: 30 seconds to run
Total time taken for 30,000 data points: 60 seconds
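For reference, here is a minimal sketch of how per-batch timings like the ones above could be collected. It assumes only that the model exposes a partial_fit method taking a single batch; `time_partial_fit` and `GrowingCostModel` are hypothetical names made up for illustration. The stand-in class is not graphgrove and makes no claim about how graphgrove works internally. It just revisits every point seen so far on each call, which is one pattern that would produce linearly growing batch times.

```python
import time


def time_partial_fit(model, data, batch_size):
    """Call model.partial_fit on consecutive slices; return seconds per batch."""
    timings = []
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        t0 = time.perf_counter()
        model.partial_fit(batch)
        timings.append(time.perf_counter() - t0)
    return timings


class GrowingCostModel:
    """Hypothetical stand-in (NOT graphgrove): each call touches all points seen so far."""

    def __init__(self):
        self.points = []
        self.work_done = []  # number of points touched on each call

    def partial_fit(self, batch):
        self.points.extend(batch)
        touched = 0
        for _ in self.points:  # work proportional to the total seen, not the batch
            touched += 1
        self.work_done.append(touched)


data = list(range(30_000))
model = GrowingCostModel()
timings = time_partial_fit(model, data, batch_size=10_000)

for i, (seconds, touched) in enumerate(zip(timings, model.work_done), start=1):
    print(f"batch {i}: touched {touched} points in {seconds:.4f}s")
```

With a real model, swapping `GrowingCostModel()` for the fitted object should be enough, provided its partial_fit accepts a batch in the same way. If the per-batch work grows like 1, 2, 3, ... units, the total over k batches grows like k(k+1)/2, which matches the 10 + 20 + 30 = 60 seconds reported above.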

Is this expected behaviour? As a side note, while I found the implementation details in the accompanying SCC paper and could follow them, I cannot find any details about how the mini-batching works.

ASharmaML, Dec 23 '22 13:12