kmcuda
Is there a limit on max columns (features) that kmcuda can manage?
kmcuda runs well with up to 12'000 features:
import numpy as np
from libKMCUDA import kmeans_cuda
from time import time
X = np.random.rand(10, 12000).astype(dtype=np.float32)
start = time()
centers_, labels_ = kmeans_cuda(X, 10)
print(time() - start)
0.19472670555114746
It never finishes with 13'000 to 60'000 features.
With 70'000 or more features it raises an error right away:
import numpy as np
from libKMCUDA import kmeans_cuda
from time import time
X = np.random.rand(10, 70000).astype(dtype=np.float32)
start = time()
centers_, labels_ = kmeans_cuda(X, 10)
print(time() - start)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-1-8e783410a8e6> in <module>
10
11 start = time()
---> 12 centers_, labels_ = kmeans_cuda(X, 10)
13 print(time() - start)
ValueError: "samples": more than 70000 features is not supported
So my question is:
Is there a limit on the horizontal dimension (number of features) that kmcuda can manage, or am I missing something?
I'm running Ubuntu 18.04, a conda Python 3.7 environment, CUDA 10.2, and libKMCUDA 6.2.3 installed via pip.
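The sizes tested above leave gaps (between 12'000 and 13'000, and between 60'000 and 70'000), so the exact point where the behaviour changes is unknown. A rough sketch of how it could be narrowed down by bisection follows. The helper names `find_last_working`, `finishes_in_time` and `_run_kmeans` are made up for illustration, the 60-second timeout is an arbitrary choice, and the search assumes that once a feature count fails, every larger count fails too, which has not been verified.

```python
import multiprocessing as mp


def find_last_working(works, lo, hi):
    """Return the largest n in [lo, hi) for which works(n) is True.

    Assumes works(lo) is True, works(hi) is False, and that failures are
    monotonic (an assumption, not something kmcuda guarantees).
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works(mid):
            lo = mid
        else:
            hi = mid
    return lo


def _run_kmeans(n_features):
    # Imported here so that CUDA is only initialised in the child process.
    import numpy as np
    from libKMCUDA import kmeans_cuda

    X = np.random.rand(10, n_features).astype(np.float32)
    kmeans_cuda(X, 10)


def finishes_in_time(n_features, timeout=60.0):
    """Hypothetical predicate: True if the run exits cleanly within `timeout` seconds."""
    proc = mp.Process(target=_run_kmeans, args=(n_features,))
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        # Treat a run that is still going as a hang and stop it.
        proc.terminate()
        proc.join()
        return False
    return proc.exitcode == 0


if __name__ == "__main__":
    # 12'000 was observed to work and 13'000 to hang, so search between them.
    print(find_last_working(finishes_in_time, 12000, 13000))
```

Running each attempt in a separate process means a hanging call can be stopped without restarting the interpreter; a process stuck inside the GPU driver may not respond to `terminate()`, in which case it would need to be killed manually.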