Results: 15 comments of Yoni Kremer

In my experience with TensorFlow, the time estimation is pretty accurate, so I think a best-effort estimate should give accurate results.

I'm pretty sure dataset size means the size of the raw text.

It looks like the problem is loading data/saved_embeddings/train/c6-whitened-256_4.parquet.gzip into a DataFrame.
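
A minimal reproduction sketch, assuming the file is read with pandas (the exact loading code isn't shown here, so this is an illustration only):

```python
import pandas as pd

# Hypothetical reproduction: try to load the embeddings file directly.
# Requires a parquet engine such as pyarrow or fastparquet to be installed.
df = pd.read_parquet("data/saved_embeddings/train/c6-whitened-256_4.parquet.gzip")
print(df.shape)
```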

Two tests fail due to this issue: `TestArrayObjectComparison::test_eq_object` and `TestArrayObjectComparison::test_ne_object`.

@kmaehashi @leofang I get why you don't want to implement it that way, but CuPy is supposed to be NumPy-compatible. In addition, some tests fail due to this issue: In...

In numpy 2.1, I get:

```
>>> x_np = np.array([4]).astype(np.float32)
>>> y1_np = np.array([2])  # int64
>>> y2_np = np.array(2)  # int64
>>> y3_np = 2
>>> x_np / y1_np...
```
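
For context, a minimal sketch of the result dtypes I'd expect under NumPy 2.x promotion rules (NEP 50), where Python scalars are "weak" but integer arrays, including 0-d ones, are not. The printed dtypes below are my own illustration, not quoted from the original comment:

```python
import numpy as np

x_np = np.array([4]).astype(np.float32)
y1_np = np.array([2])   # 1-d int64 array
y2_np = np.array(2)     # 0-d int64 array
y3_np = 2               # Python int (weak scalar under NEP 50)

# int64 arrays (including 0-d) promote float32 to float64.
print((x_np / y1_np).dtype)  # float64
print((x_np / y2_np).dtype)  # float64
# A plain Python int does not upcast: the result stays float32.
print((x_np / y3_np).dtype)  # float32
```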

I started thinking about it. In most cases, top-k is very small compared to the vocab size (100 vs 100k), so maybe storing the results as a sparse tensor would...

I think that, later on, computing softmax and sampling from a sparse tensor should be much faster.
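
A rough sketch of the idea, using NumPy and a plain (indices, values) top-k representation in place of an actual sparse tensor type; the function and variable names are my own illustration, not from the original comments:

```python
import numpy as np

def topk_softmax_sample(logits: np.ndarray, k: int, rng: np.random.Generator) -> int:
    """Sample a token id using only the top-k logits.

    Keeping just (indices, values) for the top k entries is effectively a
    sparse representation: softmax and sampling then cost O(k) instead of
    O(vocab_size).
    """
    # Top-k indices (order within the top-k set doesn't matter for sampling).
    top_idx = np.argpartition(logits, -k)[-k:]
    top_vals = logits[top_idx]

    # Numerically stable softmax over the k retained values only.
    top_vals = top_vals - top_vals.max()
    probs = np.exp(top_vals)
    probs /= probs.sum()

    # Sample within the top-k set, then map back to a vocabulary index.
    return int(rng.choice(top_idx, p=probs))

rng = np.random.default_rng(0)
logits = rng.normal(size=100_000)   # vocab size ~100k
token_id = topk_softmax_sample(logits, k=100, rng=rng)
print(token_id)
```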

How can I check the numerical stability of the kernel?
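
One common way to probe this (my own suggestion, not from the thread) is to run the same computation in low and high precision and compare the relative error against the higher-precision reference:

```python
import numpy as np

def max_relative_error(result: np.ndarray, reference: np.ndarray) -> float:
    """Maximum elementwise relative error against a higher-precision reference."""
    reference = reference.astype(np.float64)
    denom = np.maximum(np.abs(reference), np.finfo(np.float64).tiny)
    return float(np.max(np.abs(result.astype(np.float64) - reference) / denom))

# Hypothetical check: a stand-in float32 computation vs. a float64 reference.
x = np.random.default_rng(0).normal(size=10_000).astype(np.float32)
fast = np.exp(x) / np.exp(x).sum()
ref = np.exp(x.astype(np.float64)) / np.exp(x.astype(np.float64)).sum()
print(max_relative_error(fast, ref))
```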