brotherb

Results 2 issues of brotherb

Hi, I noticed that after k-means clustering, a cluster with only one element is merged into the most similar cluster. My guess about this decision is that if this cluster has...

Hey there. After reading the code, I am confused about how the computation cost can be reduced by masking more tokens. Did I miss anything? PS. I see the FLOPS...