Comments by d62lu
I have the same issue when running the code with CUDA 11.6.
Yes, I just made the comparison. The Mamba block is indeed slower than the Transformer under the same input dimension, for both training and inference. I haven't looked into the details of the Mamba block yet, so maybe I missed something in the code....
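For reference, here is a minimal sketch of how such a timing comparison could be set up. It is illustrative only: the `mamba_ssm` import path, the `Mamba` constructor arguments (`d_model`, `d_state`, `d_conv`, `expand`), and all the sizes and step counts below are assumptions, not the exact setup reported above.

```python
# Rough benchmark sketch: forward+backward timing of a Mamba block vs. a
# Transformer encoder layer at the same input dimension.
# Assumes the mamba_ssm package and a CUDA GPU are available.
import time
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # assumed import; adjust if your install differs

def bench(block, x, steps=50, warmup=10):
    """Return average seconds per forward+backward step of `block` on `x`."""
    for _ in range(warmup):
        block(x).sum().backward()
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(steps):
        block(x).sum().backward()
    torch.cuda.synchronize()
    return (time.time() - t0) / steps

# Same (batch, seq_len, d_model) input for both blocks; sizes are arbitrary.
d_model, seq_len, batch = 512, 1024, 8
x = torch.randn(batch, seq_len, d_model, device="cuda", requires_grad=True)

mamba = Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2).cuda()
transformer = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=8, batch_first=True
).cuda()

print(f"Mamba block:       {bench(mamba, x) * 1e3:.2f} ms/step")
print(f"Transformer layer: {bench(transformer, x) * 1e3:.2f} ms/step")
```

For an inference-only comparison, the same loop can be run under `torch.no_grad()` with the `.backward()` calls removed; results will also depend heavily on sequence length and on whether the fused CUDA kernels from `mamba_ssm` are actually being used.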