Compact-Transformers
[Preprint] Escaping the Big Data Paradigm with Compact Transformers, 2021
Hi, this work is awesome. I just have one little question. The paper says the total batch size is 128 for the CIFARs and that 4 GPUs were used in parallel. That...
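For anyone landing on this issue: a minimal sketch of what "total batch size 128 across 4 GPUs" usually means in data-parallel training, i.e. 32 samples per process. The dummy CIFAR-shaped tensors and the plain `DataLoader` here are illustrative assumptions, not the repo's actual training script:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

world_size = 4                             # number of GPUs, per the issue
total_batch = 128                          # total batch size, per the paper
per_gpu_batch = total_batch // world_size  # 32 samples per process

# Dummy CIFAR-shaped data just to make the example self-contained.
dataset = TensorDataset(torch.randn(256, 3, 32, 32),
                        torch.randint(0, 10, (256,)))
loader = DataLoader(dataset, batch_size=per_gpu_batch, shuffle=True)

images, labels = next(iter(loader))
print(images.shape)  # torch.Size([32, 3, 32, 32])
```

Under `DistributedDataParallel`, each of the 4 processes would run a loader like this, so the effective batch per optimizer step is 4 × 32 = 128.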
Hi, I am a little confused about the output of the CCT. If I have a classification task with n possible classes, are the outputs the logits for each class?...
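A hedged sketch of the usual convention this question is about: for an n-class task, a classifier's forward pass returns raw logits of shape `(batch, n)`, and you apply softmax (or `cross_entropy`, which takes logits directly) yourself. The random tensor below is a stand-in for `model(images)`, not a call into the repo's API:

```python
import torch
import torch.nn.functional as F

n_classes, batch = 10, 4
logits = torch.randn(batch, n_classes)    # stand-in for model(images)
probs = F.softmax(logits, dim=-1)         # per-class probabilities
targets = torch.randint(0, n_classes, (batch,))
loss = F.cross_entropy(logits, targets)   # cross_entropy expects raw logits
```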
Hi, there was a small problem with the mask returned from TextTokenizer's forward function. The next function that uses this mask needs a 2D tensor. Therefore, in TextTokenizer, the mask should...
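To illustrate the shape mismatch being reported (the shapes here are hypothetical; the tokenizer's internals are paraphrased from the issue, not copied from the repo): if the tokenizer's pooling leaves a singleton dimension on the mask, squeezing it restores the 2D `(batch, seq_len)` shape the downstream layer expects:

```python
import torch

mask = torch.ones(8, 1, 64, dtype=torch.bool)  # hypothetical (batch, 1, seq_len) mask
mask_2d = mask.squeeze(1)                      # -> (batch, seq_len), as the next layer needs
print(mask_2d.shape)                           # torch.Size([8, 64])
```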
Hello, It seems to make more intuitive sense to use 1D convolutions here over the embedding with a channel size equal to the word embedding dimension, rather than the edge-case...
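A sketch of the alternative this issue suggests (illustrative shapes; not the repo's current tokenizer code): treat the word-embedding dimension as the `Conv1d` channel axis and convolve along the sequence, instead of a 2D convolution with a width-1 edge case:

```python
import torch
import torch.nn as nn

batch, seq_len, embed_dim = 8, 64, 300
x = torch.randn(batch, seq_len, embed_dim)     # (B, L, D) word embeddings

# Conv1d expects (B, C, L), so the embedding dim becomes the channel dim.
conv = nn.Conv1d(in_channels=embed_dim, out_channels=128,
                 kernel_size=3, padding=1)
y = conv(x.transpose(1, 2)).transpose(1, 2)    # back to (B, L, 128)
print(y.shape)                                 # torch.Size([8, 64, 128])
```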
First of all, thanks for your amazing work! And it seems that your `TransformerEncoderLayer` implementation is a bit different from the 'mainstream' implementations, because you create your residual link **after**...
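For readers unfamiliar with the distinction this issue raises, here is a minimal sketch of the two residual placements being contrasted: pre-norm (normalize, then attend, then add the residual) versus the "mainstream" post-norm (attend, add the residual, then normalize). The module below is illustrative, not the repo's `TransformerEncoderLayer`:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, dim, pre_norm=True):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.pre_norm = pre_norm

    def forward(self, x):
        if self.pre_norm:
            # pre-norm: residual wraps norm + attention
            h = self.norm(x)
            return x + self.attn(h, h, h)[0]
        # post-norm: attention first, norm applied after the residual
        return self.norm(x + self.attn(x, x, x)[0])

x = torch.randn(2, 16, 64)
print(Block(64)(x).shape)  # torch.Size([2, 16, 64])
```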
It seems the models function call in examples/main.py is failing with the error message:

```
Traceback (most recent call last):
  File "/mnt/code/Compact_Transformers/examples/main.py", line 279, in <module>
    main()
  File "/mnt/code/Compact_Transformers/examples/main.py", line 127,...
```
This is incredibly exciting! Thank you so much. I'm interested in exploring this with NLP. Unfortunately, I'm running into some issues that seem to be related to expected tensor sizes...
Firstly, thank you very much for your work. But when I used your open-source code to classify the images in my dataset, the accuracy was not ideal, perhaps because the model...
Hi team, thanks for open-sourcing your work. May I ask if you know of any implementation of these models in Flax/Jax? Best, Naren
Hi, I am trying to replicate your Text Classification results so that I can then use your models on my own data set, but I am unable to get any...