CheerM
Hi there, it's exciting to see that post-norm can stabilize the training process. We tried to re-implement Swin-T with post-norm, as reported in Table 6 (81.6 top-1 acc) in...
Hi, this is a great and concise re-implementation of the MLP works. Meanwhile, I'm wondering how the performance of the re-implemented version compares to the performance reported in the manuscripts? It would...
Hi, would you mind releasing the training log for T2T-ViT-t-14 trained with 8 GPUs? I tried to rerun the script for training T2T-ViT-t-14 with 8 GPUs. It gained 0.094 for...
Hi, I ran into some issues with the re-implementation of models trained on IWSLT'14 De-En. The hyper-parameters of DeLighT (d_m=512) were set following https://github.com/pytorch/fairseq/blob/master/examples/translation/README.md, like `CUDA_VISIBLE_DEVICES=0 fairseq-train \ data-bin/iwslt14.tokenized.de-en \ --arch transformer_iwslt_de_en...
Hi, just wondering what the re-implemented performance is for the dense/random/hybrid synthesizers?