Zhenglun Kong
Hi, I have a question regarding the ablation studies of model size in the paper. When you compare the performance between different depths and widths, do you have to do...
Hi, thank you for your interesting work! I have only recently started learning about BERT and distillation, and I have some general questions on this topic. 1. I want to compare...
Hi, I have a very rookie question. How can I calculate the FLOPs of a BERT model? I tried to use thop, ``` macs, params = profile(model, inputs=(input, ), custom_ops={YourModule: count_your_model})...
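For a rough cross-check of whatever a profiler reports, the multiply-accumulate count of a BERT-style encoder can also be estimated by hand. The sketch below is only an approximation under stated assumptions: it counts the dense projections, the attention matrix products, and the feed-forward layers, and it ignores embeddings, biases, softmax, LayerNorm, GELU, and the pooler. The function names and the example sizes (12 layers, hidden size 768, intermediate size 3072, sequence length 128) are illustrative choices, not taken from the paper or from thop.

```python
def encoder_layer_macs(seq_len: int, hidden: int, intermediate: int) -> int:
    """Approximate MACs for one Transformer encoder layer (forward pass, batch size 1)."""
    # Q, K, V and attention-output projections: four (hidden x hidden) matmuls per token.
    projections = 4 * seq_len * hidden * hidden
    # QK^T scores plus the attention-weighted sum of V. Summed over all heads,
    # each costs seq_len * seq_len * hidden, so the head count cancels out.
    attention = 2 * seq_len * seq_len * hidden
    # Feed-forward block: hidden -> intermediate -> hidden.
    feed_forward = 2 * seq_len * hidden * intermediate
    return projections + attention + feed_forward


def encoder_macs(num_layers: int, hidden: int, intermediate: int, seq_len: int) -> int:
    """Approximate MACs for the whole encoder stack."""
    return num_layers * encoder_layer_macs(seq_len, hidden, intermediate)


def encoder_flops(num_layers: int, hidden: int, intermediate: int, seq_len: int) -> int:
    """FLOPs under the common convention that one MAC equals two FLOPs."""
    return 2 * encoder_macs(num_layers, hidden, intermediate, seq_len)


if __name__ == "__main__":
    # Assumed BERT-base-like sizes, used here only as an example.
    macs = encoder_macs(num_layers=12, hidden=768, intermediate=3072, seq_len=128)
    print(f"approx. {macs / 1e9:.2f} GMACs, {2 * macs / 1e9:.2f} GFLOPs")
```

Note that tools differ on whether they report MACs or FLOPs, so a factor of two between this estimate and a profiler's number is not necessarily an error.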
Hi, thank you for sharing your work! Do you have results on SQuAD 1.0? Which parameters should be used for it? Best, ZLK