Zhuguanyu Wu: 2 comments
Thank you! I have another question. In FQ-ViT, both the softmax and LayerNorm layers are computed in integer form, so would it be unfair to compare the accuracy of...
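1. The current model does both at the same time. We have also built a version with cfg-embedding, which may be released later. The distillation method is DMD2.
2. Fixed at 5.0.
3. Roughly 30k+ samples, using 48 H100 GPUs.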