csorujian

Results 2 issues of csorujian

A question for the author: is the gradient quantization described in the paper not actually applied in the code?

The paper mentions that bidirectional language modeling pre-training has already been done. Are you planning to release pre-trained weights for the model?