About the computation of logits after the decoder

Open liuqk3 opened this issue 2 years ago • 1 comment

Hi @LTH14 ,

Great work! I have a question about how the logits are computed after the decoder. I see that an MlmLayer is used: the decoder output is mapped by an fc layer, the dot product between the mapped features and the word embeddings is taken, and a bias is added to give the logits.

Have you tried computing the logits directly with an fc layer on the decoder's output features? What is the main difference between these two types of logits, and which one do you think is better?
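To make the comparison concrete, here is a minimal NumPy sketch of the two options as I understand them. It is not the repository's code: the function names, shapes, and random weights are all hypothetical, and any activation or normalization the real MlmLayer may apply between the fc layer and the dot product is left out.

```python
import numpy as np

def tied_mlm_logits(h, w_fc, b_fc, word_emb, bias):
    """Option 1 (assumed form): fc layer, then dot product with word embeddings, plus bias."""
    z = h @ w_fc + b_fc            # (n_tokens, d) -> (n_tokens, d)
    return z @ word_emb.T + bias   # (n_tokens, d) -> (n_tokens, vocab)

def direct_fc_logits(h, w_out, b_out):
    """Option 2: a single fc layer from decoder features straight to vocabulary logits."""
    return h @ w_out + b_out       # (n_tokens, d) -> (n_tokens, vocab)

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
n_tokens, d, vocab = 4, 8, 16
h = rng.normal(size=(n_tokens, d))        # decoder output
w_fc = rng.normal(size=(d, d))
b_fc = rng.normal(size=(d,))
word_emb = rng.normal(size=(vocab, d))    # shared with the input embedding table
bias = rng.normal(size=(vocab,))

tied_logits = tied_mlm_logits(h, w_fc, b_fc, word_emb, bias)

# Without a nonlinearity, option 1 collapses to option 2 with these weights.
w_equiv = w_fc @ word_emb.T
b_equiv = b_fc @ word_emb.T + bias
direct_logits = direct_fc_logits(h, w_equiv, b_equiv)
```

In this simplified linear form, the first option is a special case of the second whose output weights are constrained to reuse the word embedding table, so the two differ mainly in parameter sharing. If the real layer has a nonlinearity or normalization in between, this exact equivalence would no longer hold.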

Thanks.

Mar 29 '23 02:03 liuqk3

We follow BERT for this design. I haven't tried using logits directly from an fc layer.

Mar 29 '23 02:03 LTH14