
A rough outline of how this model works

Open fudanchenjiahao opened this issue 6 years ago • 1 comments

The template is a sentence that may contain, say, two blanks. The two answers are concatenated and passed through masked self-attention (each position can only attend to tokens before the current predicted word), and the result is used as the query. On the other side, the template sentence (containing the two mask tokens) serves as key and value for a second attention step, which is followed by a feed-forward layer to produce one layer's output. The next layer's query comes from the previous layer's output after another self-attention, while the key and value stay the same, still the template. This process then repeats. Is this the training procedure? My main question is whether, during training, the decoder input is just the concatenation of the ground-truth answers for the two blanks, or whether the original sentence is concatenated with them as well.

fudanchenjiahao avatar Mar 28 '19 11:03 fudanchenjiahao

In case someone has similar questions, I'll answer in English.

You may refer to our paper for details about the model.

Regarding your question: when the template contains multiple blanks, we fill in the blanks one by one (instead of concatenating all the answers, as you described). In the decoder, the whole template is exposed for attention (similar to the "encoder-decoder attention" in the vanilla Transformer), while the answer for the current blank goes through the masked self-attention process.
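To make the attention flow concrete, here is a minimal single-head NumPy sketch of one decoder layer as described above: masked (causal) self-attention over the current blank's answer tokens, then cross-attention with the template hidden states as key and value. All names are hypothetical, and learned projections, multiple heads, residual connections, and layer norm are omitted; this is only an illustration of the data flow, not the actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, mask=None):
    # Scaled dot-product attention; mask=True means "allowed to attend".
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    return softmax(scores) @ v

def decoder_layer(answer_h, template_h):
    """One sketched layer: answer_h is (T_ans, d) hidden states for the
    current blank's answer; template_h is (T_tmpl, d) for the template."""
    T = answer_h.shape[0]
    # Causal mask: each answer position sees only itself and earlier tokens.
    causal = np.tril(np.ones((T, T), dtype=bool))
    h = attention(answer_h, answer_h, answer_h, mask=causal)  # masked self-attention
    h = attention(h, template_h, template_h)                  # template as key/value
    return np.tanh(h)  # stand-in for the feed-forward sublayer

# Shapes only: 3 answer tokens, 5 template tokens, model dim 8.
rng = np.random.default_rng(0)
out = decoder_layer(rng.normal(size=(3, 8)), rng.normal(size=(5, 8)))
print(out.shape)  # (3, 8)
```

Stacking layers repeats this pattern, with each layer's query coming from the previous layer's output while the key/value side remains the template, matching the process described above.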

VegB avatar Apr 09 '19 05:04 VegB