zyx1017 comments

Results: 18 comments of zyx1017

About the pad issues in the code

@hyc9 Sorry to bother you, but I have a question: when computing the multi-head attention scores, why is there no need to mask out interactions with later positions, as is usually done? The usual approach adds a mask matrix:

    attention_scores = attention_scores + attention_mask

The code here, shown below, performs no masking. I'm puzzled by this and hope to get an answer!

    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
    attention_scores = attention_scores / math.sqrt(self.attention_head_size)
    # normalize the attention scores to probabilities.
    attention_probs = nn.Softmax(dim=-2)(attention_scores)
    attention_probs...
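For readers unfamiliar with the "usual approach" mentioned in the question, here is a minimal sketch of additive causal masking. It is written in plain Python rather than PyTorch so it runs without dependencies. The helper names and the `-10000.0` fill value are assumptions made for illustration and are not taken from the repository. The sketch normalizes over the key axis, which is the common convention; it does not reproduce the `dim=-2` softmax in the quoted snippet.

```python
import math


def causal_additive_mask(seq_len, neg=-10000.0):
    """Hypothetical helper: 0.0 where query i may see key j (j <= i), else `neg`."""
    return [[0.0 if j <= i else neg for j in range(seq_len)]
            for i in range(seq_len)]


def softmax(row):
    """Numerically stable softmax over a single list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def masked_attention_probs(scores):
    """Add the causal mask to a square score matrix, then normalize each row."""
    mask = causal_additive_mask(len(scores))
    masked = [[s + m for s, m in zip(score_row, mask_row)]
              for score_row, mask_row in zip(scores, mask)]
    return [softmax(row) for row in masked]


# Uniform raw scores for a length-4 sequence.
scores = [[0.0] * 4 for _ in range(4)]
probs = masked_attention_probs(scores)
```

With uniform scores, each position ends up spreading its attention evenly over itself and earlier positions only, because the large negative entries drive the weights on later positions to effectively zero after the softmax.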

About the pad issues in the code

Ah, got it! Thanks.

About the pad issues in the code

@hyc9 It looks like all the sequential recommendation models in this code are trained in an autoregressive manner?

About the pad issues in the code

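> > @hyc9 It looks like all the sequential recommendation models in this code are trained in an autoregressive manner?
>
> The code here should be using the 1,2,3,4 -> 5 approach plus sequence data augmentation, right?

That seems to be exactly what it does during data augmentation: [ [1], [1,2], [1,2,3], [1,2,3,4] ] -> [2,3,4,5]. Isn't that similar to autoregressive training? Compared with SASRec, the training procedure for LightSANs should be the same, yet SASRec applies a mask in its multi-head attention. Very strange.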
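The prefix-style augmentation described above can be sketched as follows. This is only an illustration of the `[ [1], [1,2], ... ] -> [2,3,4,5]` pattern from the comment; the function name `augment_prefixes` is hypothetical and is not the repository's actual data-loading code.

```python
def augment_prefixes(seq):
    """Split one interaction sequence into (prefix, next item) training pairs.

    Hypothetical helper: every proper prefix of `seq` becomes an input,
    and the item immediately after that prefix becomes its target.
    """
    inputs, targets = [], []
    for i in range(1, len(seq)):
        inputs.append(seq[:i])   # items seen so far
        targets.append(seq[i])   # the next item to predict
    return inputs, targets


inputs, targets = augment_prefixes([1, 2, 3, 4, 5])
print(inputs)   # [[1], [1, 2], [1, 2, 3], [1, 2, 3, 4]]
print(targets)  # [2, 3, 4, 5]
```

Each training sample here carries a single target for the whole prefix, which is the pattern the comment compares to autoregressive training.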