Ziang Long

Results: 1 issue by Ziang Long

models.attn.ProbAttention

A moving average seems more reasonable to me, because you would want to use a uniform distribution as the initial context. Could you let me know why you are using `cumsum` without...
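
To make the question concrete, here is a minimal sketch of the two candidate initial contexts. It uses NumPy rather than the repository's own code, and the function names `cumsum_context`, `running_mean_context`, and `uniform_causal_attention` are hypothetical, chosen only for illustration. It assumes "moving average" means a running mean over the positions seen so far, which is what uniform causal attention weights would produce.

```python
import numpy as np

def cumsum_context(V):
    """Running sum of the values along the sequence axis (axis=-2)."""
    return np.cumsum(V, axis=-2)

def running_mean_context(V):
    """Running mean: the running sum divided by the number of positions seen so far."""
    L = V.shape[-2]
    counts = np.arange(1, L + 1, dtype=V.dtype).reshape(L, 1)
    return np.cumsum(V, axis=-2) / counts

def uniform_causal_attention(V):
    """Explicit uniform attention for a 2-D V: position i weights positions 0..i by 1/(i+1)."""
    L = V.shape[-2]
    counts = np.arange(1, L + 1, dtype=V.dtype).reshape(L, 1)
    weights = np.tril(np.ones((L, L), dtype=V.dtype)) / counts
    return weights @ V

# Toy values: sequence length 3, feature dimension 2 (illustrative data only).
V = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
sum_ctx = cumsum_context(V)
mean_ctx = running_mean_context(V)
attn_ctx = uniform_causal_attention(V)
```

In this sketch the running mean matches the uniform causal attention output exactly, while the plain `cumsum` grows with the position index, which appears to be the discrepancy the question is about.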