Dexter Ju
A heads-up for anyone who might be using the Transformer encoder. The Transformer encoder layer's forward pass used to be:
```
tensor = tensor + self.dropout(self.attention(tensor, mask=mask))
tensor = _normalize(tensor, self.norm1)
...
```
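For readers who want to see the ordering in isolation, here is a minimal pure-Python sketch of the residual-add-then-normalize pattern from the snippet above. It is an illustration only, not the library's code: `layer_norm`, `post_norm_sublayer`, and `toy_attention` are hypothetical names, the input is a single vector of floats, there is no mask, dropout defaults to the identity (as it would in eval mode), and the normalization has no learned scale or shift.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (near) unit variance.

    Assumption: no learned scale/shift parameters, unlike a real norm layer.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def post_norm_sublayer(x, sublayer, dropout=lambda v: v):
    """Add the residual first, then normalize the sum."""
    out = dropout(sublayer(x))
    summed = [a + b for a, b in zip(x, out)]  # tensor = tensor + dropout(sublayer(tensor))
    return layer_norm(summed)                 # tensor = normalize(tensor)


def toy_attention(x):
    """Hypothetical stand-in for self.attention: just scales the input."""
    return [0.5 * v for v in x]


y = post_norm_sublayer([1.0, 2.0, 3.0, 4.0], toy_attention)
```

Because normalization is the last step here, the sublayer's output always has roughly zero mean and unit variance, whatever the scale of the residual stream going in.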
donotreap