annotated_deep_learning_paper_implementations
Bug in Transformer-XL shift method
Hi!
The original paper implementation slices the padded tensor with `[1:]`: `x = x_padded[1:].view_as(x)` (their code). Your implementation slices with `[:-1]` instead: `x = x_padded[:-1].view_as(x)` (your code), which produces the wrong matrix at the output.
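To make the difference concrete, here is a minimal sketch that applies both slices to the same small tensor. The helper name `rel_shift` and its `drop_first` flag are hypothetical, and I am assuming the zero column is prepended along dim 1 before the reshape in both cases, since only the slicing line is quoted above. If the padding goes on the other side, the results will differ from what this prints.

```python
import torch


def rel_shift(x: torch.Tensor, drop_first: bool) -> torch.Tensor:
    """Hypothetical helper comparing the two slices quoted in this issue.

    Assumption: a zero column is prepended along dim 1 before the reshape.
    """
    # Prepend one zero column: (rows, cols) -> (rows, cols + 1)
    zero_pad = x.new_zeros(x.shape[0], 1, *x.shape[2:])
    x_padded = torch.cat([zero_pad, x], dim=1)
    # Reinterpret the same memory as (cols + 1, rows)
    x_padded = x_padded.view(x.shape[1] + 1, x.shape[0], *x.shape[2:])
    # Drop either the first or the last row, then restore the original shape
    kept = x_padded[1:] if drop_first else x_padded[:-1]
    return kept.view_as(x)


x = torch.arange(1, 10).view(3, 3)
print(rel_shift(x, drop_first=True))   # the [1:] slice
print(rel_shift(x, drop_first=False))  # the [:-1] slice
```

Under this padding assumption, the `[1:]` slice leaves the last row untouched and shifts each earlier row one step further left, while the `[:-1]` slice leaves no row untouched, so the two outputs are not equal.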
`from typing import Optional, List` is wrong.