Joseph Herrera

Results: 2 issues by Joseph Herrera

Is the input shape of `MultiHeadAttention` `[batch_size, sequence_length, embedding_size]`? Or is it the same as `nn.MultiheadAttention`, where the input shape must be `[sequence_length, batch_size, embedding_size]`?
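For reference, PyTorch's `nn.MultiheadAttention` can accept either layout depending on its `batch_first` flag (it defaults to `False`, i.e. `[sequence_length, batch_size, embedding_size]`). A minimal sketch checking both conventions; the tensor sizes here are arbitrary illustrative values:

```python
import torch
import torch.nn as nn

batch_size, seq_len, embed_dim, num_heads = 2, 10, 16, 4

# Default (batch_first=False): expects [sequence_length, batch_size, embedding_size]
mha = nn.MultiheadAttention(embed_dim, num_heads)
x = torch.randn(seq_len, batch_size, embed_dim)
out, _ = mha(x, x, x)
print(out.shape)  # torch.Size([10, 2, 16])

# With batch_first=True: expects [batch_size, sequence_length, embedding_size]
mha_bf = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
xb = torch.randn(batch_size, seq_len, embed_dim)
out_b, _ = mha_bf(xb, xb, xb)
print(out_b.shape)  # torch.Size([2, 10, 16])
```

So if the `MultiHeadAttention` in question follows the batch-first convention, it matches `nn.MultiheadAttention(..., batch_first=True)`.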

In the TSTNN paper, the two-stage transformer block passes the input through a local transformer followed by a global transformer. This means that the sequence length (see [here](https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html)) for the...
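To make the sequence-length question concrete: in a two-stage (local/global) block, the axis treated as the sequence length changes between the two attention stages. A hedged sketch, assuming the input has already been segmented into `[batch, num_chunks, chunk_len, embed_dim]` (the segmentation, variable names, and use of `nn.MultiheadAttention` here are illustrative, not the paper's exact code):

```python
import torch
import torch.nn as nn

B, S, K, E, H = 2, 6, 8, 16, 4  # batch, num chunks, chunk length, embed dim, heads

local_attn = nn.MultiheadAttention(E, H, batch_first=True)
global_attn = nn.MultiheadAttention(E, H, batch_first=True)

x = torch.randn(B, S, K, E)

# Local stage: attend within each chunk, so the sequence length is K.
xl = x.reshape(B * S, K, E)
xl, _ = local_attn(xl, xl, xl)
x = xl.reshape(B, S, K, E)

# Global stage: attend across chunks at each intra-chunk position,
# so the sequence length becomes S.
xg = x.transpose(1, 2).reshape(B * K, S, E)
xg, _ = global_attn(xg, xg, xg)
x = xg.reshape(B, K, S, E).transpose(1, 2)

print(x.shape)  # torch.Size([2, 6, 8, 16])
```

The reshapes fold the non-sequence axes into the batch dimension, which is why the same attention module can serve both stages with different effective sequence lengths.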