Shepard
I'm stuck at the same point. I can see that the "one-layer MLP" is split in your code between the dense layer and line 174 in the attention layer....
I just looked up the TimeDistributed layer wrapper again and realized that it means the same weights are also shared across the input-to-hidden connections of the MLP. And I think I...
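For what it's worth, the weight-sharing behavior of TimeDistributed can be sketched in plain NumPy (the shapes here are hypothetical, not taken from the code under discussion): wrapping a Dense layer in TimeDistributed creates one weight matrix and bias, and applies those same parameters independently at every timestep.

```python
import numpy as np

rng = np.random.default_rng(0)

timesteps, features, units = 5, 8, 3
x = rng.standard_normal((timesteps, features))  # one input sequence

# A single Dense layer owns ONE weight matrix W and ONE bias b.
W = rng.standard_normal((features, units))
b = rng.standard_normal(units)

# TimeDistributed(Dense(units)) applies the SAME W and b at each timestep:
per_step = np.stack([x[t] @ W + b for t in range(timesteps)])

# ...which collapses to a single batched matrix product over the sequence:
batched = x @ W + b

# Both views give identical results: the timesteps share all parameters.
assert np.allclose(per_step, batched)
print(per_step.shape)  # (5, 3): one `units`-dim output per timestep
```

So there is no separate set of input-to-hidden weights per timestep; the per-timestep loop and the single matrix product are the same computation.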