Shepard

2 comments of Shepard

I'm having problems at the same point. I can see that the "one-layer MLP" is split in your code between the dense layer and line 174 in the attention layer...

I just looked up the TimeDistributed layer wrapper again and realized it means that the same weights are also shared across the input-to-hidden connections of the MLP. And I think I...
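To make the weight-sharing point concrete: wrapping a `Dense` layer in `TimeDistributed` applies one and the same weight matrix and bias at every timestep, so the input-to-hidden parameters of the per-step MLP are necessarily shared. A minimal numpy sketch of that behavior (the shapes and variable names here are illustrative, not taken from the code under discussion):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, timesteps, features, hidden = 2, 5, 4, 3

x = rng.standard_normal((batch, timesteps, features))

# One shared weight matrix and bias: TimeDistributed(Dense(hidden))
# reuses these same parameters at every timestep.
W = rng.standard_normal((features, hidden))
b = rng.standard_normal(hidden)

# Loop form: apply the identical Dense transform to each timestep.
per_step = np.stack([x[:, t, :] @ W + b for t in range(timesteps)], axis=1)

# Vectorised form: one matmul over the whole sequence gives the same
# result, precisely because the weights are shared across timesteps.
vectorised = x @ W + b

assert np.allclose(per_step, vectorised)
print(per_step.shape)  # (2, 5, 3)
```

If the timesteps were meant to have independent input-hidden weights, `TimeDistributed` would be the wrong tool, since it only broadcasts a single layer over the time axis.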