Shepard
I'm stuck at the same point. I can see that the "one-layer MLP" is split in your code between the dense layer and line 174 in the attention layer....
I just looked up the TimeDistributed layer wrapper again and realized that it means the same weights are also shared across the input-to-hidden connections of the MLP. And I think I...
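For what it's worth, the weight-sharing behavior of TimeDistributed can be sketched in plain NumPy (the shapes here are hypothetical, not taken from the code under discussion): wrapping a Dense layer in TimeDistributed creates one weight matrix and bias, and applies those same parameters independently at every timestep.

```python
import numpy as np

rng = np.random.default_rng(0)

timesteps, features, units = 5, 8, 3
x = rng.standard_normal((timesteps, features))  # one input sequence

# A single Dense layer owns ONE weight matrix W and ONE bias b.
W = rng.standard_normal((features, units))
b = rng.standard_normal(units)

# TimeDistributed(Dense(units)) applies the SAME W and b at each timestep:
per_step = np.stack([x[t] @ W + b for t in range(timesteps)])

# ...which collapses to a single batched matrix product over the sequence:
batched = x @ W + b

# Both views give identical results: the timesteps share all parameters.
assert np.allclose(per_step, batched)
print(per_step.shape)  # (5, 3): one `units`-dim output per timestep
```

So there is no separate set of input-to-hidden weights per timestep; the per-timestep loop and the single matrix product are the same computation.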