Did not multiply embedding weights by sqrt(d_model)
Hi, in this line: https://github.com/SamLynnEvans/Transformer/blob/37bf49224ccc0ab5a2c8cdb2c330ccd76628e57a/Embed.py#L12
I think you need to multiply the embedding weights by sqrt(d_model).

@orena1 Hi, the implementation also didn't share the embedding weights, right?
@orena1 The code actually does have `* math.sqrt(self.d_model)`, but in the positional embedding class's `forward` method.
Does anybody know the reason for multiplying the embedding weights by sqrt(d_model)?
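For context: the original "Attention Is All You Need" paper multiplies the embedding output by sqrt(d_model) before adding the sinusoidal positional encodings. A common explanation is that freshly initialized embedding entries tend to have magnitude on the order of 1/sqrt(d_model), so without the scaling the positional encodings (whose values are bounded in [-1, 1]) would dominate the sum. Here is a minimal, dependency-free sketch of that lookup-and-scale step (`scaled_embedding_lookup` is a hypothetical name, not from the repo; the repo itself uses PyTorch's `nn.Embedding`):

```python
import math

def scaled_embedding_lookup(weights, token_ids, d_model):
    """Look up embedding rows for each token id and multiply by
    sqrt(d_model), mirroring the scaling described in
    'Attention Is All You Need' (Section 3.4).

    weights:   list of vocab_size rows, each of length d_model
    token_ids: list of integer token ids
    """
    scale = math.sqrt(d_model)
    # Scale every component so embedding magnitudes are comparable
    # to the positional encodings added immediately afterwards.
    return [[w * scale for w in weights[t]] for t in token_ids]
```

In the PyTorch version this is simply `self.embed(x) * math.sqrt(self.d_model)` inside `forward`, which is presumably why it is easy to misplace it in the positional-encoding class instead of the embedding class.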
> Hi, the implementation also didn't share the embedding weights, right?

Yes, the implementation didn't share the embedding weights.
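For readers unfamiliar with the sharing being discussed: the original paper ties the decoder embedding matrix and the pre-softmax output projection, so the output logits are computed against the transposed embedding matrix rather than a separate weight. A minimal sketch of that tied projection, under the assumption of plain list-of-lists matrices (`tied_output_logits` is a hypothetical name for illustration):

```python
def tied_output_logits(embedding_weights, hidden):
    """Compute output logits as hidden @ E^T, i.e. reuse the
    embedding matrix E as the final projection (weight tying,
    'Attention Is All You Need', Section 3.4).

    embedding_weights: vocab_size rows of length d_model
    hidden:            one decoder state of length d_model
    """
    d_model = len(embedding_weights[0])
    # One dot product per vocabulary entry against its embedding row.
    return [sum(embedding_weights[v][j] * hidden[j] for j in range(d_model))
            for v in range(len(embedding_weights))]
```

In PyTorch this is usually done by assigning the embedding's weight tensor to the output `nn.Linear` layer's weight, so both refer to the same parameter.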