A question about weight initialization of embedding layer.
I have a question about the weight initialization of the embedding layer.
In the PyTorch source code, the weight of the embedding layer is initialized from N(0, 1). In this code, the weight of the embedding layer is initialized from a uniform distribution.
I trained the model twice, once with each initialization method, and I found that the default initialization makes the model converge much more slowly.
So why is the embedding initialized from N(0, 1), which does not seem like a good starting point?
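For reference, the difference between the two schemes can be sketched like this. The `initrange = 0.1` value is an assumption based on the common `initrange` used in the examples repo, not necessarily what this exact code uses:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# PyTorch's nn.Embedding draws weights from N(0, 1) by default
# (see nn.Embedding.reset_parameters), so values routinely exceed 1.
emb = nn.Embedding(10000, 200)

# Re-initializing with a uniform distribution, as the example does;
# the 0.1 range is illustrative (an assumption), giving much smaller
# initial weights than the default normal initialization.
initrange = 0.1
emb.weight.data.uniform_(-initrange, initrange)
```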
It was probably just an easy choice. If you think a one-line change would make this example converge faster, by all means please make a PR.