A question about weight initialization of embedding layer.
I have a question about the weight initialization of the embedding layer.
In the PyTorch source code, the weight of the embedding layer is initialized from N(0, 1). In this code, the weight of the embedding layer is initialized from a uniform distribution.
I trained the model twice, once with each initialization method, and I found that the default initialization makes the model converge much more slowly.
So why is the embedding initialized from N(0, 1), which does not seem like a good starting point?
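For reference, the difference between the two schemes can be sketched like this. The `initrange = 0.1` value is an assumption based on the common `initrange` used in the examples repo, not necessarily what this exact code uses:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# PyTorch's nn.Embedding draws weights from N(0, 1) by default
# (see nn.Embedding.reset_parameters), so values routinely exceed 1.
emb = nn.Embedding(10000, 200)

# Re-initializing with a uniform distribution, as the example does;
# the 0.1 range is illustrative (an assumption), giving much smaller
# initial weights than the default normal initialization.
initrange = 0.1
emb.weight.data.uniform_(-initrange, initrange)
```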
It was probably just an easy choice. If you think a one-line change would make this example converge faster, by all means please make a PR.