question about position embedding.
Thanks for your code. But I have a question about the implementation of the position embedding. It seems like the position encoding is randomly initialized and updated during training just like the token embeddings. What confuses me is: how does this approach learn specific position information?
It is a little complex to explain, although it is my research field (I am not the author of the project, by the way). Roughly speaking, the model learns how position affects meaning by observing what role a word at a given position plays across different sentences, and the hidden_dims trainable floats for each position get adjusted accordingly by the gradients from the task loss. I ran into quite a few problems while debugging, but it works in the end; I will write them up separately in another post. Despite the trouble this project gave me, I still want to commend the author's work. Thank you for the big help your project provided.
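To make that concrete, here is a minimal sketch of trainable (learned) position embeddings in a PyTorch style. This is not the project's actual code; names like `max_len`, `hidden_dim`, and `LearnedPositionalEmbedding` are illustrative assumptions. The point is that the position table is an ordinary `nn.Embedding` indexed by position, so it is updated by backpropagation exactly like the token embedding table.

```python
import torch
import torch.nn as nn

class LearnedPositionalEmbedding(nn.Module):
    """Token + position embeddings, both randomly initialized and trained."""

    def __init__(self, vocab_size: int, max_len: int, hidden_dim: int):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, hidden_dim)  # one vector per token id
        self.pos_emb = nn.Embedding(max_len, hidden_dim)       # one vector per position index

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len)
        seq_len = token_ids.size(1)
        positions = torch.arange(seq_len, device=token_ids.device)  # 0, 1, ..., seq_len-1
        # Both lookups are plain trainable embeddings; gradients from the task
        # loss shape the position vectors just like the token vectors, so each
        # position index gradually acquires a vector that encodes "what words
        # at this position tend to do" across the training data.
        return self.token_emb(token_ids) + self.pos_emb(positions)
```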