Embedding layer
Thanks for adapting this layer @kumarutkarsh1248! I have a bunch of comments---let me know what you think. As implemented, does this work in the mlpack-onnx networks you have put it into?
Actually, I couldn’t test this layer on onnx-mlpack because I need one more layer: positional encoding. There is another way to test this embedding layer on onnx-mlpack, but I’d rather adapt the positional encoding layer first and then test both together on onnx-mlpack.
Sounds good. Let's just get the gradient test working here, and then we can merge it. If there are other issues later when you use it with onnx-mlpack, we can address those then. :+1:
I handled the review comments. @kumarutkarsh1248 do you have time to take a look at this and make sure that you agree with how I changed things? Especially the output shape of the Embedding layer---I tried to make sure it was the same shape as the multihead attention layer requires.
Sorry for the slow response, and thanks for doing my part of the work.
The output shape of the Embedding layer looks correct; this is exactly the shape that MultiheadAttention expects as input.
I want to test this with onnx-mlpack but don't have enough time right now. I'll test it next month, and I think it's better to have some test cases for the Embedding and MultiheadAttention layers combined, since the Embedding layer is almost always used right before MultiheadAttention.
I’ll handle that part next month.
Not a bad idea. I added a simple test case including MultiheadAttention in 9f6a13a906; @kumarutkarsh1248, let me know what you think.
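For reference (the commit itself isn't quoted here), a rough idea of what such a combined Embedding + MultiheadAttention smoke test could look like is sketched below. This is a minimal sketch assuming mlpack-4-style layer names; the constructor arguments for `Embedding` and `MultiheadAttention` are assumptions for illustration, not the actual test added in 9f6a13a906.

```cpp
// Minimal sketch: compose an Embedding layer with MultiheadAttention in an
// FFN and run a short training call to exercise forward and backward passes.
// Layer constructor signatures below are assumed, not taken from the PR.
#include <mlpack.hpp>

using namespace mlpack;

int main()
{
  const size_t vocabSize = 50;   // number of distinct token indices
  const size_t embedDim  = 8;    // embedding dimension
  const size_t seqLen    = 10;   // sequence length
  const size_t numHeads  = 2;    // attention heads
  const size_t batchSize = 4;

  // Each column is one sequence of integer token indices.
  arma::mat input = arma::randi<arma::mat>(seqLen, batchSize,
      arma::distr_param(0, (int) vocabSize - 1));

  // Dummy regression target with the same shape as the attention output
  // (embedDim * seqLen rows per column), just to drive the backward pass.
  arma::mat target = arma::randn(embedDim * seqLen, batchSize);

  FFN<MeanSquaredError> model;
  model.Add<Embedding>(vocabSize, embedDim);        // assumed signature
  model.Add<MultiheadAttention>(seqLen, numHeads);  // assumed signature

  // A single short training run verifies that the two layers compose and
  // that gradients propagate through the embedding lookup.
  model.Train(input, target);

  return 0;
}
```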
That's perfect.