Dot multiplication goes wrong in PyTorch 0.2 in the attention module
There is a problem with the attn module:
energy = self.attn(encoder_output)
energy = hidden.dot(energy)
It seems the dot function in PyTorch 0.2 only supports 1-D vectors.
I have the same issue. Here is the stack trace:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-21-153451c5590c> in <module>()
9
10 # Run the train function
---> 11 loss = train(input_variable, target_variable, encoder, decoder, encoder_optimizer, decoder_optimizer, criterion)
12
13 # Keep track of loss
<ipython-input-17-9703d5331834> in train(input_variable, target_variable, encoder, decoder, encoder_optimizer, decoder_optimizer, criterion, max_length)
31 # Teacher forcing: Use the ground-truth target as the next input
32 for di in range(target_length):
---> 33 decoder_output, decoder_context, decoder_hidden, decoder_attention = decoder(decoder_input, decoder_context, decoder_hidden, encoder_outputs)
34 loss += criterion(decoder_output[0], target_variable[di])
35 decoder_input = target_variable[di] # Next target is next input
/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
222 for hook in self._forward_pre_hooks.values():
223 hook(self, input)
--> 224 result = self.forward(*input, **kwargs)
225 for hook in self._forward_hooks.values():
226 hook_result = hook(self, input, result)
<ipython-input-15-1e8710146be2> in forward(self, word_input, last_context, last_hidden, encoder_outputs)
30
31 # Calculate attention from current RNN state and all encoder outputs; apply to encoder outputs
---> 32 attn_weights = self.attn(rnn_output.squeeze(0), encoder_outputs)
33 context = attn_weights.bmm(encoder_outputs.transpose(0, 1)) # B x 1 x N
34
/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
222 for hook in self._forward_pre_hooks.values():
223 hook(self, input)
--> 224 result = self.forward(*input, **kwargs)
225 for hook in self._forward_hooks.values():
226 hook_result = hook(self, input, result)
<ipython-input-14-3c700c5b6bb1> in forward(self, hidden, encoder_outputs)
22 # Calculate energies for each encoder output
23 for i in range(seq_len):
---> 24 attn_energies[i] = self.score(hidden, encoder_outputs[i])
25
26 # Normalize energies to weights in range 0 to 1, resize to 1 x 1 x seq_len
<ipython-input-14-3c700c5b6bb1> in score(self, hidden, encoder_output)
35 elif self.method == 'general':
36 energy = self.attn(encoder_output)
---> 37 energy = hidden.dot(energy)
38 return energy
39
/usr/local/lib/python3.5/dist-packages/torch/autograd/variable.py in dot(self, other)
629
630 def dot(self, other):
--> 631 return Dot.apply(self, other)
632
633 def _addcop(self, op, args, inplace):
/usr/local/lib/python3.5/dist-packages/torch/autograd/_functions/blas.py in forward(ctx, vector1, vector2)
209 ctx.save_for_backward(vector1, vector2)
210 ctx.sizes = (vector1.size(), vector2.size())
--> 211 return vector1.new((vector1.dot(vector2),))
212
213 @staticmethod
RuntimeError: Expected argument self to have 1 dimension(s), but has 2 at /pytorch/torch/csrc/generic/TensorMethods.cpp:23020
torch==0.2.0.post1
You can try: energy = torch.squeeze(hidden).dot(torch.squeeze(energy))
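To illustrate why this works: in PyTorch 0.2+, dot() only accepts 1-D tensors, so squeezing away the batch dimension of 1 first makes the call valid. A minimal sketch (the hidden size of 8 is an arbitrary assumption):

```python
import torch

# Hypothetical shapes matching the tutorial: hidden and energy are 1 x hidden_size
hidden = torch.randn(1, 8)
energy = torch.randn(1, 8)

# dot() requires 1-D tensors, so flatten both to shape (8,) before the product
score = torch.squeeze(hidden).dot(torch.squeeze(energy))
# score is now a 0-dimensional scalar tensor
```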
You can also use mm() if you make sure both operands have 2 dimensions.
Thanks for this. It looks like the PyTorch dev team removed the implicit flattening of matrices for the dot product, which is what causes this error. Here's the discussion:
https://github.com/pytorch/pytorch/issues/2313
@dhpollack can you explain mm() ?
The mm() function is normal matrix multiplication of 2d matrices. So if A is 5x3 and B is 3x5 then mm(A,B) is a 5x5 matrix and mm(B,A) is a 3x3 matrix.
But ultimately I think that bmm() should be used because it's the same thing but allows for batches. I reworked the example on my computer. I'll post a snippet tomorrow.
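A quick sketch of the shapes described above, plus how bmm() extends mm() with a batch dimension (the batch size of 1 here is just for illustration):

```python
import torch

# Shapes from the comment above: A is 5x3, B is 3x5
A = torch.randn(5, 3)
B = torch.randn(3, 5)
# mm(A, B) -> 5x5, mm(B, A) -> 3x3
AB = torch.mm(A, B)
BA = torch.mm(B, A)

# bmm() is the same operation applied per batch element:
# (b, n, m) @ (b, m, p) -> (b, n, p)
A_b = A.unsqueeze(0)  # 1 x 5 x 3
B_b = B.unsqueeze(0)  # 1 x 3 x 5
AB_b = torch.bmm(A_b, B_b)  # 1 x 5 x 5
```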
I'm wondering about speed: is there an easy way to invert one of the matrices? If they're two N x 1 matrices, shouldn't that be a quick fix?
Yes, that is an easy fix, but it's more efficient to avoid for loops. I was playing with this code for a different application, but you can see below how the Attn class avoids for loops entirely:
https://gist.github.com/dhpollack/c4162aa9d29eec20df8c77d9273b651b#file-pytorch_attention_audio-py-L314
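The idea of replacing the per-timestep scoring loop with one batched matrix multiply can be sketched roughly as follows. This is a hypothetical reconstruction, not the gist's exact code; the shapes, the Linear layer, and the "general" scoring method are assumptions:

```python
import torch

# Assumed dimensions for illustration
batch, seq_len, hidden_size = 2, 7, 16

hidden = torch.randn(batch, 1, hidden_size)                 # decoder state
encoder_outputs = torch.randn(batch, seq_len, hidden_size)  # all encoder states
attn = torch.nn.Linear(hidden_size, hidden_size, bias=False)

# One bmm() replaces the for loop over seq_len:
# (batch, 1, H) @ (batch, H, seq_len) -> (batch, 1, seq_len)
energies = torch.bmm(hidden, attn(encoder_outputs).transpose(1, 2))
weights = torch.softmax(energies, dim=2)         # normalize to attention weights
context = torch.bmm(weights, encoder_outputs)    # (batch, 1, H) context vector
```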
thanks so much for this snippet, super helpful to see.
So, just to be clear: your method reshapes the matrix, then performs batch matrix multiplication, correct? That's what I previously meant by "invert"; I should have said "transpose"!
Yes, you could transpose an N x 1 vector and then use the for loop.
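For concreteness, transposing one N x 1 matrix to 1 x N makes mm() applicable, yielding the 1 x 1 dot product (the size 8 below is an arbitrary assumption):

```python
import torch

v1 = torch.randn(8, 1)
v2 = torch.randn(8, 1)

# Transposing v1 to 1 x 8 lets mm() compute the dot product as a 1 x 1 matrix
result = torch.mm(v1.t(), v2)
```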
@dhpollack
Last question, promise: is there a major difference between your solution and using torch.squeeze(vector1).dot(torch.squeeze(vector2))?
I was attempting to do all the multiplications in one shot while avoiding squeeze/unsqueeze/view/cat operations as much as possible. I think avoiding those should make things faster, but I haven't tested it.