Dot multiplication goes wrong in PyTorch 0.2 in the attention module
There is a problem with the attn module:
energy = self.attn(encoder_output)
energy = hidden.dot(energy)
It seems the dot function in PyTorch 0.2 only supports 1-D vectors.
I have the same issue. Here is the stack trace:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-21-153451c5590c> in <module>()
9
10 # Run the train function
---> 11 loss = train(input_variable, target_variable, encoder, decoder, encoder_optimizer, decoder_optimizer, criterion)
12
13 # Keep track of loss
<ipython-input-17-9703d5331834> in train(input_variable, target_variable, encoder, decoder, encoder_optimizer, decoder_optimizer, criterion, max_length)
31 # Teacher forcing: Use the ground-truth target as the next input
32 for di in range(target_length):
---> 33 decoder_output, decoder_context, decoder_hidden, decoder_attention = decoder(decoder_input, decoder_context, decoder_hidden, encoder_outputs)
34 loss += criterion(decoder_output[0], target_variable[di])
35 decoder_input = target_variable[di] # Next target is next input
/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
222 for hook in self._forward_pre_hooks.values():
223 hook(self, input)
--> 224 result = self.forward(*input, **kwargs)
225 for hook in self._forward_hooks.values():
226 hook_result = hook(self, input, result)
<ipython-input-15-1e8710146be2> in forward(self, word_input, last_context, last_hidden, encoder_outputs)
30
31 # Calculate attention from current RNN state and all encoder outputs; apply to encoder outputs
---> 32 attn_weights = self.attn(rnn_output.squeeze(0), encoder_outputs)
33 context = attn_weights.bmm(encoder_outputs.transpose(0, 1)) # B x 1 x N
34
/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
222 for hook in self._forward_pre_hooks.values():
223 hook(self, input)
--> 224 result = self.forward(*input, **kwargs)
225 for hook in self._forward_hooks.values():
226 hook_result = hook(self, input, result)
<ipython-input-14-3c700c5b6bb1> in forward(self, hidden, encoder_outputs)
22 # Calculate energies for each encoder output
23 for i in range(seq_len):
---> 24 attn_energies[i] = self.score(hidden, encoder_outputs[i])
25
26 # Normalize energies to weights in range 0 to 1, resize to 1 x 1 x seq_len
<ipython-input-14-3c700c5b6bb1> in score(self, hidden, encoder_output)
35 elif self.method == 'general':
36 energy = self.attn(encoder_output)
---> 37 energy = hidden.dot(energy)
38 return energy
39
/usr/local/lib/python3.5/dist-packages/torch/autograd/variable.py in dot(self, other)
629
630 def dot(self, other):
--> 631 return Dot.apply(self, other)
632
633 def _addcop(self, op, args, inplace):
/usr/local/lib/python3.5/dist-packages/torch/autograd/_functions/blas.py in forward(ctx, vector1, vector2)
209 ctx.save_for_backward(vector1, vector2)
210 ctx.sizes = (vector1.size(), vector2.size())
--> 211 return vector1.new((vector1.dot(vector2),))
212
213 @staticmethod
RuntimeError: Expected argument self to have 1 dimension(s), but has 2 at /pytorch/torch/csrc/generic/TensorMethods.cpp:23020
torch==0.2.0.post1
You can try: energy = torch.squeeze(hidden).dot(torch.squeeze(energy))
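To illustrate why this works: in PyTorch 0.2+, dot() only accepts 1-D tensors, so squeezing away the batch dimension of 1 first makes the call valid. A minimal sketch (the hidden size of 8 is an arbitrary assumption):

```python
import torch

# Hypothetical shapes matching the tutorial: hidden and energy are 1 x hidden_size
hidden = torch.randn(1, 8)
energy = torch.randn(1, 8)

# dot() requires 1-D tensors, so flatten both to shape (8,) before the product
score = torch.squeeze(hidden).dot(torch.squeeze(energy))
# score is now a 0-dimensional scalar tensor
```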
You can also use mm() if you make sure both operands have 2 dimensions.
Thanks for this. It looks like the PyTorch dev team removed the implicit flattening of matrices for the dot product, which is what causes this error. Here's the discussion:
https://github.com/pytorch/pytorch/issues/2313
@dhpollack can you explain mm() ?
The mm() function is normal matrix multiplication of 2d matrices. So if A is 5x3 and B is 3x5 then mm(A,B) is a 5x5 matrix and mm(B,A) is a 3x3 matrix.
But ultimately I think that bmm() should be used because it's the same thing but allows for batches. I reworked the example on my computer. I'll post a snippet tomorrow.
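A quick sketch of the shapes described above, plus how bmm() extends mm() with a batch dimension (the batch size of 1 here is just for illustration):

```python
import torch

# Shapes from the comment above: A is 5x3, B is 3x5
A = torch.randn(5, 3)
B = torch.randn(3, 5)
# mm(A, B) -> 5x5, mm(B, A) -> 3x3
AB = torch.mm(A, B)
BA = torch.mm(B, A)

# bmm() is the same operation applied per batch element:
# (b, n, m) @ (b, m, p) -> (b, n, p)
A_b = A.unsqueeze(0)  # 1 x 5 x 3
B_b = B.unsqueeze(0)  # 1 x 3 x 5
AB_b = torch.bmm(A_b, B_b)  # 1 x 5 x 5
```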
I'm wondering about speed: is there an easy way to invert one of the matrices? If they're two N x 1 matrices, shouldn't that be a quick fix?
Yes, that is an easy fix, but it's more efficient to avoid for loops. I was playing with this code for a different application, but you can see below how the Attn class avoids for loops entirely:
https://gist.github.com/dhpollack/c4162aa9d29eec20df8c77d9273b651b#file-pytorch_attention_audio-py-L314
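The idea of replacing the per-timestep scoring loop with one batched matrix multiply can be sketched roughly as follows. This is a hypothetical reconstruction, not the gist's exact code; the shapes, the Linear layer, and the "general" scoring method are assumptions:

```python
import torch

# Assumed dimensions for illustration
batch, seq_len, hidden_size = 2, 7, 16

hidden = torch.randn(batch, 1, hidden_size)                 # decoder state
encoder_outputs = torch.randn(batch, seq_len, hidden_size)  # all encoder states
attn = torch.nn.Linear(hidden_size, hidden_size, bias=False)

# One bmm() replaces the for loop over seq_len:
# (batch, 1, H) @ (batch, H, seq_len) -> (batch, 1, seq_len)
energies = torch.bmm(hidden, attn(encoder_outputs).transpose(1, 2))
weights = torch.softmax(energies, dim=2)         # normalize to attention weights
context = torch.bmm(weights, encoder_outputs)    # (batch, 1, H) context vector
```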
thanks so much for this snippet, super helpful to see.
So, just to be clear: your method reshapes the matrix, then performs batch matrix multiplication, correct? That's what I previously meant by "invert"; I should have said "transpose"!
Yes, you could transpose an N x 1 vector and then use the for loop.
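For concreteness, transposing one N x 1 matrix to 1 x N makes mm() applicable, yielding the 1 x 1 dot product (the size 8 below is an arbitrary assumption):

```python
import torch

v1 = torch.randn(8, 1)
v2 = torch.randn(8, 1)

# Transposing v1 to 1 x 8 lets mm() compute the dot product as a 1 x 1 matrix
result = torch.mm(v1.t(), v2)
```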
@dhpollack
Last question, promise: is there a major difference between your solution and using torch.squeeze(vector1).dot(torch.squeeze(vector2))?
I was attempting to do all the multiplications in one shot while avoiding squeeze/unsqueeze/view/cat operations as much as possible. I think avoiding those should make things faster, but I haven't tested it.