RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

Open blldd opened this issue 5 years ago • 9 comments

Hi, I am very glad to study your repo. Unfortunately, I encountered the following error while running the code:

    parse with gowalla default settings
    use device: cpu
    Split.TRAIN load 7768 users with max_seq_count 72 batches: 345
    Split.TEST load 7768 users with max_seq_count 18 batches: 76
    Use flashback training.
    Use pytorch RNN implementation.
    Warning: Error detected in AddmmBackward. Traceback of forward call that caused the error:
      File "/home/dedong/pycharmProjects/traj_pred/Flashback_code/train.py", line 68, in
        loss, h = trainer.loss(x, t, s, y, y_t, y_s, h, active_users)
      File "/home/dedong/pycharmProjects/traj_pred/Flashback_code/trainer.py", line 50, in loss
        out, h = self.model(x, t, s, y_t, y_s, h, active_users)
      File "/home/dedong/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/dedong/pycharmProjects/traj_pred/Flashback_code/network.py", line 70, in forward
        out, h = self.rnn(x_emb, h)
      File "/home/dedong/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/dedong/anaconda3/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 228, in forward
        self.dropout, self.training, self.bidirectional, self.batch_first)
     (print_stack at /pytorch/torch/csrc/autograd/python_anomaly_mode.cpp:60)
    Traceback (most recent call last):
      File "/home/dedong/pycharmProjects/traj_pred/Flashback_code/train.py", line 69, in
        loss.backward(retain_graph=True)
      File "/home/dedong/anaconda3/lib/python3.7/site-packages/torch/tensor.py", line 198, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph)
      File "/home/dedong/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 100, in backward
        allow_unreachable=True)  # allow_unreachable flag
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [10, 10]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Process finished with exit code 1


So, would you please help me figure it out? Thanks a lot!

blldd avatar Dec 10 '20 08:12 blldd

I also ran into this problem. Do you know how to solve it? QAQ

smelly-dog avatar Apr 24 '21 08:04 smelly-dog

I have also encountered the above problem. What's your version of pytorch? I guess it's probably a problem with the pytorch version.

ZQSong1997 avatar May 11 '21 04:05 ZQSong1997

Hi. I've been working on this paper, too. I also encountered this problem when running the source program. Have you solved it? Thank you very much for your reply.

WenTao936 avatar Nov 03 '21 16:11 WenTao936

"loss, h = trainer.loss(x, t, s, y, y_t, y_s, h, active_users)" in train.py I try to delete the hidden state output "h", the code could run normally. My environment is python=3.8 pytorch=1.10. I guess that the author's environment is pytorch 0.3, the problem seems to occur even when pytorch version is 0.4. So you could try to run in pytorch version 0.3.

kevin-xuan avatar Nov 23 '21 12:11 kevin-xuan

@kevin-xuan You are right, but simply dropping the RNN's hidden state may break the recurrence and hurt the model. This version problem can be resolved by adding one line at trainer.py line 50:

        out, h = self.model(x, t, s, y_t, y_s, h, active_users)
        h = h.data  # add this line: detach the hidden state from the previous batch's graph
        out = out.view(-1, self.loc_count)
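
For context: h.data (like h.detach()) gives a tensor that is cut off from the autograd graph, so the next backward() stops at the carried-over hidden state instead of reaching back into the previous batch's graph, whose weights the optimizer has already updated in place.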

vilsmeier avatar Jun 08 '22 13:06 vilsmeier

@vilsmeier I think maybe h = h.detach() is better

I don't want to create a new env just for this, so thanks a lot for your fix. I think the RNN in newer torch versions changes the hidden state in-place, which is why the original code can't work without detaching it.
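
For reference, a minimal toy sketch of the detach pattern (made-up names again, not the repo's code); the hidden state is kept across batches but cut from the old graph before the next forward pass:

    import torch
    import torch.nn as nn

    rnn = nn.RNN(input_size=4, hidden_size=10, batch_first=True)
    head = nn.Linear(10, 3)
    opt = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.1)

    h = torch.zeros(1, 2, 10)
    for step in range(3):
        x = torch.randn(2, 5, 4)
        y = torch.randint(0, 3, (2, 5))
        out, h = rnn(x, h)   # keep the hidden state across batches
        loss = nn.functional.cross_entropy(head(out).reshape(-1, 3), y.reshape(-1))
        opt.zero_grad()
        loss.backward()      # retain_graph is no longer needed
        opt.step()
        h = h.detach()       # cut the graph: the next backward() stops at this batch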

Uyn1x avatar Mar 14 '23 06:03 Uyn1x