an illegal memory access was encountered

Open dingjibang opened this issue 7 years ago • 6 comments

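I downloaded the project, filled in a few answers and questions, and ran a test. It worked and the results were pretty good, so I put together nearly 2 MB of answers and questions. The preprocessing stage passed, but as soon as training started I got the error below.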

THCudaCheck FAIL file=C:/new-builder_3/win-wheel/pytorch/aten/src/ATen/native/cuda/Embedding.cu line=247 error=77 : an illegal memory access was encountered
Traceback (most recent call last):
  File "seq2seq.py", line 436, in <module>
    seq.train()
  File "seq2seq.py", line 210, in train
    loss, logits = self.step(inputs, targets, self.max_length)
  File "seq2seq.py", line 265, in step
    loss.backward()
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python35\lib\site-packages\torch\tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python35\lib\site-packages\torch\autograd\__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at C:/new-builder_3/win-wheel/pytorch/aten/src/ATen/native/cuda/Embedding.cu:247

I'm completely baffled. What should I do?

dingjibang, Sep 26 '18 08:09

CUDA 9.0, Windows 10, Python 3.5.

dingjibang, Sep 26 '18 08:09

Running it as CUDA_LAUNCH_BLOCKING=1 python3 seq2seq.py train will show more information.
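The `VAR=value command` prefix is POSIX shell syntax, so it will not work as typed in a Windows `cmd.exe` prompt. A minimal sketch of an alternative, assuming it is acceptable to edit the top of the entry script: set the variable from Python before `torch` is imported, so that it is already in place when CUDA is initialized.

```python
import os

# Set this before importing torch, so it is in place when CUDA is initialized.
# With blocking launches, the error is reported at the kernel that actually failed.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch only after the variable has been set
```

In `cmd.exe`, running `set CUDA_LAUNCH_BLOCKING=1` before the `python` command has the same effect for that console session.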

yanwii, Sep 26 '18 08:09

The output above was already produced with blocking = 1. For comparison, here is the output with blocking = 0:

THCudaCheck FAIL file=c:\new-builder_3\win-wheel\pytorch\aten\src\thc\THCReduceAll.cuh line=317 error=77 : an illegal memory access was encountered
Traceback (most recent call last):
  File "seq2seq.py", line 436, in <module>
    seq.train()
  File "seq2seq.py", line 210, in train
    loss, logits = self.step(inputs, targets, self.max_length)
  File "seq2seq.py", line 266, in step
    torch.nn.utils.clip_grad_norm(self.encoder.parameters(), clip)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python35\lib\site-packages\torch\nn\utils\clip_grad.py", line 51, in clip_grad_norm
    return clip_grad_norm_(parameters, max_norm, norm_type)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python35\lib\site-packages\torch\nn\utils\clip_grad.py", line 32, in clip_grad_norm_
    param_norm = p.grad.data.norm(norm_type)
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at c:\new-builder_3\win-wheel\pytorch\aten\src\thc\THCReduceAll.cuh:317

With 0 or 1 the reported message is the same, no more and no less. Only the line where the error is raised differs.
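Since the blocking traceback points at the embedding kernel, one cause worth ruling out (an assumption, not something confirmed in this thread) is a token id that falls outside the embedding table, for example after the vocabulary grew with the larger dataset. A minimal sketch of such a check in plain Python follows; `find_out_of_range`, `batches`, and `vocab_size` are hypothetical names and are not part of seq2seq.py.

```python
def find_out_of_range(batches, vocab_size):
    """Return (batch_index, position, token_id) for every id outside [0, vocab_size)."""
    bad = []
    for b, seq in enumerate(batches):
        for pos, tok in enumerate(seq):
            if not 0 <= tok < vocab_size:
                bad.append((b, pos, tok))
    return bad


# Hypothetical data: three encoded sequences and an embedding table with 8 rows.
batches = [[1, 2, 3], [4, 9, 0], [2, -1, 5]]
print(find_out_of_range(batches, vocab_size=8))
```

If a check like this reports any entries, the encoded data and the embedding size disagree, and the mismatch can be fixed on the CPU side before anything is sent to the GPU.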

dingjibang, Sep 26 '18 09:09

Could you send me a copy of the data?

yanwii, Sep 27 '18 10:09

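@dingjibang Did you ever solve this? I ran into the same problem. @yanwii After I turned off the GPU, training ran to epoch 4000 and then failed with a segmentation fault instead.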

liutianling, Jan 07 '19 03:01

Hello, did you manage to solve this problem? I have run into it as well.

ailovejinx, Jul 10 '23 08:07