Error while using -copy_attn function

Open itscassie opened this issue 6 years ago • 1 comments

When I try to train my model with -copy_attn (the copy attention function), errors occur both with and without a GPU. As far as I can tell, the template setting should not conflict with copy_attn (coverage_attn also works well). My command line is python3 train.py -data path/to/data -copy_attn and the error message with a GPU is:

Traceback (most recent call last):
  File "train.py", line 42, in <module>
    main(opt)
  File "train.py", line 28, in main
    single_main(opt)
  File "/media/yoshow/HDD/BiSET/Bi-selective Encoding/onmt/train_single.py", line 133, in main
    opt.valid_steps)
  File "/media/yoshow/HDD/BiSET/Bi-selective Encoding/onmt/trainer.py", line 172, in train
    report_stats)
  File "/media/yoshow/HDD/BiSET/Bi-selective Encoding/onmt/trainer.py", line 296, in _gradient_accumulation
    trunc_size, self.shard_size, normalization)
  File "/media/yoshow/HDD/BiSET/Bi-selective Encoding/onmt/utils/loss.py", line 145, in sharded_compute_loss
    loss, stats = self._compute_loss(batch, **shard)
  File "/media/yoshow/HDD/BiSET/Bi-selective Encoding/onmt/modules/copy_generator.py", line 193, in _compute_loss
    batch, self.tgt_vocab, batch.dataset.src_vocabs)
  File "/media/yoshow/HDD/BiSET/Bi-selective Encoding/onmt/inputters/text_dataset.py", line 119, in collapse_copy_scores
    print('index: %s'%index)
  File "/home/yoshow/venv/pytorch11/lib/python3.5/site-packages/torch/tensor.py", line 71, in __repr__
    return torch._tensor_str._str(self)
  File "/home/yoshow/venv/pytorch11/lib/python3.5/site-packages/torch/_tensor_str.py", line 286, in _str
    tensor_str = _tensor_str(self, indent)
  File "/home/yoshow/venv/pytorch11/lib/python3.5/site-packages/torch/_tensor_str.py", line 201, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/home/yoshow/venv/pytorch11/lib/python3.5/site-packages/torch/_tensor_str.py", line 83, in __init__
    value_str = '{}'.format(value)
  File "/home/yoshow/venv/pytorch11/lib/python3.5/site-packages/torch/tensor.py", line 386, in __format__
    return self.item().__format__(format_spec)
RuntimeError: CUDA error: device-side assert triggered

The error message without a GPU is:

/home/yoshow/venv/pytorch11/lib/python3.5/site-packages/torchtext/data/field.py:359: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  var = torch.tensor(arr, dtype=self.dtype, device=device)
/home/yoshow/venv/pytorch11/lib/python3.5/site-packages/torch/nn/functional.py:1386: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
/home/yoshow/venv/pytorch11/lib/python3.5/site-packages/torch/nn/functional.py:1374: UserWarning: nn.functional.tanh is deprecated. Use torch.tanh instead.
  warnings.warn("nn.functional.tanh is deprecated. Use torch.tanh instead.")
Traceback (most recent call last):
  File "train.py", line 42, in <module>
    main(opt)
  File "train.py", line 28, in main
    single_main(opt)
  File "/media/yoshow/HDD/BiSET/Bi-selective Encoding/onmt/train_single.py", line 133, in main
    opt.valid_steps)
  File "/media/yoshow/HDD/BiSET/Bi-selective Encoding/onmt/trainer.py", line 172, in train
    report_stats)
  File "/media/yoshow/HDD/BiSET/Bi-selective Encoding/onmt/trainer.py", line 296, in _gradient_accumulation
    trunc_size, self.shard_size, normalization)
  File "/media/yoshow/HDD/BiSET/Bi-selective Encoding/onmt/utils/loss.py", line 145, in sharded_compute_loss
    loss, stats = self._compute_loss(batch, **shard)
  File "/media/yoshow/HDD/BiSET/Bi-selective Encoding/onmt/modules/copy_generator.py", line 189, in _compute_loss
    loss = self.criterion(scores, align, target)
  File "/media/yoshow/HDD/BiSET/Bi-selective Encoding/onmt/modules/copy_generator.py", line 125, in __call__
    out = scores.gather(1, align.view(-1, 1) + self.offset).view(-1)
RuntimeError: Invalid index in gather at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:459

Aug 25 '19 10:08 itscassie

I'm sorry that I can't solve this problem, because I didn't use any techniques other than the naive attention mechanism in my paper. I think you can check copy_generator.py and text_dataset.py, the files named in the error messages.
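
One way to start that check, as a rough sketch rather than a confirmed fix: the CPU traceback fails in scores.gather(1, align.view(-1, 1) + self.offset), and gather raises "Invalid index" when any shifted index falls outside the columns of scores. The helper below is hypothetical (it is not part of the repository) and uses plain Python lists in place of tensors; the numbers are made up for illustration. With real tensors, the same comparison would be between (align + offset).max() and scores.size(1).

```python
def find_invalid_gather_indices(align, offset, num_columns):
    """Return (position, shifted_index) pairs that a gather along dim 1 would reject.

    align       : list of ints, stand-in for the flattened `align` tensor
    offset      : int, stand-in for `self.offset` in the criterion
    num_columns : int, stand-in for `scores.size(1)`
    """
    bad = []
    for pos, a in enumerate(align):
        idx = a + offset
        # gather requires 0 <= idx < num_columns for every index
        if idx < 0 or idx >= num_columns:
            bad.append((pos, idx))
    return bad


# Hypothetical values: 5 target-vocabulary columns plus 3 copy columns.
offset = 5
num_columns = 8
align = [0, 2, 3, 1]

print(find_invalid_gather_indices(align, offset, num_columns))
```

If a check like this reports out-of-range entries just before the gather call, the mismatch is probably between how align is built in text_dataset.py and how wide scores is in copy_generator.py, though that is an assumption to verify against the actual code.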

Aug 26 '19 07:08 InitialBug