
Cannot train language modeling using the LUKE model

Open doherty88 opened this issue 3 years ago • 3 comments

System Info


  • transformers version: 4.29.0.dev0
  • Platform: Linux-5.15.0-52-generic-x86_64-with-glibc2.10
  • Python version: 3.8.13
  • Huggingface_hub version: 0.14.0
  • Safetensors version: not installed
  • PyTorch version (GPU?): 1.12.0a0+bd13bc6 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help?

@sgugger

Information

  • [X] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

I want to try fine-tuning the LUKE model via run_mlm.py in the examples folder. I use the standard script from examples, then run the following commands to start training:

pip install git+https://github.com/huggingface/transformers

python /gxtq-ner-ws/run_mlm.py \
  --output_dir=/gxtq-ner-ws/luke_large_6_pretrained_v2/ \
  --model_type=luke \
  --model_name_or_path=studio-ousia/luke-large-lite \
  --do_train \
  --per_device_train_batch_size 16 \
  --num_train_epochs 6 \
  --train_file=/gxtq-ner-ws/lm_training_data_v2.txt \
  --save_total_limit 1 \
  --save_steps 10000

Then I got the following error:

[INFO|trainer.py:1776] 2023-04-25 06:57:12,367 >> Number of trainable parameters = 147,342,943
  0%|          | 0/5814 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "./run_language_modeling_v4.py", line 657, in <module>
    main()
  File "./run_language_modeling_v4.py", line 606, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/opt/conda/lib/python3.8/site-packages/transformers/trainer.py", line 1662, in train
    return inner_training_loop(
  File "/opt/conda/lib/python3.8/site-packages/transformers/trainer.py", line 1930, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/opt/conda/lib/python3.8/site-packages/transformers/trainer.py", line 2718, in training_step
    loss.backward()
  File "/opt/conda/lib/python3.8/site-packages/torch/_tensor.py", line 399, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
/opt/pytorch/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [0,0,0] Assertion `t >= 0 && t < n_classes` failed.
(the same assertion is repeated for the remaining threads in the block, up through [31,0,0])
  0%|          | 0/5814 [00:00<?, ?it/s]

I also tried running it in a CPU environment; here is the error:

Traceback (most recent call last):
  File "./run_language_modeling_v4.py", line 657, in <module>
    main()
  File "./run_language_modeling_v4.py", line 606, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/opt/conda/lib/python3.8/site-packages/transformers/trainer.py", line 1662, in train
    return inner_training_loop(
  File "/opt/conda/lib/python3.8/site-packages/transformers/trainer.py", line 1930, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/opt/conda/lib/python3.8/site-packages/transformers/trainer.py", line 2700, in training_step
    loss = self.compute_loss(model, inputs)
  File "/opt/conda/lib/python3.8/site-packages/transformers/trainer.py", line 2732, in compute_loss
    outputs = model(**inputs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/transformers/models/luke/modeling_luke.py", line 1375, in forward
    mlm_loss = self.loss_fn(logits.view(-1, self.config.vocab_size), labels.view(-1))
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 1163, in forward
    return F.cross_entropy(input, target, weight=self.weight,
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 2961, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
IndexError: Target -100 is out of bounds.
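
The CPU traceback makes the root cause visible: LUKE's masked-LM loss ignores label -1, while the example scripts pad labels with -100, which the loss then treats as a real class index. A minimal, self-contained sketch of the failure (the logits shape and vocabulary size below are arbitrary, chosen only for the demo):

```python
import torch
import torch.nn as nn

# LUKE's masked-LM head builds its loss with ignore_index=-1, so only
# -1 labels are skipped; -100 is treated as an ordinary class index.
loss_fn = nn.CrossEntropyLoss(ignore_index=-1)

logits = torch.randn(4, 10)                # (batch * seq_len, vocab_size), arbitrary demo sizes
labels = torch.tensor([3, -100, 7, -100])  # -100 is how the example scripts mark ignored positions

loss_fn(logits, labels)  # IndexError: Target -100 is out of bounds.
```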

Expected behavior

The model trains as expected.

doherty88 avatar Apr 25 '23 08:04 doherty88

It looks like the LUKE model is not compatible out of the box with those examples, since the person who contributed it used -1 as the ignore index in the cross-entropy loss instead of the -100 we use everywhere else.

Might be worth fixing, though it's a breaking change. @amyeroberts @ArthurZucker what do you think?
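
For reference, a minimal sketch of the change under discussion, assuming the loss is constructed in LukeForMaskedLM's __init__ as the CPU traceback suggests (a heavily abridged illustration, not the actual patch):

```python
import torch.nn as nn

class LukeForMaskedLM(nn.Module):  # abridged sketch, not the real class
    def __init__(self, config):
        super().__init__()
        # Before: the contributed implementation ignores -1 in the labels.
        # self.loss_fn = nn.CrossEntropyLoss(ignore_index=-1)
        # After: use PyTorch's default ignore_index of -100, matching the
        # rest of the library (breaking for anyone already padding with -1).
        self.loss_fn = nn.CrossEntropyLoss()
```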

In the meantime, a workaround is to replace the -100 used for padding labels in the example with -1 when using LUKE.
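
A minimal sketch of that workaround, assuming the default DataCollatorForLanguageModeling that run_mlm.py uses (the checkpoint name is taken from the command above and the mlm_probability value is illustrative):

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("studio-ousia/luke-large-lite")
mlm_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

def luke_mlm_collator(features):
    batch = mlm_collator(features)
    # The stock collator pads ignored label positions with -100;
    # LUKE's loss ignores -1 instead, so remap before the forward pass.
    batch["labels"][batch["labels"] == -100] = -1
    return batch

# In run_mlm.py, pass data_collator=luke_mlm_collator to the Trainer
# in place of the collator it constructs by default.
```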

sgugger avatar Apr 25 '23 13:04 sgugger

@sgugger Yes, I agree; I think it's better to update it to be in line with the rest of the library.

amyeroberts avatar Apr 25 '23 13:04 amyeroberts

> In the meantime, a workaround is to replace the -100 used for padding labels in the example with -1 when using LUKE.

Thanks @sgugger for the information. However, I am new to NLP; could you please tell me what I should change to apply this workaround?

doherty88 avatar Apr 26 '23 01:04 doherty88