Cannot train language-modeling using Luke model
System Info
- `transformers` version: 4.29.0.dev0
- Platform: Linux-5.15.0-52-generic-x86_64-with-glibc2.10
- Python version: 3.8.13
- Huggingface_hub version: 0.14.0
- Safetensors version: not installed
- PyTorch version (GPU?): 1.12.0a0+bd13bc6 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
Who can help?
@sgugger
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
I want to try fine-tuning the Luke model via `run_mlm.py` from the examples folder. I use the standard script from `examples` and then run the following commands to start training:

```
pip install git+https://github.com/huggingface/transformers
python /gxtq-ner-ws/run_mlm.py \
    --output_dir=/gxtq-ner-ws/luke_large_6_pretrained_v2/ \
    --model_type=luke \
    --model_name_or_path=studio-ousia/luke-large-lite \
    --do_train \
    --per_device_train_batch_size 16 \
    --num_train_epochs 6 \
    --train_file=/gxtq-ner-ws/lm_training_data_v2.txt \
    --save_total_limit 1 \
    --save_steps 10000
```
Then I got the following error:
[INFO|trainer.py:1776] 2023-04-25 06:57:12,367 >> Number of trainable parameters = 147,342,943
0%| | 0/5814 [00:00<?, ?it/s]
Traceback (most recent call last):
File "./run_language_modeling_v4.py", line 657, in
/opt/pytorch/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [1,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/pytorch/pytorch/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [2,0,0] Assertion t >= 0 && t < n_classes failed.
(... the same assertion is repeated for further threads up to [31,0,0] ...)
0%| | 0/5814 [00:00<?, ?it/s]
I also tried to run it in a CPU environment. Here is the error:
Traceback (most recent call last):
File "./run_language_modeling_v4.py", line 657, in
Expected behavior
The model should train as expected.
It looks like the Luke model is not compatible out of the box with those examples, since the person who contributed it decided to use -1 as the ignore index in the cross-entropy loss instead of the -100 we use everywhere else.
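To make the mismatch concrete, here is a minimal sketch of the failure mode (the `ignore_index=-1` value comes from the explanation above; the shapes and label values are made up for illustration):

```python
import torch
from torch.nn import CrossEntropyLoss

# run_mlm.py's default collator marks non-masked positions with -100,
# but a loss built with ignore_index=-1 treats -100 as a real class index.
logits = torch.randn(4, 50267)              # (num_tokens, vocab_size), dummy values
labels = torch.tensor([12, -100, 7, -100])  # -100 padding from the default MLM collator

loss_fct = CrossEntropyLoss(ignore_index=-1)
loss = loss_fct(logits, labels)
# On CPU this raises an IndexError (target -100 out of bounds); on CUDA it
# triggers the device-side assert "t >= 0 && t < n_classes failed." seen above.
```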
Might be worth fixing, though it's a breaking change. @amyeroberts @ArthurZucker, what do you think?
In the meantime, a workaround is to replace the -100 used for padding labels in the example with -1 so it works with Luke.
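In case it helps, here is a minimal sketch of that workaround for `run_mlm.py`, assuming the only change is the data collator passed to the `Trainer` (the `LukeMlmCollator` class below is just an illustration, not an official API; only the -100 to -1 remapping comes from the suggestion above):

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

class LukeMlmCollator(DataCollatorForLanguageModeling):
    """Wraps the default MLM collator and remaps the padding label to -1 for Luke."""

    def torch_call(self, examples):
        batch = super().torch_call(examples)
        labels = batch["labels"]
        labels[labels == -100] = -1   # Luke's loss ignores -1, not -100
        batch["labels"] = labels
        return batch

tokenizer = AutoTokenizer.from_pretrained("studio-ousia/luke-large-lite")
data_collator = LukeMlmCollator(tokenizer=tokenizer, mlm_probability=0.15)
# In run_mlm.py, pass this collator to the Trainer instead of the default
# DataCollatorForLanguageModeling instance.
```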
@sgugger Yes, I'd agree, I think it's better to update to be in line with the rest of the library.
> In the meantime, a workaround is to replace the -100 used for padding labels in the example with -1 so it works with Luke.
Thanks @sgugger for the information. However, I am new to NLP. Could you please tell me where I should make the change to apply this workaround?