
Data Loader Error when trying to finetune Roberta

Open · evelynkyl opened this issue 3 years ago · 1 comment

❓ Questions & Help

I'm trying to fine-tune a pretrained XLM-RoBERTa model using run_lm_finetuning.py and the script below. However, it fails with RuntimeError: stack expects each tensor to be equal size, but got [12] at entry 0 and [2] at entry 1. I used the same dataset (wikitext-2-raw) as the README on the examples page. I then tried changing model_type and model_name_or_path to roberta / roberta-base, but the same error occurred.

export TRAIN_FILE=/home/evelyn/usr/examples/wikitext-2-raw/wiki.train.raw
export TEST_FILE=/home/evelyn/usr/examples/wikitext-2-raw/wiki.test.raw

CUDA_VISIBLE_DEVICES=1 python3 run_lm_finetuning.py \
    --output_dir=output \
    --model_type=roberta \
    --model_name_or_path=roberta-base \
    --do_train \
    --train_data_file=$TRAIN_FILE \
    --do_eval \
    --eval_data_file=$TEST_FILE \
    --mlm
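
For context on where this fails: PyTorch's default_collate builds a batch by torch.stack()-ing the per-example tensors, which requires every example in the batch to have the same length. A minimal illustration of the mechanism (the sizes below just mirror the ones in the error message):

import torch

# Two tokenized examples of different lengths, like entries 0 and 1 in the error
a = torch.zeros(12, dtype=torch.long)
b = torch.zeros(2, dtype=torch.long)

torch.stack([a, b], 0)  # RuntimeError: stack expects each tensor to be equal size

So at least one example coming out of the dataset is shorter than the others, e.g. a short trailing chunk that was never padded to block_size.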

Below is the full output log:

02/28/2022 00:08:18 - WARNING - __main__ -   Process rank: -1, device: cpu, n_gpu: 0, distributed training: False, 16-bits training: False
02/28/2022 00:08:23 - INFO - __main__ -   Training/evaluation parameters Namespace(adam_epsilon=1e-08, block_size=510, cache_dir='', config_name='', device=device(type='cpu'), do_eval=True, do_lower_case=False, do_train=True, eval_all_checkpoints=False, eval_data_file='/home/evelyn/usr/examples/wikitext-2-raw/wiki.test.raw', evaluate_during_training=False, fp16=False, fp16_opt_level='O1', gradient_accumulation_steps=1, learning_rate=5e-05, local_rank=-1, logging_steps=50, max_grad_norm=1.0, max_steps=-1, mlm=True, mlm_probability=0.15, model_name_or_path='roberta-base', model_type='roberta', n_gpu=0, no_cuda=False, num_train_epochs=1.0, output_dir='output', overwrite_cache=False, overwrite_output_dir=False, per_gpu_eval_batch_size=4, per_gpu_train_batch_size=4, save_steps=50, save_total_limit=None, seed=42, server_ip='', server_port='', tokenizer_name='', train_data_file='/home/evelyn/usr/examples/wikitext-2-raw/wiki.train.raw', warmup_steps=0, weight_decay=0.0)
02/28/2022 00:08:23 - INFO - __main__ -   Loading features from cached file /home/evelyn/usr/examples/wikitext-2-raw/cached_lm_510_wiki.train.raw
/home/evelyn/miniconda3/envs/usr_eval/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
02/28/2022 00:08:23 - INFO - __main__ -   ***** Running training *****
02/28/2022 00:08:23 - INFO - __main__ -     Num examples = 36718
02/28/2022 00:08:23 - INFO - __main__ -     Num Epochs = 1
02/28/2022 00:08:23 - INFO - __main__ -     Instantaneous batch size per GPU = 4
02/28/2022 00:08:23 - INFO - __main__ -     Total train batch size (w. parallel, distributed & accumulation) = 4
02/28/2022 00:08:23 - INFO - __main__ -     Gradient Accumulation steps = 1
02/28/2022 00:08:23 - INFO - __main__ -     Total optimization steps = 9180
Iteration:   0%|                                                                                            | 0/9180 [00:00<?, ?it/s]
Epoch:   0%|                                                                                                   | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "run_lm_finetuning.py", line 596, in <module>
    main()
  File "run_lm_finetuning.py", line 548, in main
    global_step, tr_loss = train(args, train_dataset, model, tokenizer)
  File "run_lm_finetuning.py", line 259, in train
    for step, batch in enumerate(epoch_iterator):
  File "/home/evelyn/miniconda3/envs/usr_eval/lib/python3.8/site-packages/tqdm-4.62.3-py3.8.egg/tqdm/std.py", line 1180, in __iter__
    for obj in iterable:
  File "/home/evelyn/miniconda3/envs/usr_eval/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/home/evelyn/miniconda3/envs/usr_eval/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 561, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/evelyn/miniconda3/envs/usr_eval/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "/home/evelyn/miniconda3/envs/usr_eval/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 56, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [12] at entry 0 and [2] at entry 1

Any pointers on how to resolve this issue? Thanks so much!!
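
For reference, a sketch of one possible workaround (an assumption, not verified against this exact script): since the stack failure means the examples have unequal lengths, replacing the DataLoader's default collate with one that pads every example to the longest in the batch should let batching proceed. The names tokenizer, train_dataset, train_sampler, and args below are the ones train() in run_lm_finetuning.py already uses:

from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

# Pad variable-length examples to the longest in the batch instead of
# relying on default_collate's torch.stack (which requires equal sizes).
def collate(examples):
    return pad_sequence(examples, batch_first=True,
                        padding_value=tokenizer.pad_token_id)

train_dataloader = DataLoader(train_dataset, sampler=train_sampler,
                              batch_size=args.train_batch_size,
                              collate_fn=collate)

Since the log shows features being loaded from cached_lm_510_wiki.train.raw, rerunning with --overwrite_cache (a flag the script already supports, per the Namespace above) is also worth trying, in case a stale cache built with a different tokenizer contains odd-length examples.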

evelynkyl · Feb 27 '22 23:02

I've got the same issue. Have you solved this problem?

Yerin-Hwang49 · May 23 '23 13:05