Data Loader Error when trying to fine-tune RoBERTa
❓ Questions & Help
I'm trying to fine-tune a pretrained XLM-RoBERTa model with run_lm_finetuning.py and the script below. However, it fails with RuntimeError: stack expects each tensor to be equal size, but got [12] at entry 0 and [2] at entry 1. I used the same WikiText-2 dataset as the README on the examples page. I then tried changing model_type and model_name_or_path to roberta (as in the script below), but the same error occurs.
export TRAIN_FILE=/home/evelyn/usr/examples/wikitext-2-raw/wiki.train.raw
export TEST_FILE=/home/evelyn/usr/examples/wikitext-2-raw/wiki.test.raw
CUDA_VISIBLE_DEVICES=1 python3 run_lm_finetuning.py \
--output_dir=output \
--model_type=roberta \
--model_name_or_path=roberta-base \
--do_train \
--train_data_file=$TRAIN_FILE \
--do_eval \
--eval_data_file=$TEST_FILE \
--mlm
Below is the full output log:
02/28/2022 00:08:18 - WARNING - __main__ - Process rank: -1, device: cpu, n_gpu: 0, distributed training: False, 16-bits training: False
02/28/2022 00:08:23 - INFO - __main__ - Training/evaluation parameters Namespace(adam_epsilon=1e-08, block_size=510, cache_dir='', config_name='', device=device(type='cpu'), do_eval=True, do_lower_case=False, do_train=True, eval_all_checkpoints=False, eval_data_file='/home/evelyn/usr/examples/wikitext-2-raw/wiki.test.raw', evaluate_during_training=False, fp16=False, fp16_opt_level='O1', gradient_accumulation_steps=1, learning_rate=5e-05, local_rank=-1, logging_steps=50, max_grad_norm=1.0, max_steps=-1, mlm=True, mlm_probability=0.15, model_name_or_path='roberta-base', model_type='roberta', n_gpu=0, no_cuda=False, num_train_epochs=1.0, output_dir='output', overwrite_cache=False, overwrite_output_dir=False, per_gpu_eval_batch_size=4, per_gpu_train_batch_size=4, save_steps=50, save_total_limit=None, seed=42, server_ip='', server_port='', tokenizer_name='', train_data_file='/home/evelyn/usr/examples/wikitext-2-raw/wiki.train.raw', warmup_steps=0, weight_decay=0.0)
02/28/2022 00:08:23 - INFO - __main__ - Loading features from cached file /home/evelyn/usr/examples/wikitext-2-raw/cached_lm_510_wiki.train.raw
/home/evelyn/miniconda3/envs/usr_eval/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
02/28/2022 00:08:23 - INFO - __main__ - ***** Running training *****
02/28/2022 00:08:23 - INFO - __main__ - Num examples = 36718
02/28/2022 00:08:23 - INFO - __main__ - Num Epochs = 1
02/28/2022 00:08:23 - INFO - __main__ - Instantaneous batch size per GPU = 4
02/28/2022 00:08:23 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 4
02/28/2022 00:08:23 - INFO - __main__ - Gradient Accumulation steps = 1
02/28/2022 00:08:23 - INFO - __main__ - Total optimization steps = 9180
Iteration: 0%| | 0/9180 [00:00<?, ?it/s]
Epoch: 0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "run_lm_finetuning.py", line 596, in <module>
main()
File "run_lm_finetuning.py", line 548, in main
global_step, tr_loss = train(args, train_dataset, model, tokenizer)
File "run_lm_finetuning.py", line 259, in train
for step, batch in enumerate(epoch_iterator):
File "/home/evelyn/miniconda3/envs/usr_eval/lib/python3.8/site-packages/tqdm-4.62.3-py3.8.egg/tqdm/std.py", line 1180, in __iter__
for obj in iterable:
File "/home/evelyn/miniconda3/envs/usr_eval/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
data = self._next_data()
File "/home/evelyn/miniconda3/envs/usr_eval/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 561, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/home/evelyn/miniconda3/envs/usr_eval/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
return self.collate_fn(data)
File "/home/evelyn/miniconda3/envs/usr_eval/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 56, in default_collate
return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [12] at entry 0 and [2] at entry 1
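For context on what the error itself means: the DataLoader's default collate function calls torch.stack on the examples in a batch, which only works if every example tensor has the same length, so it looks like the cached examples being batched are not all the same size. Below is a minimal standalone sketch (my own toy example, not the training script; the lengths 12 and 2 are just the sizes reported in the traceback) that reproduces the same RuntimeError:

# Toy reproduction of the collate failure: default_collate torch.stack()s
# the examples in a batch, which requires equal-sized tensors.
import torch
from torch.utils.data import DataLoader

examples = [
    torch.zeros(12, dtype=torch.long),  # "entry 0" with size [12]
    torch.zeros(2, dtype=torch.long),   # "entry 1" with size [2]
]

loader = DataLoader(examples, batch_size=2)  # uses default_collate internally

try:
    next(iter(loader))
except RuntimeError as e:
    print(e)  # stack expects each tensor to be equal size, but got [12] ... and [2] ...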
Any pointers on how to resolve this issue? Thanks so much!
I've got the same issue. Have you solved this problem?