ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as `pad_token` `(tokenizer.pad_token = tokenizer.eos_token e.g.)` or add a new pad token via `tokenizer.add_special_tokens({'pad_token': '[PAD]'})`.
Hello,
  2805       # Backward compatibility for 'truncation_strategy', 'pad_to_max_length'
❱ 2806       padding_strategy, truncation_strategy, max_length, kwargs = self._get_padding_tr
  2807           padding=padding,
  2808           truncation=truncation,
  2809           max_length=max_length,

/root/miniconda3/lib/python3.8/site-packages/transformers/tokenization_utils_base.py:2443 in
_get_padding_truncation_strategies

  2440
  2441       # Test if we have a padding token
  2442       if padding_strategy != PaddingStrategy.DO_NOT_PAD and (not self.pad_token or sel
❱ 2443           raise ValueError(
  2444               "Asking to pad but the tokenizer does not have a padding token. "
  2445               "Please select a token to use as pad_token (tokenizer.pad_token = tok
  2446               "or add a new pad token via tokenizer.add_special_tokens({'pad_token':
ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as pad_token (tokenizer.pad_token = tokenizer.eos_token e.g.) or add a new pad token via
tokenizer.add_special_tokens({'pad_token': '[PAD]'}).
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2750 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 2751) of binary: /root/miniconda3/bin/python
Traceback (most recent call last):
File "/root/miniconda3/bin/torchrun", line 8, in
sys.exit(main())
File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 346, in wrapper
return f(*args, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
fastchat/train/train_mem.py FAILED
Failures:
  <NO_OTHER_FAILURES>
Root Cause (first observed failure):
[0]:
  time      : 2023-05-03_18:33:56
  host      : autodl-container-645911b4fa-6063f245
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 2751)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
root@autodl-container-645911b4fa-6063f245:~/autodl-tmp/FastChat-main#
==========================================================
torchrun --nproc_per_node=2 --master_port=20001 fastchat/train/train_mem.py \
--model_name_or_path ../llama-7b-hf \
--data_path ./dummy.json \
--bf16 True \
--output_dir output \
--num_train_epochs 3 \
--per_device_train_batch_size 2 \
--per_device_eval_batch_size 2 \
--gradient_accumulation_steps 4 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 1200 \
--save_total_limit 10 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--fsdp "full_shard auto_wrap" \
--fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
--tf32 True \
--model_max_length 2048 \
--gradient_checkpointing True \
--lazy_preprocess True
Thank you for helping!
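For anyone hitting the same error: the message itself describes the fix. LLaMA tokenizers are loaded without a pad_token, so one has to be assigned before the trainer can pad batches. Below is a minimal sketch of the two options the error suggests (the model path is a placeholder; resizing the embeddings is only needed when a brand-new [PAD] token is added, not when reusing EOS):

from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "../llama-7b-hf"  # placeholder: point at your local LLaMA checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_path)

if tokenizer.pad_token is None:
    # Option 1: reuse the EOS token for padding (no new embeddings needed)
    tokenizer.pad_token = tokenizer.eos_token

    # Option 2: add a dedicated [PAD] token and grow the embedding matrix to match
    # tokenizer.add_special_tokens({"pad_token": "[PAD]"})
    # model.resize_token_embeddings(len(tokenizer))

Whichever option is applied in the training script before the dataset is tokenized, the padding call stops raising. FastChat's own train.py may handle this differently, so treat the above as a generic workaround rather than the project's official fix.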
Getting the same issue:
ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as pad_token (tokenizer.pad_token = tokenizer.eos_token e.g.) or add a new pad token via
tokenizer.add_special_tokens({'pad_token': '[PAD]'}).
python3.9 -m torch.distributed.run --nproc_per_node=2 --master_port=20001 fastchat/train/train_mem.py \
--model_name_or_path /opt/navd/llama-13b-hf \
--data_path /opt/navd/training/training_clean.json \
--bf16 True \
--output_dir /opt/navd/output_13b \
--num_train_epochs 3 \
--per_device_train_batch_size 2 \
--per_device_eval_batch_size 2 \
--gradient_accumulation_steps 16 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 1200 \
--save_total_limit 10 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--fsdp "full_shard auto_wrap" \
--fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
--tf32 True \
--model_max_length 2048 \
--gradient_checkpointing True \
--lazy_preprocess True
Duplicate of https://github.com/lm-sys/FastChat/issues/466