ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as `pad_token` `(tokenizer.pad_token = tokenizer.eos_token e.g.)` or add a new pad token via `tokenizer.add_special_tokens({'pad_token': '[PAD]'})`.
Hello,
  2805       # Backward compatibility for 'truncation_strategy', 'pad_to_max_length'
❱ 2806       padding_strategy, truncation_strategy, max_length, kwargs = self._get_padding_tr
  2807           padding=padding,
  2808           truncation=truncation,
  2809           max_length=max_length,

/root/miniconda3/lib/python3.8/site-packages/transformers/tokenization_utils_base.py:2443 in
_get_padding_truncation_strategies

  2440
  2441       # Test if we have a padding token
  2442       if padding_strategy != PaddingStrategy.DO_NOT_PAD and (not self.pad_token or sel
❱ 2443           raise ValueError(
  2444               "Asking to pad but the tokenizer does not have a padding token. "
  2445               "Please select a token to use as pad_token (tokenizer.pad_token = tok
  2446               "or add a new pad token via tokenizer.add_special_tokens({'pad_token':
ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as pad_token (tokenizer.pad_token = tokenizer.eos_token e.g.) or add a new pad token via
tokenizer.add_special_tokens({'pad_token': '[PAD]'}).
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2750 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 2751) of binary: /root/miniconda3/bin/python
Traceback (most recent call last):
File "/root/miniconda3/bin/torchrun", line 8, in
sys.exit(main())
File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 346, in wrapper
return f(*args, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/root/miniconda3/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
fastchat/train/train_mem.py FAILED
Failures:
  <NO_OTHER_FAILURES>
Root Cause (first observed failure):
[0]:
  time      : 2023-05-03_18:33:56
  host      : autodl-container-645911b4fa-6063f245
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 2751)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
root@autodl-container-645911b4fa-6063f245:~/autodl-tmp/FastChat-main#
==========================================================
torchrun --nproc_per_node=2 --master_port=20001 fastchat/train/train_mem.py \
--model_name_or_path ../llama-7b-hf \
--data_path ./dummy.json \
--bf16 True \
--output_dir output \
--num_train_epochs 3 \
--per_device_train_batch_size 2 \
--per_device_eval_batch_size 2 \
--gradient_accumulation_steps 4 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 1200 \
--save_total_limit 10 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--fsdp "full_shard auto_wrap" \
--fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
--tf32 True \
--model_max_length 2048 \
--gradient_checkpointing True \
--lazy_preprocess True
Thank you for helping!
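For anyone hitting the same error: the message itself describes the fix. LLaMA tokenizers are loaded without a pad_token, so one has to be assigned before the trainer can pad batches. Below is a minimal sketch of the two options the error suggests (the model path is a placeholder; resizing the embeddings is only needed when a brand-new [PAD] token is added, not when reusing EOS):

from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "../llama-7b-hf"  # placeholder: point at your local LLaMA checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_path)

if tokenizer.pad_token is None:
    # Option 1: reuse the EOS token for padding (no new embeddings needed)
    tokenizer.pad_token = tokenizer.eos_token

    # Option 2: add a dedicated [PAD] token and grow the embedding matrix to match
    # tokenizer.add_special_tokens({"pad_token": "[PAD]"})
    # model.resize_token_embeddings(len(tokenizer))

Whichever option is applied in the training script before the dataset is tokenized, the padding call stops raising. FastChat's own train.py may handle this differently, so treat the above as a generic workaround rather than the project's official fix.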
Getting the same issue:
ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as pad_token (tokenizer.pad_token = tokenizer.eos_token e.g.) or add a new pad token via
tokenizer.add_special_tokens({'pad_token': '[PAD]'}).
python3.9 -m torch.distributed.run --nproc_per_node=2 --master_port=20001 fastchat/train/train_mem.py \
--model_name_or_path /opt/navd/llama-13b-hf \
--data_path /opt/navd/training/training_clean.json \
--bf16 True \
--output_dir /opt/navd/output_13b \
--num_train_epochs 3 \
--per_device_train_batch_size 2 \
--per_device_eval_batch_size 2 \
--gradient_accumulation_steps 16 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 1200 \
--save_total_limit 10 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--fsdp "full_shard auto_wrap" \
--fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
--tf32 True \
--model_max_length 2048 \
--gradient_checkpointing True \
--lazy_preprocess True
Duplicate of https://github.com/lm-sys/FastChat/issues/466