[BUG]: Error when running train_reward_model.py with the facebook/opt-350m model
🐛 Describe the bug
Description:
I am running the train_reward_model.py script with the facebook/opt-350m model using the following command. How should I deal with this? Thank you for your help.
python train_reward_model.py --pretrain facebook/opt-350m
However, I encountered the following error message:
/root/.local/lib/python3.9/site-packages/torch/library.py:130: UserWarning: Overriding a previously registered kernel for the same operator and the same dispatch key
operator: aten::index.Tensor(Tensor self, Tensor?[] indices) -> Tensor
registered at aten/src/ATen/RegisterSchema.cpp:6
dispatch key: Meta
previous kernel: registered at ../aten/src/ATen/functorch/BatchRulesScatterOps.cpp:1053
new kernel: registered at /dev/null:228 (Triggered internally at ../aten/src/ATen/core/dispatch/OperatorEntry.cpp:150.)
self.m.impl(name, dispatch_key, fn)
Downloading (…)okenizer_config.json: 100%|██████████| 685/685 [00:00<00:00, 58.7kB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 441/441 [00:00<00:00, 34.4kB/s]
Downloading (…)lve/main/config.json: 100%|██████████| 644/644 [00:00<00:00, 33.0kB/s]
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'GPT2Tokenizer'.
The class this function is called from is 'BloomTokenizerFast'.
Traceback (most recent call last):
File "/root/ColossalAI/applications/ChatGPT/examples/train_reward_model.py", line 78, in tokenizers library serialization file,
(2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow
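
For context, the class-mismatch warning and the conversion failure above arise because the example calls the BLOOM tokenizer class (`BloomTokenizerFast`, per the warning) while the facebook/opt-350m checkpoint ships a GPT2-style tokenizer. Below is a minimal sketch of the mismatch, assuming only the standard transformers `AutoTokenizer` API; it does not make the reward-model script support OPT.

```python
from transformers import AutoTokenizer

# AutoTokenizer resolves the class recorded in the checkpoint's tokenizer_config.json.
# For facebook/opt-350m that is a GPT2-style tokenizer, not BloomTokenizerFast,
# which is why forcing the checkpoint through the BLOOM class warns and then fails.
tok = AutoTokenizer.from_pretrained("facebook/opt-350m")
print(type(tok).__name__)  # expected: a GPT2 tokenizer class, e.g. GPT2TokenizerFast
```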
Environment
No response
Hi, it temporarily supports only the BLOOM model. Please use bigscience/bloom-560m instead.
Thank you for the quick response. I saw it today and dealt with it. But when I run bloom-560m on a V100 32GB, I get an out-of-memory error. How many GPUs are needed to train 560M/10B/100B-parameter models? Thank you.
python train_reward_model.py --pretrain bigscience/bloom-560m
/root/.local/lib/python3.8/site-packages/torch/library.py:130: UserWarning: Overriding a previously registered kernel for the same operator and the same dispatch key
operator: aten::index.Tensor(Tensor self, Tensor?[] indices) -> Tensor
registered at aten/src/ATen/RegisterSchema.cpp:6
dispatch key: Meta
previous kernel: registered at ../aten/src/ATen/functorch/BatchRulesScatterOps.cpp:1053
new kernel: registered at /dev/null:219 (Triggered internally at ../aten/src/ATen/core/dispatch/OperatorEntry.cpp:150.)
self.m.impl(name, dispatch_key, fn)
Using custom data configuration Dahoas--rm-static-fd463b68d54124af
Found cached dataset parquet (/root/.cache/huggingface/datasets/Dahoas___parquet/Dahoas--rm-static-fd463b68d54124af/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
100%|██████████| 2/2 [00:00<00:00, 3.33it/s]
100%|██████████| 100/100 [00:00<00:00, 465.82it/s]
100%|██████████| 5/5 [00:00<00:00, 464.92it/s]
Train step of epoch 0: 100%|██████████| 100/100 [01:02<00:00, 1.60it/s, loss=0.533, dist_mean=-.026]
100%|██████████| 100/100 [00:55<00:00, 1.79it/s, loss=0.0477]
Traceback (most recent call last):
  File "train_reward_model.py", line 90, in
I have the same problem