ColossalAI [BUG]: Repository Not Found for url: https://huggingface.co/pretrain/resolve/main/tokenizer_config.json

🐛 Describe the bug

When I run the ChatGPT example:

$ python train_reward_model.py --pretrain pretrain

I get:

/home//anaconda3/envs/s20230216e310gpt/lib/python3.10/site-packages/torch/library.py:130: UserWarning: Overriding a previously registered kernel for the same operator and the same dispatch key
  operator: aten::index.Tensor(Tensor self, Tensor?[] indices) -> Tensor
    registered at /opt/conda/conda-bld/pytorch_1670525541990/work/build/aten/src/ATen/RegisterSchema.cpp:6
  dispatch key: Meta
  previous kernel: registered at /opt/conda/conda-bld/pytorch_1670525541990/work/aten/src/ATen/functorch/BatchRulesScatterOps.cpp:1053
  new kernel: registered at /dev/null:241 (Triggered internally at /opt/conda/conda-bld/pytorch_1670525541990/work/aten/src/ATen/core/dispatch/OperatorEntry.cpp:150.)
  self.m.impl(name, dispatch_key, fn)
Traceback (most recent call last):
  File "/home//anaconda3/envs/s20230216e310gpt/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 264, in hf_raise_for_status
    response.raise_for_status()
  File "/home//anaconda3/envs/s20230216e310gpt/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/pretrain/resolve/main/tokenizer_config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home//anaconda3/envs/s20230216e310gpt/lib/python3.10/site-packages/transformers/utils/hub.py", line 409, in cached_file
    resolved_file = hf_hub_download(
  File "/home//anaconda3/envs/s20230216e310gpt/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 124, in _inner_fn
    return fn(*args, **kwargs)
  File "/home//anaconda3/envs/s20230216e310gpt/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1105, in hf_hub_download
    metadata = get_hf_file_metadata(
  File "/home//anaconda3/envs/s20230216e310gpt/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 124, in _inner_fn
    return fn(*args, **kwargs)
  File "/home//anaconda3/envs/s20230216e310gpt/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1440, in get_hf_file_metadata
    hf_raise_for_status(r)
  File "/home//anaconda3/envs/s20230216e310gpt/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 306, in hf_raise_for_status
    raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-63eef597-1ad9281247cec6eb3b08b2ce)

Repository Not Found for url: https://huggingface.co/pretrain/resolve/main/tokenizer_config.json. Please make sure you specified the correct repo_id and repo_type. If you are trying to access a private or gated repo, make sure you are authenticated. Invalid username or password.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home//Workspace/sxk/2023/0216_chatgpt/examples/train_reward_model.py", line 53, in <module>
    train(args)
  File "/home//Workspace/sxk/2023/0216_chatgpt/examples/train_reward_model.py", line 13, in train
    tokenizer = BloomTokenizerFast.from_pretrained(args.pretrain)
  File "/home//anaconda3/envs/s20230216e310gpt/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1727, in from_pretrained
    resolved_config_file = cached_file(
  File "/home//anaconda3/envs/s20230216e310gpt/lib/python3.10/site-packages/transformers/utils/hub.py", line 424, in cached_file
    raise EnvironmentError(
OSError: pretrain is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with use_auth_token or log in with huggingface-cli login and pass use_auth_token=True.

Environment

Linux version 3.10.0-693.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Tue Aug 22 21:09:27 UTC 2017

python=3.10.9

conda 4.14.0

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0

Feb 17 '23 03:02 sxk000

Please replace the 'pretrain' in your passed argument with the options provided here (e.g. 'bigscience/bloom-560m')

Feb 17 '23 13:02 JThh

Please replace the 'pretrain' in your passed argument with the options provided here (e.g. 'bigscience/bloom-560m')

I tried with: python train_reward_model.py --pretrain bigscience/bloom-560m and it works! Thanks very much!

Feb 20 '23 03:02 sxk000
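
For readers who hit the same error, here is a minimal sketch of why the literal string 'pretrain' produced the URL in the traceback. It assumes a simplified rule: a value that is not an existing local directory is treated as a Hub repo id. The URL pattern is copied from the traceback above; the helper name describe_pretrain_arg is hypothetical, and this is not the actual lookup logic inside transformers, which handles more cases (cache, revisions, auth tokens).

```python
import os

# Endpoint and URL layout as they appear in the traceback above.
HUB_ENDPOINT = "https://huggingface.co"


def describe_pretrain_arg(value, filename="tokenizer_config.json", revision="main"):
    """Describe how a --pretrain value would be interpreted (simplified sketch).

    Hypothetical helper: an existing local directory is reported as local,
    anything else is reported as a Hub repo id together with the URL that
    would be requested for `filename`.
    """
    if os.path.isdir(value):
        return f"local directory {os.path.abspath(value)}"
    url = f"{HUB_ENDPOINT}/{value}/resolve/{revision}/{filename}"
    return f"Hub repo id, fetched from {url}"


if __name__ == "__main__":
    # 'pretrain' is the placeholder from the failing command;
    # 'bigscience/bloom-560m' is the model id that worked in this thread.
    for candidate in ("pretrain", "bigscience/bloom-560m"):
        print(candidate, "->", describe_pretrain_arg(candidate))
```

Under these assumptions, 'pretrain' maps to a repo that does not exist on the Hub, which matches the 401 / Repository Not Found response, while 'bigscience/bloom-560m' maps to a real model repo.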
