Eval bug: TikTokenTokenizer has no attribute vocab
Name and Version
./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3080 Laptop GPU, compute capability 8.6, VMM: yes
version: 1 (3edfa7d)
built with cc (Ubuntu 10.5.0-1ubuntu1~22.04) 10.5.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
CUDA
Hardware
RTX 3080 Laptop
Models
Moonlight-16B-A3B-Instruct
Problem description & steps to reproduce
When I run `python convert_hf_to_gguf.py ./Moonlight-16B-A3B-Instruct --outfile Moonlight-16B-A3B-Instruct.gguf --outtype f16`, it fails with the error shown in the relevant log output below.
First Bad Commit
Relevant log output
INFO:hf-to-gguf:blk.8.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.8.ffn_down_exps.weight, torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.8.ffn_gate_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.8.ffn_up_exps.weight, torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.8.exp_probs_b.bias, torch.bfloat16 --> F32, shape = {64}
INFO:hf-to-gguf:blk.8.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.8.ffn_down_shexp.weight, torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.8.ffn_gate_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.8.ffn_up_shexp.weight, torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.8.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.8.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.8.attn_kv_a_mqa.weight, torch.bfloat16 --> F16, shape = {2048, 576}
INFO:hf-to-gguf:blk.8.attn_kv_b.weight, torch.bfloat16 --> F16, shape = {512, 4096}
INFO:hf-to-gguf:blk.8.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.8.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 3072}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 8192
INFO:hf-to-gguf:gguf: embedding length = 2048
INFO:hf-to-gguf:gguf: feed forward length = 11264
INFO:hf-to-gguf:gguf: head count = 16
INFO:hf-to-gguf:gguf: key-value head count = 16
INFO:hf-to-gguf:gguf: rope theta = 50000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: experts used count = 6
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model tokenizer
INFO:transformers_modules.Moonlight-16B-A3B-Instruct.tokenization_moonshot:Reloaded tiktoken model from Moonlight-16B-A3B-Instruct/tiktoken.model
INFO:transformers_modules.Moonlight-16B-A3B-Instruct.tokenization_moonshot:#words: 163842 - BOS ID: 163584 - EOS ID: 163585
Traceback (most recent call last):
File "/home/zhanghui/llama.cpp/convert_hf_to_gguf.py", line 5139, in <module>
main()
File "/home/zhanghui/llama.cpp/convert_hf_to_gguf.py", line 5133, in main
model_instance.write()
File "/home/zhanghui/llama.cpp/convert_hf_to_gguf.py", line 440, in write
self.prepare_metadata(vocab_only=False)
File "/home/zhanghui/llama.cpp/convert_hf_to_gguf.py", line 433, in prepare_metadata
self.set_vocab()
File "/home/zhanghui/llama.cpp/convert_hf_to_gguf.py", line 4057, in set_vocab
self._set_vocab_gpt2()
File "/home/zhanghui/llama.cpp/convert_hf_to_gguf.py", line 726, in _set_vocab_gpt2
tokens, toktypes, tokpre = self.get_vocab_base()
File "/home/zhanghui/llama.cpp/convert_hf_to_gguf.py", line 524, in get_vocab_base
vocab_size = self.hparams.get("vocab_size", len(tokenizer.vocab))
File "/home/zhanghui/anaconda3/envs/kimi/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1108, in __getattr__
raise AttributeError(f"{self.__class__.__name__} has no attribute {key}")
AttributeError: TikTokenTokenizer has no attribute vocab
You should use tokenizer.vocab_size instead of len(tokenizer.vocab)
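In other words, a minimal sketch of the change at the line shown in the traceback (convert_hf_to_gguf.py, get_vocab_base, line 524 in this build):

# Before -- fails because TikTokenTokenizer exposes no .vocab attribute:
#   vocab_size = self.hparams.get("vocab_size", len(tokenizer.vocab))
# After -- vocab_size is a standard attribute on Hugging Face tokenizers,
# including this tiktoken-backed one:
vocab_size = self.hparams.get("vocab_size", tokenizer.vocab_size)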
Thanks. And what should `max(tokenizer.vocab.values())` be changed to?
You could change it to `tokenizer.get_vocab().values()`. Or, if the error is on `assert max(tokenizer.vocab.values()) < vocab_size`, just delete that line; the assert is not necessary.
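A quick sketch of both options for that assert (its exact position inside get_vocab_base may differ between llama.cpp versions):

# Option 1: rewrite the check with get_vocab(), which works even when the
# tokenizer does not expose a .vocab attribute:
assert max(tokenizer.get_vocab().values()) < vocab_size

# Option 2: delete the line entirely; it is only a sanity check and the
# conversion does not depend on it.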
Thanks a lot. But when I delete the assert, another error happens:

INFO:hf-to-gguf:Set model tokenizer
INFO:transformers_modules.95583251e616c46a80715897a705cd38659afc27.tokenization_moonshot:Reloaded tiktoken model from 95583251e616c46a80715897a705cd38659afc27/tiktoken.model
INFO:transformers_modules.95583251e616c46a80715897a705cd38659afc27.tokenization_moonshot:#words: 163842 - BOS ID: 163584 - EOS ID: 163585
WARNING:hf-to-gguf:
WARNING:hf-to-gguf:**************************************************************************************
WARNING:hf-to-gguf:** WARNING: The BPE pre-tokenizer was not recognized!
WARNING:hf-to-gguf:** There are 2 possible reasons for this:
WARNING:hf-to-gguf:** - the model has not been added to convert_hf_to_gguf_update.py yet
WARNING:hf-to-gguf:** - the pre-tokenization config has changed upstream
WARNING:hf-to-gguf:** Check your model files and convert_hf_to_gguf_update.py and update them accordingly.
WARNING:hf-to-gguf:** ref: https://github.com/ggerganov/llama.cpp/pull/6920
WARNING:hf-to-gguf:**
WARNING:hf-to-gguf:** chkhsh: 81212dc7cdb7e0c1074ca62c5aeab0d43c9f52b8a737be7b12a777c953027890
WARNING:hf-to-gguf:**************************************************************************************
WARNING:hf-to-gguf:
Traceback (most recent call last):
File "/home/zhanghui/llama.cpp/convert_hf_to_gguf.py", line 5141, in
Maybe I should wait for llama.cpp to support this model.
model address: https://huggingface.co/moonshotai/Moonlight-16B-A3B-Instruct
Yes, llama.cpp has not implemented Moonlight's pre-tokenizer yet, but you can substitute another model's pre-tokenizer. If you want to do that, add this code at line 702:
if chkhsh == "81212dc7cdb7e0c1074ca62c5aeab0d43c9f52b8a737be7b12a777c953027890":
    res = "llama-bpe"
This issue was closed because it has been inactive for 14 days since being marked as stale.