
Eval bug: TikTokenTokenizer has no attribute vocab

Open · zhanghui-china opened this issue 11 months ago • 5 comments

Name and Version

./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3080 Laptop GPU, compute capability 8.6, VMM: yes
version: 1 (3edfa7d)
built with cc (Ubuntu 10.5.0-1ubuntu1~22.04) 10.5.0 for x86_64-linux-gnu

Operating systems

Linux

GGML backends

CUDA

Hardware

RTX 3080 Laptop

Models

Moonlight-16B-A3B-Instruct

Problem description & steps to reproduce

When I run `python convert_hf_to_gguf.py ./Moonlight-16B-A3B-Instruct --outfile Moonlight-16B-A3B-Instruct.gguf --outtype f16`, it fails with the error shown in the log output below.

First Bad Commit


Relevant log output

INFO:hf-to-gguf:blk.8.attn_norm.weight,       torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.8.ffn_down_exps.weight,   torch.bfloat16 --> F16, shape = {1408, 2048, 64}
INFO:hf-to-gguf:blk.8.ffn_gate_exps.weight,   torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.8.ffn_up_exps.weight,     torch.bfloat16 --> F16, shape = {2048, 1408, 64}
INFO:hf-to-gguf:blk.8.exp_probs_b.bias,       torch.bfloat16 --> F32, shape = {64}
INFO:hf-to-gguf:blk.8.ffn_gate_inp.weight,    torch.bfloat16 --> F32, shape = {2048, 64}
INFO:hf-to-gguf:blk.8.ffn_down_shexp.weight,  torch.bfloat16 --> F16, shape = {2816, 2048}
INFO:hf-to-gguf:blk.8.ffn_gate_shexp.weight,  torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.8.ffn_up_shexp.weight,    torch.bfloat16 --> F16, shape = {2048, 2816}
INFO:hf-to-gguf:blk.8.ffn_norm.weight,        torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.8.attn_kv_a_norm.weight,  torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.8.attn_kv_a_mqa.weight,   torch.bfloat16 --> F16, shape = {2048, 576}
INFO:hf-to-gguf:blk.8.attn_kv_b.weight,       torch.bfloat16 --> F16, shape = {512, 4096}
INFO:hf-to-gguf:blk.8.attn_output.weight,     torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.8.attn_q.weight,          torch.bfloat16 --> F16, shape = {2048, 3072}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 8192
INFO:hf-to-gguf:gguf: embedding length = 2048
INFO:hf-to-gguf:gguf: feed forward length = 11264
INFO:hf-to-gguf:gguf: head count = 16
INFO:hf-to-gguf:gguf: key-value head count = 16
INFO:hf-to-gguf:gguf: rope theta = 50000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: experts used count = 6
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model tokenizer
INFO:transformers_modules.Moonlight-16B-A3B-Instruct.tokenization_moonshot:Reloaded tiktoken model from Moonlight-16B-A3B-Instruct/tiktoken.model
INFO:transformers_modules.Moonlight-16B-A3B-Instruct.tokenization_moonshot:#words: 163842 - BOS ID: 163584 - EOS ID: 163585
Traceback (most recent call last):
  File "/home/zhanghui/llama.cpp/convert_hf_to_gguf.py", line 5139, in <module>
    main()
  File "/home/zhanghui/llama.cpp/convert_hf_to_gguf.py", line 5133, in main
    model_instance.write()
  File "/home/zhanghui/llama.cpp/convert_hf_to_gguf.py", line 440, in write
    self.prepare_metadata(vocab_only=False)
  File "/home/zhanghui/llama.cpp/convert_hf_to_gguf.py", line 433, in prepare_metadata
    self.set_vocab()
  File "/home/zhanghui/llama.cpp/convert_hf_to_gguf.py", line 4057, in set_vocab
    self._set_vocab_gpt2()
  File "/home/zhanghui/llama.cpp/convert_hf_to_gguf.py", line 726, in _set_vocab_gpt2
    tokens, toktypes, tokpre = self.get_vocab_base()
  File "/home/zhanghui/llama.cpp/convert_hf_to_gguf.py", line 524, in get_vocab_base
    vocab_size = self.hparams.get("vocab_size", len(tokenizer.vocab))
  File "/home/zhanghui/anaconda3/envs/kimi/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1108, in __getattr__
    raise AttributeError(f"{self.__class__.__name__} has no attribute {key}")
AttributeError: TikTokenTokenizer has no attribute vocab

zhanghui-china • Feb 24, 2025

You should use tokenizer.vocab_size instead of len(tokenizer.vocab)
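
For reference, a rough sketch of what that change could look like inside get_vocab_base() (the line number comes from the traceback above; the exact surrounding code in your checkout may differ):

# convert_hf_to_gguf.py, get_vocab_base(), around line 524 per the traceback above
# before (fails because TikTokenTokenizer has no .vocab attribute):
#     vocab_size = self.hparams.get("vocab_size", len(tokenizer.vocab))
# suggested replacement, falling back to the tokenizer's vocab_size property:
vocab_size = self.hparams.get("vocab_size", tokenizer.vocab_size)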

grapevine-AI • Feb 24, 2025

> You should use tokenizer.vocab_size instead of len(tokenizer.vocab)

Thanks. What should `tokenizer.vocab.values()` be changed to?

zhanghui-china • Feb 25, 2025

You could change it to `tokenizer.get_vocab().values()`.

Or, if the failing line is `assert max(tokenizer.vocab.values()) < vocab_size`, you can simply delete it; the assert is not strictly necessary.
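
To spell out both options, a rough sketch against the line from the traceback (the surrounding code is assumed, not the literal file contents):

# Option 1: look the ids up via get_vocab(), which tiktoken-based tokenizers normally provide
assert max(tokenizer.get_vocab().values()) < vocab_size
# Option 2: drop the assert entirely; it is only a sanity check on the token ids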

grapevine-AI • Feb 25, 2025

> You could change it to `tokenizer.get_vocab().values()`.

> Or, if the failing line is `assert max(tokenizer.vocab.values()) < vocab_size`, you can simply delete it; the assert is not strictly necessary.

Thanks a lot. But when I delete the assert, another error happens:

INFO:hf-to-gguf:Set model tokenizer
INFO:transformers_modules.95583251e616c46a80715897a705cd38659afc27.tokenization_moonshot:Reloaded tiktoken model from 95583251e616c46a80715897a705cd38659afc27/tiktoken.model
INFO:transformers_modules.95583251e616c46a80715897a705cd38659afc27.tokenization_moonshot:#words: 163842 - BOS ID: 163584 - EOS ID: 163585
WARNING:hf-to-gguf:
WARNING:hf-to-gguf:**************************************************************************************
WARNING:hf-to-gguf:** WARNING: The BPE pre-tokenizer was not recognized!
WARNING:hf-to-gguf:**          There are 2 possible reasons for this:
WARNING:hf-to-gguf:**          - the model has not been added to convert_hf_to_gguf_update.py yet
WARNING:hf-to-gguf:**          - the pre-tokenization config has changed upstream
WARNING:hf-to-gguf:**          Check your model files and convert_hf_to_gguf_update.py and update them accordingly.
WARNING:hf-to-gguf:** ref:     https://github.com/ggerganov/llama.cpp/pull/6920
WARNING:hf-to-gguf:**
WARNING:hf-to-gguf:** chkhsh:  81212dc7cdb7e0c1074ca62c5aeab0d43c9f52b8a737be7b12a777c953027890
WARNING:hf-to-gguf:**************************************************************************************
WARNING:hf-to-gguf:

Traceback (most recent call last):
  File "/home/zhanghui/llama.cpp/convert_hf_to_gguf.py", line 5141, in <module>
    main()
  File "/home/zhanghui/llama.cpp/convert_hf_to_gguf.py", line 5135, in main
    model_instance.write()
  File "/home/zhanghui/llama.cpp/convert_hf_to_gguf.py", line 440, in write
    self.prepare_metadata(vocab_only=False)
  File "/home/zhanghui/llama.cpp/convert_hf_to_gguf.py", line 433, in prepare_metadata
    self.set_vocab()
  File "/home/zhanghui/llama.cpp/convert_hf_to_gguf.py", line 4059, in set_vocab
    self._set_vocab_gpt2()
  File "/home/zhanghui/llama.cpp/convert_hf_to_gguf.py", line 728, in _set_vocab_gpt2
    tokens, toktypes, tokpre = self.get_vocab_base()
  File "/home/zhanghui/llama.cpp/convert_hf_to_gguf.py", line 529, in get_vocab_base
    tokpre = self.get_vocab_base_pre(tokenizer)
  File "/home/zhanghui/llama.cpp/convert_hf_to_gguf.py", line 716, in get_vocab_base_pre
    raise NotImplementedError("BPE pre-tokenizer was not recognized - update get_vocab_base_pre()")
NotImplementedError: BPE pre-tokenizer was not recognized - update get_vocab_base_pre()
(llama.cpp) zhanghui@zhanghui:~/.cache/huggingface/hub/models--moonshotai--Moonlight-16B-A3B-Instruct/snapshots$

Maybe I should wait for llama.cpp to support this model.

model address: https://huggingface.co/moonshotai/Moonlight-16B-A3B-Instruct

zhanghui-china • Feb 26, 2025

Yes, llama.cpp has not implemented Moonlight's pre-tokenizer yet. But you can substitute another model's pre-tokenizer. If you want to try that, add this code around line 702 (inside get_vocab_base_pre()):

if chkhsh == "81212dc7cdb7e0c1074ca62c5aeab0d43c9f52b8a737be7b12a777c953027890":
    res = "llama-bpe"
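
For context, get_vocab_base_pre() hashes the tokenization of a fixed test string and maps the resulting chkhsh to a known pre-tokenizer name; a simplified sketch (not the literal file contents) of where the new entry sits:

res = None
# ... existing "if chkhsh == ..." entries for other models ...
if chkhsh == "81212dc7cdb7e0c1074ca62c5aeab0d43c9f52b8a737be7b12a777c953027890":
    # Moonlight's hash, borrowing the llama-bpe pre-tokenizer as a stand-in
    res = "llama-bpe"
if res is None:
    raise NotImplementedError("BPE pre-tokenizer was not recognized - update get_vocab_base_pre()")

Keep in mind that a borrowed pre-tokenizer may not split text exactly the way Moonlight's own pre-tokenizer does, so the converted model's output is not guaranteed to match the original.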

grapevine-AI • Feb 26, 2025

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] • Apr 12, 2025