transformers 4.34 caused NotImplementedError when calling CTransformersTokenizer(PreTrainedTokenizer)
transformers version: pip install transformers==4.34.0
ctransformers version: pip install ctransformers==0.2.27
I encounter the following error:
File ".venv\lib\site-packages\ctransformers\transformers.py", line 84, in __init__kages\ctransformers\transformers.py", line 84, in __init__
super().__init__(**kwargs)
File ".venv\lib\site-packages\transformers\tokenization_utils.py", line 366, in __init__
self._add_tokens(self.all_special_tokens_extended, special_tokens=True)
File ".venv\lib\site-packages\transformers\tokenization_utils.py", line 462, in _add_tokens
current_vocab = self.get_vocab().copy()
File ".venv\lib\site-packages\transformers\tokenization_utils_base.py", line 1715, in ``get_vocab
raise NotImplementedError()``
NotImplementedError
transformers changed PreTrainedTokenizer in tokenization_utils.py (commit 2da8853): _add_tokens now calls current_vocab = self.get_vocab().copy() on line 454.
PreTrainedTokenizer itself has added_tokens_decoder and __len__ implemented, so only get_vocab raises NotImplementedError().
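To illustrate in isolation (a minimal sketch, assuming transformers==4.34.0 and nothing specific to ctransformers): a PreTrainedTokenizer subclass that does not override get_vocab now fails already at construction, because __init__ calls _add_tokens, which reads the vocab first.

from transformers import PreTrainedTokenizer

class MinimalTokenizer(PreTrainedTokenizer):
    # Bare subclass with no get_vocab() override, for illustration only.
    pass

# __init__ -> _add_tokens -> get_vocab() (the call chain in the traceback above),
# so this raises NotImplementedError before the tokenizer can even be constructed.
MinimalTokenizer()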
The issue can also be reproduced with this code from the README:
from ctransformers import AutoModelForCausalLM
llm = AutoModelForCausalLM.from_pretrained("/path/to/ggml-model.bin", model_type="gpt2", hf=True)
print(llm("AI is going to"))
or in https://colab.research.google.com/drive/1GMhYMUAv_TyZkpfvUI1NirM8-9mCXQyL.
I hope this issue gets addressed, because finding the correct tokenizer from a different source may not be possible for most models.
PR submitted, and it works for me now. This is my setup:
model = AutoModelForCausalLM.from_pretrained(..., hf=True)
tokenizer = AutoTokenizer.from_pretrained(model)
transformers 4.34.0 now supports Mistral, so I really want to use it. 😁
I spent all day trying to get Mistral working with ctransformers, but it is returning garbage text on my end. I believe it may be the tokenizer, because tokenizer = AutoTokenizer.from_pretrained(model) will not work for any model.
Yes, they refactored PreTrainedTokenizer, which the ctransformers tokenizer extends. I ran OpenOrca Mistral and it runs fine with 4.34, but all quantized models failed unless I go back to 4.33, so my PR fixes that. I will try to run quantized Mistral tomorrow to see if it works.
I just ran TheBloke/Mistral-7B-OpenOrca-GGUF, it works fine for me.
Are you able to use model.generate(...)? I have got everything to run until I start generating text; it just runs indefinitely.
OK, I quickly wrote this up and it works fine (you will need transformers==4.34.0, then build ctransformers from #155 and install it):
import os
from ctransformers import (
    AutoModelForCausalLM as cAutoModelForCausalLM,
    AutoTokenizer as cAutoTokenizer,
)

model = cAutoModelForCausalLM.from_pretrained(
    model_path_or_repo_id="TheBloke/Mistral-7B-OpenOrca-GGUF",
    model_file="mistral-7b-openorca.Q5_K_M.gguf",
    model_type="mistral",
    hf=True,
    temperature=0.7,
    top_p=0.7,
    top_k=50,
    repetition_penalty=1.2,
    context_length=8096,
    max_new_tokens=2048,
    threads=os.cpu_count(),
    stream=True,
    gpu_layers=0,
)
tokenizer = cAutoTokenizer.from_pretrained(model)
mistral_no_mem_prompt_template = """
<|im_start|>system
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.
<|im_end|>
{placeholder}
"""
mistral_openorca_prompt = """
<|im_start|>user
{input}<|im_end|>
<|im_start|>assistant
"""
mistral_no_mem_template = mistral_no_mem_prompt_template.replace("{placeholder}", mistral_openorca_prompt)
question = "The cafeteria had 23 apples. If they used 20 for lunch and bought 6 more, how many apples do they have?"
prompt = mistral_no_mem_template.replace("{input}", question)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cpu")
generated_ids = model.generate(input_ids, max_new_tokens=2048, temperature=0.7, do_sample=True)
response = tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True, clean_up_tokenization_spaces=True)
print(f"Response: {response}")
Still having issues with tokenizer = cAutoTokenizer.from_pretrained(model), but using Open-Orca/Mistral-7B-OpenOrca for the tokenizer appears to resolve it. I am not too happy about the speed, though. When using llm = cAutoModelForCausalLM.from_pretrained(...) and then llm('Tell me a story about a knight'), it will generate a full story in 10-24 seconds (200-800 tokens). But when using the generate function it takes about 15 minutes to generate 200 tokens. I am using a 3070 Ti, for reference.
So I get 15x faster token output by having no GPU layers... I think something is wrong.
Yes, something is wrong; for me, gpu_layers has no effect. 😅
I found that if I build it myself, gpu_layers does not work, no idea why.
I think my lib was a bit messy yesterday. I copied get_vocab from transformers and pushed it to PR #155. I tested it with the OpenOrca Mistral code from above (type mistral), and the exact same code with the model switched to Vicuna 1.5 GGUF (type llama) also works. @CHesketh76 can you rebuild and give it a try?
model = cAutoModelForCausalLM.from_pretrained(
    model_path_or_repo_id="TheBloke/vicuna-13B-v1.5-16K-GGUF",
    model_file="vicuna-13b-v1.5-16k.Q6_K.gguf",
    model_type="llama",
    hf=True,
    temperature=0.7,
    top_p=0.7,
    top_k=50,
    repetition_penalty=1.2,
    context_length=8096,
    max_new_tokens=2048,
    threads=os.cpu_count(),
    stream=True,
    gpu_layers=0,
)
@victorlee0505 how to rebuild #155?
pip uninstall ctransformers
Straight from my fork:
pip install --no-cache-dir git+https://github.com/victorlee0505/ctransformers.git@vlee/transformers#egg=ctransformers[cuda]
(even though I put [cuda], it does not work 😕)
Locally:
git clone https://github.com/victorlee0505/ctransformers.git
cd ctransformers
git checkout vlee/transformers
# I use venv
python -m venv .venv
source .venv/bin/activate
pip install scikit-build
pip install cmake
python setup.py sdist
Under the dist folder you will have your new package; get the full path and install it:
pip install --no-cache-dir full\path\to\ctransformers\dist\ctransformers-0.2.27.tar.gz[cuda]
Make sure to run export CT_CUBLAS=ON before python setup.py sdist, otherwise it won't build with CUDA support.
You might also need to set these two in your .bashrc and confirm the nvcc version matches nvidia-smi:
export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64"
Hi @victorlee0505. I've rebuilt with PR https://github.com/marella/ctransformers/pull/155 and can confirm the NotImplementedError is gone. Thanks!
I won't move forward with this PR. I don't think it is a good fix, but it is OK to use as-is.
I only copied one of the def get_vocab(self): implementations from transformers, the one from transformers.models.llama.tokenization_llama.LlamaTokenizer.get_vocab. There are different get_vocab implementations for different tokenizer types; search for def get_vocab(self): in transformers and you will see what I mean.
Therefore I cannot guarantee a perfect solution, nor do I have time to figure one out. 😥
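For context, the LlamaTokenizer-style get_vocab referred to above looks roughly like this (a sketch of that shape, not necessarily the exact code in #155); it rebuilds the dict from vocab_size and convert_ids_to_tokens, which is why it only fits tokenizer types that expose the vocabulary that way:

def get_vocab(self):
    # Token -> id mapping rebuilt from the ids the backend already knows,
    # plus any added tokens (same shape as LlamaTokenizer.get_vocab).
    vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
    vocab.update(self.added_tokens_encoder)
    return vocab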
OK, I no longer get the error on this:
tokenizer = AutoTokenizer.from_pretrained(model)
but now on:
model_inputs = tokenizer([text], return_tensors="pt")
with this error:
File "/home/pechaut/miniconda3/envs/cairo-llm/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 803, in _batch_encode_plus first_ids = get_input_ids(ids) File "/home/pechaut/miniconda3/envs/cairo-llm/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 770, in get_input_ids tokens = self.tokenize(text, **kwargs) File "/home/pechaut/miniconda3/envs/cairo-llm/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 617, in tokenize tokenized_text.extend(self._tokenize(token)) File "/home/pechaut/miniconda3/envs/cairo-llm/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 628, in _tokenize raise NotImplementedError NotImplementedError