transformers 4.34 caused NotImplementedError when calling CTransformersTokenizer(PreTrainedTokenizer)
transformers version: pip install transformers==4.34.0
ctransformers version: pip install ctransformers==0.2.27
I encounter the following error:
File ".venv\lib\site-packages\ctransformers\transformers.py", line 84, in __init__kages\ctransformers\transformers.py", line 84, in __init__
super().__init__(**kwargs)
File ".venv\lib\site-packages\transformers\tokenization_utils.py", line 366, in __init__
self._add_tokens(self.all_special_tokens_extended, special_tokens=True)
File ".venv\lib\site-packages\transformers\tokenization_utils.py", line 462, in _add_tokens
current_vocab = self.get_vocab().copy()
File ".venv\lib\site-packages\transformers\tokenization_utils_base.py", line 1715, in ``get_vocab
raise NotImplementedError()``
NotImplementedError
transformers changed PreTrainedTokenizer in tokenization_utils.py (commit 2da8853): _add_tokens now calls current_vocab = self.get_vocab().copy() on line 454.
PreTrainedTokenizer itself has added_tokens_decoder and __len__ implemented, so only get_vocab raises NotImplementedError().
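To illustrate in isolation (a minimal sketch, assuming transformers==4.34.0 and nothing specific to ctransformers): a PreTrainedTokenizer subclass that does not override get_vocab now fails already at construction, because __init__ calls _add_tokens, which reads the vocab first.

from transformers import PreTrainedTokenizer

class MinimalTokenizer(PreTrainedTokenizer):
    # Bare subclass with no get_vocab() override, for illustration only.
    pass

# __init__ -> _add_tokens -> get_vocab() (the call chain in the traceback above),
# so this raises NotImplementedError before the tokenizer can even be constructed.
MinimalTokenizer()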
The issue can also be reproduced with this code from the README:
from ctransformers import AutoModelForCausalLM
llm = AutoModelForCausalLM.from_pretrained("/path/to/ggml-model.bin", model_type="gpt2", hf=True)
print(llm("AI is going to"))
or in https://colab.research.google.com/drive/1GMhYMUAv_TyZkpfvUI1NirM8-9mCXQyL.
I hope this issue gets addressed, because finding the correct tokenizer from a different source may not be possible for most models.
PR submitted, and it works for me now. This is my setup:
model = AutoModelForCausalLM.from_pretrained(..., hf=True)
tokenizer = AutoTokenizer.from_pretrained(model)
transformers 4.34.0 now supports Mistral, so I really want to use it. 😁
I spent all day trying to get Mistral working with ctransformers, but it is returning garbage text on my end. I believe it may be the tokenizer, because tokenizer = AutoTokenizer.from_pretrained(model) will not work for any model.
Yes, they refactored PreTrainedTokenizer, which the ctransformers tokenizer extends. I ran OpenOrca Mistral and it runs fine with 4.34, but all quantized models failed unless I go back to 4.33, so my PR fixes that. I will try to run quantized Mistral tomorrow to see if it works.
I just ran TheBloke/Mistral-7B-OpenOrca-GGUF, it works fine for me.
Are you able to use model.generate(...)? I have got everything to run until I start generating text; it just runs indefinitely.
OK, I quickly wrote this up and it works fine (you will need transformers==4.34.0, then build ctransformers from #155 and install it):
import os
from ctransformers import (
    AutoModelForCausalLM as cAutoModelForCausalLM,
    AutoTokenizer as cAutoTokenizer,
)

model = cAutoModelForCausalLM.from_pretrained(
    model_path_or_repo_id="TheBloke/Mistral-7B-OpenOrca-GGUF",
    model_file="mistral-7b-openorca.Q5_K_M.gguf",
    model_type="mistral",
    hf=True,
    temperature=0.7,
    top_p=0.7,
    top_k=50,
    repetition_penalty=1.2,
    context_length=8096,
    max_new_tokens=2048,
    threads=os.cpu_count(),
    stream=True,
    gpu_layers=0,
)
tokenizer = cAutoTokenizer.from_pretrained(model)
mistral_no_mem_prompt_template = """
<|im_start|>system
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.
<|im_end|>
{placeholder}
"""
mistral_openorca_prompt = """
<|im_start|>user
{input}<|im_end|>
<|im_start|>assistant
"""
mistral_no_mem_template = mistral_no_mem_prompt_template.replace("{placeholder}", mistral_openorca_prompt)
question = "The cafeteria had 23 apples. If they used 20 for lunch and bought 6 more, how many apples do they have?"
prompt = mistral_no_mem_template.replace("{input}", question)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cpu")
generated_ids = model.generate(input_ids, max_new_tokens=2048, temperature=0.7, do_sample=True)
response = tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True, clean_up_tokenization_spaces=True)
print(f"Response: {response}")
Still having issues with tokenizer = cAutoTokenizer.from_pretrained(model), but using Open-Orca/Mistral-7B-OpenOrca for the tokenizer appears to resolve it. I am not too happy about the speed, though. When using llm = cAutoModelForCausalLM.from_pretrained(...) and then llm('Tell me a story about a knight'), it will generate a full story in 10-24 seconds (200-800 tokens). But when using the generate function it takes about 15 minutes to generate 200 tokens. I am using a 3070 Ti, for reference.
So I get 15x faster token output by having no GPU layers... I think something is wrong.
Yes, something is wrong; for me, gpu_layers has no effect. 😅
I found that if I build it myself, gpu_layers does not work, no idea why.
I think my lib was a bit messy yesterday. I copied get_vocab from transformers and pushed it to PR #155. I tested it with the OpenOrca Mistral code from above (type mistral), and the exact same code with the model switched to Vicuna 1.5 GGUF (type llama) also works. @CHesketh76 can you rebuild and give it a try?
model = cAutoModelForCausalLM.from_pretrained(
    model_path_or_repo_id="TheBloke/vicuna-13B-v1.5-16K-GGUF",
    model_file="vicuna-13b-v1.5-16k.Q6_K.gguf",
    model_type="llama",
    hf=True,
    temperature=0.7,
    top_p=0.7,
    top_k=50,
    repetition_penalty=1.2,
    context_length=8096,
    max_new_tokens=2048,
    threads=os.cpu_count(),
    stream=True,
    gpu_layers=0,
)
@victorlee0505 how to rebuild #155?
pip uninstall ctransformers
Straight from my fork:
pip install --no-cache-dir git+https://github.com/victorlee0505/ctransformers.git@vlee/transformers#egg=ctransformers[cuda]
(even though I put [cuda], it does not work 😕)
Locally:
git clone https://github.com/victorlee0505/ctransformers.git
cd ctransformers
git checkout vlee/transformers
# I use venv
python -m venv .venv
source .venv/bin/activate
pip install scikit-build
pip install cmake
python setup.py sdist
Under the dist folder you will have your new package; get the full path and install it:
pip install --no-cache-dir full\path\to\ctransformers\dist\ctransformers-0.2.27.tar.gz[cuda]
Make sure to run export CT_CUBLAS=ON before python setup.py sdist, otherwise it won't build with CUDA support.
You might also need to set these two in your .bashrc and confirm the nvcc version matches nvidia-smi:
export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64"
Hi @victorlee0505. I've rebuilt with PR https://github.com/marella/ctransformers/pull/155 and can confirm the NotImplementedError is gone. Thanks!
I won't move forward with this PR. I don't think it is a good fix, but it is OK to use as-is.
I only copied one of the def get_vocab(self): implementations from transformers, the one from transformers.models.llama.tokenization_llama.LlamaTokenizer.get_vocab. There are different get_vocab implementations for different tokenizer types; search for def get_vocab(self): in transformers and you will see what I mean.
Therefore I cannot guarantee a perfect solution, nor do I have time to figure one out. 😥
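For context, the LlamaTokenizer-style get_vocab referred to above looks roughly like this (a sketch of that shape, not necessarily the exact code in #155); it rebuilds the dict from vocab_size and convert_ids_to_tokens, which is why it only fits tokenizer types that expose the vocabulary that way:

def get_vocab(self):
    # Token -> id mapping rebuilt from the ids the backend already knows,
    # plus any added tokens (same shape as LlamaTokenizer.get_vocab).
    vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
    vocab.update(self.added_tokens_encoder)
    return vocab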
OK, I no longer get the error on this:
tokenizer = AutoTokenizer.from_pretrained(model)
but now on:
model_inputs = tokenizer([text], return_tensors="pt")
with this error:
File "/home/pechaut/miniconda3/envs/cairo-llm/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 803, in _batch_encode_plus first_ids = get_input_ids(ids) File "/home/pechaut/miniconda3/envs/cairo-llm/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 770, in get_input_ids tokens = self.tokenize(text, **kwargs) File "/home/pechaut/miniconda3/envs/cairo-llm/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 617, in tokenize tokenized_text.extend(self._tokenize(token)) File "/home/pechaut/miniconda3/envs/cairo-llm/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 628, in _tokenize raise NotImplementedError NotImplementedError