Gemma model - help needed
Can any colleague help with an example of inference with the Gemma model in CTranslate2? Unfortunately, there is no information about this model in the documentation.
Thx
Hello, I will update the doc in the future. BTW, you can convert Gemma as mentioned in the Llama documentation:
ct2-transformers-converter --model google/gemma-7b --output_dir gemma_ct2
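If memory is a concern, the converter also accepts an optional --quantization flag; the exact type below (int8_float16) is just one example and you can pick another supported type:
ct2-transformers-converter --model google/gemma-7b --output_dir gemma_ct2 --quantization int8_float16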
Then you can try with this script:
import ctranslate2
import transformers

generator = ctranslate2.Generator("gemma_ct2")
tokenizer = transformers.AutoTokenizer.from_pretrained("google/gemma-7b")

# Gemma chat format: <start_of_turn>user\n{question}<end_of_turn>\n<start_of_turn>model\n
b_inst = '<start_of_turn>'
e_inst = '<end_of_turn>'
question = 'Ask something'
prompt = b_inst + 'user\n' + question + e_inst + '\n' + b_inst + 'model\n'

tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
results = generator.generate_batch([tokens], max_length=30, sampling_topk=10)
print(tokenizer.decode(results[0].sequences_ids[0]))
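If you prefer streaming output, the same generator also exposes generate_tokens, which yields one step result at a time; a minimal sketch reusing the generator, tokenizer, and tokens from the script above:

# Stream tokens one step at a time instead of waiting for the full batch result.
output_ids = []
for step in generator.generate_tokens(tokens, max_length=30, sampling_topk=10):
    output_ids.append(step.token_id)
print(tokenizer.decode(output_ids))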
@minhthuc2502 - Does CTranslate2 support openchat models, e.g. openchat/openchat-3.5-0106-gemma? I managed to convert it to CT2, but I can't make it work properly. Thx
What is the error? I see the architecture defined in the openchat model is GemmaForCausalLM, so I think it should work.
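If you want to double-check what the converter sees, you can read the declared architecture from the model config; a minimal sketch using transformers.AutoConfig:

import transformers

# Print the declared architecture; per the discussion above this should be GemmaForCausalLM.
config = transformers.AutoConfig.from_pretrained("openchat/openchat-3.5-0106-gemma")
print(config.architectures)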
@minhthuc2502 I use the code below for testing:
import ctranslate2
import transformers

generator = ctranslate2.Generator("/test/openchat35gemma", device="cuda", device_index=1)
tokenizer = transformers.AutoTokenizer.from_pretrained("/test/openchat35gemma")

# OpenChat conversation format
prompt = "GPT4 Correct User: Hello<end_of_turn>GPT4 Correct Assistant: Hi<end_of_turn>GPT4 Correct User: How are you today?<end_of_turn>GPT4 Correct Assistant: "

tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
results = generator.generate_batch([tokens], max_length=4096, sampling_temperature=0.1, sampling_topk=1, sampling_topp=0.1, include_prompt_in_result=False)
print(tokenizer.decode(results[0].sequences_ids[0]))
The result is random characters. Where did I go wrong? tokenizer.model is in the path. Thx.
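One thing worth sanity-checking when the output is garbled is whether the turn markers in the prompt survive tokenization as single special tokens; a minimal check, reusing the tokenizer and prompt from the snippet above:

# If <end_of_turn> gets split into several pieces, the model never sees the
# chat format it was trained on, which can produce garbage output.
print(tokenizer.tokenize("<end_of_turn>"))
print(tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))[:20])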