Leon Knauer
Hi there! I built TensorFlow 2.1 for macOS Catalina with support for AVX, AVX2, FMA, SSE4.1, and SSE4.2. You can find the wheel file here: https://github.com/reuank/tensorflow-wheels-macOS/releases/tag/tensorflow-2.1-catalina
Hey @Rybens92, I haven't tested this exact configuration myself yet, but you can try specifying a tokenizer directly via the `tokenizer="ehartford/dolphin-2.2-yi-34b"` option. Playground example: ``` argmax "What is...
You need to add the `trust_remote_code=True` option, as the YiTokenizer is not known to Hugging Face's `tokenizers` library. This is also documented here: https://huggingface.co/ehartford/dolphin-2_2-yi-34b. With this, the downloaded tokenizer...
On my machine, the following example runs in the LMQL playground and produces sensible output: ``` argmax "What is the capital of France? [RESPONSE]" from lmql.model("local:llama.cpp:/YOUR_PATH/dolphin-2_2-yi-34b.Q4_0.gguf", tokenizer="ehartford/dolphin-2_2-yi-34b", trust_remote_code=True) where len(TOKENS(RESPONSE))...
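For reference, a complete version of that query might look like the following. The original comment is cut off, so the closing constraint (`< 100`) is an assumption on my part; the model path and tokenizer are taken from the comment above:

```
# minimal sketch; the length bound is assumed, not from the original comment
argmax
    "What is the capital of France? [RESPONSE]"
from
    lmql.model(
        "local:llama.cpp:/YOUR_PATH/dolphin-2_2-yi-34b.Q4_0.gguf",
        tokenizer="ehartford/dolphin-2_2-yi-34b",
        trust_remote_code=True
    )
where
    len(TOKENS(RESPONSE)) < 100
```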
Okay, I cannot reproduce that, and I know too little about the rest of your setup and the other changes you have made. Glad that you found something that works for...
Just a quick update on this topic. It looks like llama-cpp-python will add that feature very soon: https://github.com/abetlen/llama-cpp-python/pull/951.
Hey @lbeurerkellner, are you aware of anyone currently working on this? Otherwise, I will have a look at the approach @ggbetz described (adding a new vLLM backend, similar to [llama_cpp_model](https://github.com/eth-sri/lmql/blob/main/src/lmql/models/lmtp/backends/llama_cpp_model.py)).
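For context, such a backend would essentially wrap vLLM's offline generation API. A minimal sketch of the vLLM side only (not the LMTP glue); the model name and sampling settings here are placeholders:

```python
# Sketch of the vLLM calls a new LMTP backend would wrap.
# Model name and sampling parameters are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="ehartford/dolphin-2_2-yi-34b", trust_remote_code=True)
params = SamplingParams(temperature=0.0, max_tokens=64, logprobs=5)

outputs = llm.generate(["What is the capital of France?"], params)
for out in outputs:
    print(out.outputs[0].text)      # generated continuation
    print(out.outputs[0].logprobs)  # per-token logprobs, which LMQL needs for constraints
```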
Hi @KamilLegault, you can have a look here: https://lmql.ai/docs/models/llama.cpp.html#model-server. You can start an LMTP inference endpoint by running ```bash lmql serve-model llama.cpp:/YOUR_PATH/YOUR_MODEL.gguf ``` In the playground, you then need to...
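Once the endpoint is running, a playground query can reference the served model by the same identifier (without the `local:` prefix), as described in the linked docs. A minimal sketch; the path is a placeholder and the length constraint is an assumption:

```
# assumes `lmql serve-model llama.cpp:/YOUR_PATH/YOUR_MODEL.gguf` is running
argmax
    "What is the capital of France? [RESPONSE]"
from
    "llama.cpp:/YOUR_PATH/YOUR_MODEL.gguf"
where
    len(TOKENS(RESPONSE)) < 100
```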
Hey @parallaxe, I am also very interested in this feature. Have you managed to get the attention scores yet?