JohnConnor123
That you are not a Yandex employee is clear. The question is whether it is possible to replace the Yandex translation with a Google translation.
> The issue is that arguments are not properly [url-encoded](https://en.wikipedia.org/wiki/Percent-encoding) when sent via HTTP, so '%' followed by digits will be interpreted as a random character.
>
> One place...
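To make the quoted encoding issue concrete, here is a minimal sketch, assuming a hypothetical endpoint and value (neither is from the thread): curl's `--data-urlencode` escapes the `%` itself, so the literal string survives the round trip instead of being percent-decoded on the server side.

```
# Hedged sketch with a hypothetical endpoint: sent raw, 'text=5%40x' would be
# decoded server-side into '5@x'; --data-urlencode escapes '%' as '%25' first.
curl -G 'https://example.com/api' --data-urlencode 'text=5%40x'
```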
> Can you try to add `--tokenizer=microsoft/Phi-4-mini-instruct` when serving the model? I suspect the Phi-4 tokenizer conversion is broken at transformers side.

This doesn't help:

```
INFO 04-12 05:29:04 [__init__.py:239]...
```
> ```
> vllm serve /tmp/phi-4-q4.gguf --max-model-len 4096 --dtype half --tokenizer microsoft/phi-4 --enable-chunked-prefill --enable-prefix-caching
> ```

I don't have an environment with vllm; I always use vllm from Docker. By...
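For reference, a hedged sketch of running the same command from Docker, assuming the official `vllm/vllm-openai` image (its entrypoint is the OpenAI-compatible server, so the serve flags become container arguments; this is not the thread's exact setup):

```
# Hedged sketch, assuming the vllm/vllm-openai image: mount the GGUF path,
# expose the API port, and pass the same serve flags as container arguments.
docker run --gpus all -v /tmp:/tmp -p 8000:8000 vllm/vllm-openai:latest \
  --model /tmp/phi-4-q4.gguf --max-model-len 4096 --dtype half \
  --tokenizer microsoft/phi-4 --enable-chunked-prefill --enable-prefix-caching
```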
> I also tried [Q6_K](https://huggingface.co/unsloth/phi-4-GGUF/blob/main/phi-4-Q6_K.gguf) but still can't reproduce the CUDA index error. Can you tell me which Q6 checkpoint you are using?
>
> ```
> vllm...
> ```
> I also tried [Q6_K](https://huggingface.co/unsloth/phi-4-GGUF/blob/main/phi-4-Q6_K.gguf) but still can't reproduce the CUDA index error

I tried to run the model from your link. With the argument `--tokenizer=microsoft/Phi-4-mini-instruct` I got the error...
> but still can't reproduce the CUDA index error

Sorry, I forgot to mention that the error occurs not when starting the LLM, but when sending the first message to it.
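In other words, the assert fires only once a request actually reaches the model. A minimal sketch of such a first message, assuming vllm's OpenAI-compatible endpoint on the default port (the model name defaults to the served path):

```
# Hedged sketch: the first chat completion request is what triggers the error,
# not the server startup itself.
curl http://localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "/tmp/phi-4-q4.gguf", "messages": [{"role": "user", "content": "Hello"}]}'
```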
Changing `--tokenizer=microsoft/Phi-4-mini-instruct` to `--tokenizer=microsoft/Phi-4` fixed the `RuntimeError: CUDA error: device-side assert triggered` error. Now everything works correctly.
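For completeness, the working combination is the command quoted earlier in the thread, with the full-size `microsoft/phi-4` tokenizer rather than the mini-instruct one:

```
# Same invocation as quoted above; the tokenizer flag is the decisive part.
vllm serve /tmp/phi-4-q4.gguf --max-model-len 4096 --dtype half \
  --tokenizer microsoft/phi-4 --enable-chunked-prefill --enable-prefix-caching
```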
> Huggingface models and ./llama-perplexity calculations may be different, with llama.cpp numbers internally showing higher numbers than huggingface's perplexity measurements.
>
> What people have done previously:
>
> 1....
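For context, a hedged sketch of the llama.cpp side of such a comparison; the model and text file below are placeholders, not the thread's exact setup:

```
# Hedged sketch: llama.cpp's perplexity tool takes a GGUF model (-m) and a raw
# text file (-f); its numbers are not directly comparable to Hugging Face's.
./llama-perplexity -m phi-4-Q6_K.gguf -f wiki.test.raw
```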
> Hi, you can build all the example binaries by following here: https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md#cpu-build
>
> You may also consider downloading pre-compiled binaries, from the Release page: https://github.com/ggerganov/llama.cpp/releases/download/b4576/llama-b4576-bin-ubuntu-x64.zip

This contains the...
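A minimal sketch of using that archive; the binary layout may vary between releases, so list the contents rather than assuming paths:

```
# Hedged sketch: fetch the pre-built archive linked above and inspect it.
wget https://github.com/ggerganov/llama.cpp/releases/download/b4576/llama-b4576-bin-ubuntu-x64.zip
unzip llama-b4576-bin-ubuntu-x64.zip -d llama-b4576
ls -R llama-b4576
```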