Matthieu Beaumont
> > Which OS are you running it on? I'm trying to build on Windows, though I'm new to building Windows binaries and haven't done this before...
When I try with deepseek-coder-v2:16b-lite-instruct-q8_0:

```
llama_new_context_with_model: flash_attn requires n_embd_head_k == n_embd_head_v - forcing off
llama_new_context_with_model: V cache quantization requires flash_attn
llama_init_from_gpt_params: error: failed to create context with model '/mnt/AI_TEXT/OLLAMA/blobs/sha256-x'...
```
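If the quantized V cache comes from Ollama's KV cache setting (an assumption — it depends on how the server was launched), falling back to an f16 cache should sidestep the error, since flash attention is forced off for this model and a quantized V cache requires it:

```shell
# Assumption: the q8_0 V cache was enabled via OLLAMA_KV_CACHE_TYPE.
# This model disables flash_attn (n_embd_head_k != n_embd_head_v),
# and a quantized V cache requires flash_attn, so use f16 instead:
export OLLAMA_KV_CACHE_TYPE=f16
# then restart the server, e.g.: ollama serve
```

This trades some VRAM for compatibility; the model should then load without the context-creation failure.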
Try `pip install sentencepiece`?
qwen:14b is a very old model; try qwen2.5:14b instead.