John Ward

Results: 5 comments by John Ward

I'd really love to see this implemented. Without this feature, I've just muted the app. Edit: I did find that you can get a similar experience in the [github scheduled...

I'm using llama.cpp's server. I wasn't sure whether I could use all of the params I'm using there with lmql's server, and I couldn't find any documentation on it. I...

This is probably outside of the scope, but I do see some activity with this code:

```
import lmql
import os

os.environ['OPENAI_API_KEY'] = 'fakekey'

@lmql.query
async def test():
    '''lmql
    argmax...
```
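For context, a complete version of that pattern might look something like the sketch below. This is a minimal reconstruction assuming lmql's documented `argmax ... from ... where` query syntax; the prompt, the `GREETING` variable, and the model path are illustrative placeholders, not taken from the original comment.

```
import lmql
import os

# The original snippet sets a fake key, presumably because the query
# is not actually routed to OpenAI (assumption).
os.environ['OPENAI_API_KEY'] = 'fakekey'

@lmql.query
async def test():
    '''lmql
    argmax
        "Say hello: [GREETING]"
    from
        "llama.cpp:/path/to/model.gguf"
    where
        len(TOKENS(GREETING)) < 20
    '''

# In async code the decorated query is awaited, e.g.:
#   result = await test()
```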

I think I was able to get the model to serve using the command below, though it doesn't log output:

```
lmql serve-model llama.cpp:/Users/jward/Projects/llama.cpp/models/llama-2-13b-chat.Q8_0.gguf --use_mlock True --n_gpu_layers 1
```

It looks like the...
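As a rough guide to what those flags do, here is a sketch of the equivalent direct llama-cpp-python call. It assumes (but does not confirm) that `lmql serve-model` forwards flags such as `--use_mlock` and `--n_gpu_layers` to the underlying `Llama` constructor.

```
from llama_cpp import Llama

# Illustrative only: the same parameters as passed on the CLI above.
llm = Llama(
    model_path="/Users/jward/Projects/llama.cpp/models/llama-2-13b-chat.Q8_0.gguf",
    use_mlock=True,    # lock model weights in RAM to avoid swapping
    n_gpu_layers=1,    # offload one layer to the GPU / Metal backend
)
```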

I appreciate your patience with me as I jumped a few topics. This command ended up working for me:

```
lmql serve-model llama.cpp:/Users/jward/Projects/llama.cpp/models/llama-2-70b-orca-200k.Q5_K_M.gguf --use_mlock True --n_gpu_layers 1 --n_gqa 8 --n_ctx...
```
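Once that server is up, a client-side query can target the model by the same model string in its `from` clause. This is a minimal sketch assuming lmql's `argmax ... from ... where` syntax and its `STOPS_AT` constraint; the prompt and function name are made up for illustration.

```
import lmql

@lmql.query
async def ask():
    '''lmql
    argmax
        "Q: What does --n_gqa control?\nA: [ANSWER]"
    from
        "llama.cpp:/Users/jward/Projects/llama.cpp/models/llama-2-70b-orca-200k.Q5_K_M.gguf"
    where
        STOPS_AT(ANSWER, "\n")
    '''
```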