Alexey Parfenov
I have an RTX 3060 and get the same error.
> It seems that your CUDA driver is not detected

Yes, after I installed the CUDA Toolkit the error went away (in my case). Thank you!
Not exactly a solution, but there's a dev version of KoboldAI that allows splitting the ML workload between the GPU and the CPU: https://github.com/henk717/KoboldAI This version works with `hfj` models that are found...
UPDATE (for future readers): the title was changed.

---

I think that the title of this issue is a little bit misleading. Technically, a custom `device_map` is already supported for...
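For context, here is a minimal sketch of passing a custom `device_map` via transformers + accelerate; the checkpoint name and the exact split are illustrative, not taken from this issue:

```python
from transformers import AutoModelForCausalLM

# Illustrative split: embeddings and head on GPU 0, transformer blocks on the
# CPU. Module names follow the GPT-2 checkpoint layout.
device_map = {
    "transformer.wte": 0,
    "transformer.wpe": 0,
    "transformer.h": "cpu",
    "transformer.ln_f": 0,
    "lm_head": 0,
}

model = AutoModelForCausalLM.from_pretrained("gpt2", device_map=device_map)
```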
> If you think this still needs to be addressed please comment on this thread.

unstale
Can confirm for the `server` too.

```sh
curl -Ss --data '{"n_predict":32, "prompt":"Bob: Hi, Alice!\n", "grammar":"root ::= (\"Bob\" | \"Alice\") \":\""}' http://127.0.0.1:8080/completion
```

```
{"tid":"140147849643840","timestamp":1713292234,"level":"INFO","function":"launch_slot_with_task","line":1037,"msg":"slot is processing task","id_slot":0,"id_task":0}
{"tid":"140147849643840","timestamp":1713292234,"level":"INFO","function":"update_slots","line":2066,"msg":"kv cache rm...
```
The debugger fooled me. It's not actually an empty string; it's a sequence of 3 bytes: `e2808d`. And it seems like the map created by `unicode_utf8_to_byte_map()` does not contain this...
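(For reference, `e2 80 8d` is the UTF-8 encoding of U+200D, ZERO WIDTH JOINER, which is easy to verify:)

```python
# The 3-byte sequence decodes to a single code point: U+200D (ZERO WIDTH JOINER).
ch = b"\xe2\x80\x8d".decode("utf-8")
print(hex(ord(ch)))  # 0x200d
```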
If you ignore the error with `try`/`catch`, the crash is gone and the output seems valid. Though it does not seem like a proper solution.

```diff
diff --git a/llama.cpp b/llama.cpp...
```
> If you think this still needs to be addressed please comment on this thread.

unstale

I guess this will be my monthly routine...
I've just tested that PR and it works. Thank you! I tested it with a 13B model on an RTX 3060. Without `load_in_8bit`, only 10 layers are able to fit into...
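For future readers, a minimal sketch of the kind of loading call being discussed, assuming a transformers + bitsandbytes setup; the checkpoint path is a placeholder:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/13b-checkpoint",  # placeholder: any ~13B causal LM checkpoint
    device_map="auto",         # let accelerate split layers across GPU and CPU
    load_in_8bit=True,         # quantize weights to 8-bit via bitsandbytes
)
```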