Jean-Gab
The bug is worse than expected: only one of the blocks is actually deleted in the editor's internal structure. If we call editor.save(), the other deleted blocks are still...
We are in a similar situation: 1 master and 8 workers running 1.21. We are working right now on taking control of that cluster with Kubespray; we created a smaller cluster...
We actually ran into so much trouble that we gave up on it for the time being. I don't remember exactly what the last issue was, but it was a...
I'm in; I have hardware sleeping right now. I'll take it up with @nathanielsimard to start coordinating this effort next week; he just poked me about this on Discord.
Thanks for suggesting OpenAI; it did work for me, although I had to mess with the parameters a bit. I ended up with: ```python llm = OpenAI(api_key="somestring",...
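For anyone landing here later, a minimal sketch of that kind of setup, assuming the official openai Python SDK; the parameter values below are illustrative placeholders, not the exact ones I settled on:

```python
# Minimal sketch, assuming the official openai Python SDK (pip install openai).
# All parameter values here are illustrative placeholders, not known-good settings.
from openai import OpenAI

llm = OpenAI(
    api_key="somestring",  # placeholder key from the snippet above
    timeout=30.0,          # assumption: give slow responses more headroom
    max_retries=3,         # assumption: retry transient failures
)

response = llm.chat.completions.create(
    model="gpt-4o-mini",   # assumption: any chat-capable model works here
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```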
Same issue here but with **Llama3 8B** on an RTX 4090 with CUDA, and it also completely breaks the server. When one generation goes beyond the context limit, all subsequent...
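As a stopgap while this is open, a minimal sketch of bounding generation so a request cannot run past the context window, assuming the llama-cpp-python bindings; the model path and sizes are placeholders:

```python
# Minimal sketch, assuming the llama-cpp-python bindings (pip install llama-cpp-python).
# The model path and sizes are placeholders; the point is capping max_tokens so that
# prompt + completion stays within the context window configured at load time.
from llama_cpp import Llama

N_CTX = 8192  # assumption: context window set when the model is loaded

llm = Llama(
    model_path="./llama3-8b-instruct.Q4_K_M.gguf",  # placeholder path
    n_ctx=N_CTX,
    n_gpu_layers=-1,  # offload all layers to the GPU (CUDA build)
)

prompt = "Explain KV cache eviction in one paragraph."
prompt_tokens = len(llm.tokenize(prompt.encode("utf-8")))

# Leave headroom for the prompt so the generation cannot exceed n_ctx.
out = llm(prompt, max_tokens=N_CTX - prompt_tokens - 8)
print(out["choices"][0]["text"])
```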
I just tried with the version suggested above, and it does not work either. version: 2960 (https://github.com/ggerganov/llama.cpp/commit/6369bf04336ab60e5c892dd77a3246df91015147) The behavior is the same, but it is a lot slower on...
Indeed, my behavior is slightly different, but it still shows degradation WITHIN the context length. I posted in this thread instead of opening a new issue since it had enough similarities that...