saood06
Added a chunked version of try join all. It drives at most n Futures at a time and returns once all are completed, or returns early if any error...
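As a rough TypeScript sketch of the pattern (the function name and the thunk-based task list are illustrative, not the PR's actual code):

```ts
// Minimal sketch of a chunked try-join-all: split the tasks into chunks of
// size n, drive one chunk at a time, and fail fast on the first error.
// Tasks are passed as thunks so nothing starts before its chunk is reached.
async function tryJoinAllChunked<T>(
  tasks: Array<() => Promise<T>>,
  n: number,
): Promise<T[]> {
  const results: T[] = [];
  for (let i = 0; i < tasks.length; i += n) {
    // Promise.all rejects as soon as any task in the current chunk fails,
    // so an error surfaces without waiting for later chunks to start.
    const chunk = tasks.slice(i, i + n).map((task) => task());
    results.push(...(await Promise.all(chunk)));
  }
  return results;
}
```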
As mentioned in https://github.com/lmg-anon/mikupad/pull/102#issuecomment-2870853199, with a large number of entries the initial load time can take minutes: even though `sessions` only returns the name, the `json_extract` is very costly. This...
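For illustration, one way to sidestep repeated `json_extract` calls in SQLite is an indexed generated column, sketched below with better-sqlite3; the table and column names here are assumptions, not mikupad's actual schema:

```ts
import Database from "better-sqlite3";

const db = new Database("sessions.db");

// Hypothetical schema: each row stores the whole session as a JSON blob in
// `data`, so json_extract re-parses the blob on every list query. A virtual
// generated column plus an index lets the name be read from the index
// instead (SQLite only allows VIRTUAL generated columns via ALTER TABLE).
db.exec(`
  ALTER TABLE sessions ADD COLUMN name TEXT
    GENERATED ALWAYS AS (json_extract(data, '$.name')) VIRTUAL;
  CREATE INDEX sessions_name ON sessions(name);
`);

// Listing session names no longer has to parse every JSON blob.
const names = db.prepare("SELECT rowid, name FROM sessions").all();
```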
This is my attempt to address https://github.com/ggml-org/llama.cpp/issues/11970. It has some similarities to https://github.com/ggml-org/llama.cpp/pull/12067, but it is not a port; it is implemented differently. It matches tokens in the cache directly to the...
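Reduced to a hedged sketch, the token-matching idea is the usual longest-common-prefix reuse (illustrative names below, not the actual data structures):

```ts
// Compare the new prompt's tokens against the tokens already in the cache
// and keep the longest shared prefix, so only the tail is re-evaluated.
function commonPrefixLength(cached: number[], prompt: number[]): number {
  const limit = Math.min(cached.length, prompt.length);
  let n = 0;
  while (n < limit && cached[n] === prompt[n]) n++;
  return n;
}

const cachedTokens = [1, 15043, 29892, 3186]; // example token ids
const promptTokens = [1, 15043, 29892, 1234];
const reused = commonPrefixLength(cachedTokens, promptTokens);
// Keep the KV entries for the first `reused` positions and evaluate only
// promptTokens.slice(reused).
```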
I think this looks cleaner. It does remove the mentions of `IQ1_S_R4` [PR 492](https://github.com/ikawrakow/ik_llama.cpp/pull/492) and `IQ1_M_R4` [PR 494](https://github.com/ikawrakow/ik_llama.cpp/pull/494). They didn't belong in that section, but now I don't know where it would...
### What happened?

This was reported in #345, and I was also able to reproduce it on an Android device. There is a workaround with #347, but ideally you should...
My motivation for testing batched performance was to have multiple streams of completion from the same prompt. Sharing a prompt via `system_prompt` saves allocating KV for it. Setting `system_prompt` at launch...
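As a hedged sketch of what I mean (the launch flag, endpoint, and field names follow the old llama.cpp server API; treat them as assumptions for your build), the shared prompt is set once at launch and every slot then continues from it:

```ts
// Launch (assumption: old-server flag kept by this fork):
//   ./llama-server -m model.gguf -np 4 --system-prompt-file shared_prompt.txt
// The system prompt's KV entries are then shared across slots, so each
// completion below only allocates KV for its own continuation.
const base = "http://localhost:8080";

async function complete(prompt: string): Promise<string> {
  const res = await fetch(`${base}/completion`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, n_predict: 128 }),
  });
  const json = await res.json();
  return json.content;
}

// Multiple streams of completion from the same shared prompt.
const outputs = await Promise.all([
  complete("Continue the story:"),
  complete("List the characters so far:"),
]);
console.log(outputs);
```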
Port of https://github.com/ggml-org/llama.cpp/pull/11580. I have not used this, as I no longer need it now that the old KV cache is no longer allocated (this helped when both were allocated...