Results: 2 issues of Goldenkoron
### Describe the bug I noticed yesterday that tokens per second drops heavily as the context length in the prompt gets larger. I tested Exui with my same conda...
bug
Please add speculative decoding support for Exllamav2. It involves loading a second, smaller model to significantly speed up inference, sometimes yielding a more than 2x tokens/s improvement.
enhancement
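To illustrate the idea behind the enhancement request, here is a minimal, self-contained sketch of greedy speculative decoding. This is not the Exllamav2 API; `draft_next` and `target_next` are hypothetical stand-ins for a small draft model and a large target model. The draft proposes `k` tokens cheaply, then the target verifies them (in practice in a single batched forward pass) and keeps the longest agreeing prefix, so several tokens can be emitted per expensive model call.

```python
# Toy sketch of greedy speculative decoding (assumption: illustrative
# stand-ins, not the actual Exllamav2 implementation or API).

def draft_next(context):
    # stand-in for a small, fast draft model: next token = last token + 1 mod 5
    return (context[-1] + 1) % 5

def target_next(context):
    # stand-in for the large target model: agrees with the draft except
    # after token 3, where it emits 0 instead
    return (context[-1] + 1) % 5 if context[-1] != 3 else 0

def speculative_step(context, k=4):
    # 1) the draft model proposes k tokens autoregressively (cheap)
    proposed = []
    ctx = list(context)
    for _ in range(k):
        t = draft_next(ctx)
        proposed.append(t)
        ctx.append(t)
    # 2) the target model verifies each position; in a real system all k
    #    positions are scored in one batched pass of the big model
    accepted = []
    ctx = list(context)
    for t in proposed:
        expected = target_next(ctx)
        if t == expected:
            accepted.append(t)       # draft token accepted
            ctx.append(t)
        else:
            accepted.append(expected)  # target's correction ends the step
            break
    return context + accepted

seq = [0]
for _ in range(3):
    seq = speculative_step(seq)
print(seq)  # several tokens gained per target-model "call"
```

Each `speculative_step` here emits up to four tokens for one round of target-model verification, which is the source of the claimed tokens/s boost; the speedup in practice depends on how often the draft model agrees with the target.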