Goldenkoron

Results: 2 issues by Goldenkoron

### Describe the bug I noticed yesterday that tokens per second drops heavily as the context length in the prompt gets larger. I tested Exui with my same conda...

bug

Please add speculative decoding support for Exllamav2. It involves loading a second, smaller draft model to significantly boost inference speed, sometimes yielding more than a 2x tokens/s boost.

enhancement
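For context on the requested feature, the core idea of speculative decoding can be sketched in a few lines. This is a hypothetical illustration, not the Exllamav2 API: the "models" below are toy deterministic next-token functions standing in for a small draft model and a large target model. The draft proposes up to `k` tokens cheaply; the target verifies them (in a real implementation, in one batched forward pass) and accepts the matching prefix, so the output is identical to decoding with the target alone.

```python
def target_next(seq):
    # Toy "large" target model: next token is the sum of the last two, mod 10.
    return (seq[-1] + seq[-2]) % 10

def draft_next(seq):
    # Toy "small" draft model: agrees with the target most of the time,
    # but diverges whenever the last token is 0.
    return 1 if seq[-1] == 0 else (seq[-1] + seq[-2]) % 10

def speculative_decode(seq, n_tokens, k=4):
    """Greedy speculative decoding: output matches pure target decoding."""
    seq = list(seq)
    produced = 0
    while produced < n_tokens:
        # 1) Draft model proposes up to k tokens autoregressively (cheap).
        draft, ctx = [], list(seq)
        for _ in range(min(k, n_tokens - produced)):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Target model verifies the proposals; accept the matching prefix.
        accepted, ctx = [], list(seq)
        for t in draft:
            expect = target_next(ctx)
            if t != expect:
                # First mismatch: keep the target's own token instead, then
                # stop, so the result equals pure target decoding.
                accepted.append(expect)
                break
            accepted.append(t)
            ctx.append(t)
        seq.extend(accepted)
        produced += len(accepted)
    return seq
```

When the draft agrees with the target, each verification step accepts several tokens at once, which is where the >2x tokens/s speedup can come from; when it disagrees, at least one correct token is still produced per step, so throughput never falls below plain target decoding (ignoring the draft's small overhead).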