顾立辉
Hello, I encountered the same issue, but I now understand the rationale behind this approach. Define a custom KVCache class to enable preallocated GPU memory optimization. During attention computation, when...
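As a rough illustration of the preallocation idea mentioned in the comment above, here is a minimal sketch. It is not the project's actual KVCache class: the name `PreallocatedKVCache`, its parameters, and the use of the standard-library `array` module in place of GPU tensors are all assumptions made for illustration. The point it tries to show is that the buffer is allocated once and new entries are copied into it in place, so appending does not trigger a fresh allocation.

```python
from array import array


class PreallocatedKVCache:
    """Hypothetical fixed-capacity cache (illustrative only, CPU-side stand-in)."""

    def __init__(self, max_tokens: int, head_dim: int) -> None:
        self.max_tokens = max_tokens
        self.head_dim = head_dim
        # Allocate the whole buffer once; appends below never resize it.
        self._buffer = array("f", [0.0]) * (max_tokens * head_dim)
        self.current_length = 0  # number of token slots currently filled

    def append(self, vectors) -> None:
        """Copy one vector per new token into the next free slots, in place."""
        if self.current_length + len(vectors) > self.max_tokens:
            raise ValueError("cache capacity exceeded")
        for offset, vec in enumerate(vectors):
            if len(vec) != self.head_dim:
                raise ValueError("vector has the wrong dimension")
            start = (self.current_length + offset) * self.head_dim
            # Same-length slice assignment overwrites existing storage.
            self._buffer[start:start + self.head_dim] = array("f", vec)
        self.current_length += len(vectors)

    def view(self) -> memoryview:
        """Return a zero-copy view of the filled part of the buffer."""
        return memoryview(self._buffer)[: self.current_length * self.head_dim]


cache = PreallocatedKVCache(max_tokens=4, head_dim=2)
cache.append([[1.0, 2.0], [3.0, 4.0]])
print(list(cache.view()))
```

In a GPU setting the same pattern would presumably use a preallocated tensor and a length counter instead of `array`, but that mapping is an assumption and is not taken from the truncated comment.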
I ran into the same problem and fixed it; see [pull 32](https://github.com/FasterDecoding/REST/pull/32).
I ran into the same problem.
I also urgently need this feature. Is anyone currently developing it? If not, I'd like to try implementing it myself.
I've tried, implemented, and tested the feature. Here's my plan.

### Functional Requirements

- The system should prioritize evaluating key metrics like accept length, enabling direct validation on datasets without...
https://github.com/sgl-project/SpecForge/pull/279 I am prioritizing support and testing for QwenVL models.
https://github.com/sgl-project/SpecForge/pull/279 Hi, it seems this issue is related to the SGLang-side integration. Could you help test this PR? You can evaluate the accept length of Qwen VL 2.5 **without relying...