顾立辉

7 comments by 顾立辉

Hello, I encountered the same issue, but I now understand the rationale behind this approach. Define a custom KVCache class to enable preallocated GPU memory optimization. During attention computation, when...
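To make the idea in the comment above concrete, here is a minimal sketch of a preallocated cache. Everything in it is an assumption for illustration: the class name `PreallocatedKVCache`, its methods, and the use of the standard-library `array` module in place of GPU tensors are hypothetical and are not taken from the project under discussion. It only shows the general pattern of allocating the buffers once and tracking a fill counter instead of growing storage on every step.

```python
from array import array


class PreallocatedKVCache:
    """Hypothetical sketch: fixed-capacity key/value buffers plus a fill counter."""

    def __init__(self, max_tokens: int, head_dim: int) -> None:
        self.max_tokens = max_tokens
        self.head_dim = head_dim
        # Allocate both buffers once, up front; they are never resized.
        # (Assumption: a real implementation would allocate GPU tensors here.)
        self.keys = array("f", [0.0]) * (max_tokens * head_dim)
        self.values = array("f", [0.0]) * (max_tokens * head_dim)
        self.length = 0  # number of token slots currently filled

    def append(self, key, value) -> None:
        """Copy one token's key and value vectors into the next free slot."""
        if len(key) != self.head_dim or len(value) != self.head_dim:
            raise ValueError("key and value must each have head_dim elements")
        if self.length >= self.max_tokens:
            raise OverflowError("cache is full")
        start = self.length * self.head_dim
        end = start + self.head_dim
        # Same-length slice assignment writes in place without reallocating.
        self.keys[start:end] = array("f", key)
        self.values[start:end] = array("f", value)
        self.length += 1

    def truncate(self, new_length: int) -> None:
        """Drop trailing tokens by moving the counter; the buffers are untouched."""
        if not 0 <= new_length <= self.length:
            raise ValueError("new_length must be between 0 and the current length")
        self.length = new_length

    def filled_keys(self) -> array:
        """Return the filled part of the key buffer (a copy in this sketch)."""
        return self.keys[: self.length * self.head_dim]


cache = PreallocatedKVCache(max_tokens=4, head_dim=2)
cache.append([1.0, 2.0], [3.0, 4.0])
cache.append([5.0, 6.0], [7.0, 8.0])
cache.truncate(1)  # discard the last token without freeing memory
```

In this sketch, appending and truncating only change the counter and overwrite slots that already exist, so the buffer size stays constant for the lifetime of the cache.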

I ran into the same problem and fixed it; see [pull 32](https://github.com/FasterDecoding/REST/pull/32)

I also urgently need this feature. Is anyone currently developing it? If not, I'd like to try implementing it myself.

I've tried, implemented, and tested the feature. Here's my plan. ### Functional Requirements - The system should prioritize evaluating key metrics like accept length, enabling direct validation on datasets without...

https://github.com/sgl-project/SpecForge/pull/279 I am prioritizing support and testing for QwenVL models.

https://github.com/sgl-project/SpecForge/pull/279 Hi, it seems this issue is related to the SGLang-side integration. Could you help test this PR? You can evaluate the accept length of Qwen VL 2.5 **without relying...