SpecForge
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
### Checklist - [x] 1. I have searched related issues but cannot get the expected help. - [x] 2. The bug has not been fixed in the latest version. -...
PR https://github.com/sgl-project/SpecForge/pull/290 fixed the VLM's lack of support for `set_aux_hidden_states_layers`, but PR https://github.com/sgl-project/SpecForge/pull/308 removed that part of the code, which causes bugs with VLMs.
## Motivation ## Modifications ## Related Issues ## Accuracy Test ## Benchmark & Profiling ## Checklist - [ ] Format your code according to the [Code Formatting with Pre-Commit](https://docs.sglang.ai/references/contribution_guide.html#code-formatting-with-pre-commit). -...
Why is the draft_vocab_size in configs/qwen3-30B-A3B-eagle3.json 32,000 instead of the vocab_size 151,936 in qwen3-30B-A3B?
I see a throughput degradation when I use EAGLE3 to speed up Qwen3-30B-A3B (on 2x H100). My draft model was downloaded from https://huggingface.co/zhuyksir/EAGLE3-Qwen3-30B-A3B-DenseHead. The command I use to benchmark: ...
@Abigbigbig This looks like a different issue from this PR. Let's move it to a separate issue. I can point you to the fix. _Originally posted by @yubofredwang in https://github.com/sgl-project/SpecForge/issues/314#issuecomment-3588281081_ Thank you...
Can a 48 GB A6000 GPU be used to train a draft model for qwen2.5-vl-7b, or will it run out of memory (OOM)?
## Motivation The two existing attention backends both exhibit inefficiencies which inhibit the training experience. - `sdpa` backend materializes the full `bsz x num_heads x q_len x kv_len` attention score...
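To give a sense of scale for the `sdpa` point above, here is a rough back-of-the-envelope sketch of how much memory a dense `bsz x num_heads x q_len x kv_len` score tensor would take. The helper name `attention_score_bytes`, the example shapes, and the 2-byte element size (fp16/bf16) are assumptions made for illustration; they are not taken from SpecForge.

```python
def attention_score_bytes(bsz, num_heads, q_len, kv_len, bytes_per_elem=2):
    """Bytes needed to hold one dense bsz x num_heads x q_len x kv_len score tensor.

    bytes_per_elem=2 assumes fp16/bf16 scores (an assumption, not a SpecForge default).
    """
    return bsz * num_heads * q_len * kv_len * bytes_per_elem


# Hypothetical shapes, chosen only to illustrate the quadratic growth in sequence length.
size = attention_score_bytes(bsz=1, num_heads=32, q_len=8192, kv_len=8192)
print(f"{size / 2**30:.1f} GiB")  # prints "4.0 GiB"
```

Under these assumed shapes, a single score tensor already takes 4 GiB, and doubling both `q_len` and `kv_len` quadruples it.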
## Motivation **Draft PR. This is currently WIP.** Add EAGLE3 support for qwen3_vl and qwen3_vl_moe models. ## Modifications ## Related Issues ## Accuracy Test ## Benchmark & Profiling...