IAN
The weights of this model are saved as external_data, but the problem still exists.
Does this code need to distinguish whether DP attention is enabled?
Command to reproduce this error: python3 -m sglang.launch_server --model /mnt/DeepSeek-R1 --tp 8 --trust-remote-code --enable-dp-attention ``` import json json_schema = json.dumps( { "type": "object", "properties": { "name": {"type": "string", "pattern": "^[\\w]+$"},...
> [@hcyz33](https://github.com/hcyz33) I don't think this is due to constraint decoding. Could you check it for several times? Also, [@FrankLeeeee](https://github.com/FrankLeeeee) could you take a look? Thanks!  I added a...
I found that the hang occurs because sampling_info_done never receives its signal, so it waits here until the timeout. It seems the root cause is that...
I added an event set for idle batches, as below. The hang has disappeared. However, the structured output of some requests does not seem to take effect. Further...
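For anyone following along, here is a minimal, self-contained sketch of the hang pattern described above, using Python's `threading.Event`. It is not SGLang's actual scheduler code: `Batch`, `prepare_sampling_info`, `wait_for_sampling_info`, and the `set_on_idle` switch are hypothetical names made up for illustration; only the `sampling_info_done` name comes from the comments above.

```python
import threading


class Batch:
    """Hypothetical stand-in for a scheduler batch (not SGLang's real class)."""

    def __init__(self, is_idle):
        self.is_idle = is_idle
        # Set by the producer once sampling info is ready.
        self.sampling_info_done = threading.Event()


def prepare_sampling_info(batch, set_on_idle):
    """Producer side: signal that sampling info is ready."""
    if batch.is_idle:
        # Without this set(), a waiter on an idle batch is never woken up.
        if set_on_idle:
            batch.sampling_info_done.set()
        return
    # Real work (for example, building a vocab mask) would happen here.
    batch.sampling_info_done.set()


def wait_for_sampling_info(batch, timeout=0.2):
    """Consumer side: returns True if signalled, False if the wait timed out."""
    return batch.sampling_info_done.wait(timeout)


def run(is_idle, set_on_idle):
    batch = Batch(is_idle)
    producer = threading.Thread(
        target=prepare_sampling_info, args=(batch, set_on_idle)
    )
    producer.start()
    signalled = wait_for_sampling_info(batch)
    producer.join()
    return signalled
```

With these assumptions, `run(is_idle=True, set_on_idle=False)` times out, which mirrors the hang, while `run(is_idle=True, set_on_idle=True)` returns as soon as the event is set.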
I forgot to call update_regex_vocab_mask. I added it before the sync, and the results are all correct now! I think I have fixed it.
> Hi [@hcyz33](https://github.com/hcyz33), do you want to create a PR to fix this? Yes, I will.
> ## Motivation > * Support double sparsity (post-training sparse attention) for long context inference in SGLang > * See [paper](https://arxiv.org/pdf/2408.07092) > > ## Modifications > * Add triton implementation...
I hit the same bug when enabling NEXTN.