KavioYu issues

Results: 7 issues of KavioYu

migrate static api for sensitive demo

[Develop] Performance Improving Feature

I want to develop some features based on SGLang to improve the performance of srt. 1. A new scheduler for ControllerMulti that can more accurately identify the resource utilization of...

[WIP] Spec infer with EAGLE2

## Motivation Accelerate model inference with speculative inference (EAGLE2). ## Modifications It will be provided soon. ## Checklist - [ ] Format your code according to the [Contributor Guide](https://github.com/sgl-project/sglang/blob/main/docs/en/contributor_guide.md)....

Flex scheduler

## Motivation Implement a better dispatch scheduler for DP mode that dispatches new requests based on the remaining resources of each inference process. It could help the server get...

[Feature Support] I want to support Native Sparse Attention with Flash Attention V3.

I have developed a Triton-based implementation of [Native Sparse Attention](https://arxiv.org/pdf/2502.11089), available on [GitHub](https://github.com/yukavio/nsa), to optimize long-context attention computation. I now want to migrate this implementation to Flash Attention v3 to improve...

Add Implementation of Native Sparse Attention

This PR adds an implementation of Compressed Attention and Selected Attention from [Native Sparse Attention](https://arxiv.org/pdf/2502.11089). The hyperparameters of the selected and compressed attention kernels are set for good performance on...

KavioYu

migrate static api for sensitive demo

Semi structure

[Develop] Performance Improving Feature

[WIP] Spec infer with EAGLE2

Flex scheduler

[Feature Support] I want to support Native Sparse Attention with Flash Attention V3.

Add Implementation of Native Sparse Attention