KavioYu

Results 7 issues of KavioYu

migrate static api for sensitive demo

add semi-structure pruning demo

I want to develop some features based on SGLang to improve the performance of srt. 1. A new scheduler for ControllerMulti that can more accurately identify the resource utilization of...

## Motivation
Accelerate model inference with speculative inference (EAGLE2).
## Modifications
It will be provided soon.
## Checklist
- [ ] Format your code according to the [Contributor Guide](https://github.com/sgl-project/sglang/blob/main/docs/en/contributor_guide.md)....

## Motivation
Implement a better dispatch scheduler for DP mode that dispatches new requests according to the remaining resources of each inference process. It could help the server get...
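
The dispatch idea described above can be illustrated with a minimal sketch. This is not the scheduler from the PR: the `Worker` class, the `free_tokens` metric, and the `dispatch` function are all hypothetical names chosen for illustration, and it assumes "remaining resources" can be reduced to a single free-capacity number per inference process.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical view of one data-parallel inference process."""
    name: str
    free_tokens: int  # assumed metric: remaining token capacity


def dispatch(workers, request_tokens):
    """Pick the worker with the most remaining capacity.

    Returns the chosen worker, or None if no worker can fit the request.
    """
    candidates = [w for w in workers if w.free_tokens >= request_tokens]
    if not candidates:
        return None
    best = max(candidates, key=lambda w: w.free_tokens)
    best.free_tokens -= request_tokens  # reserve capacity for this request
    return best


workers = [Worker("dp0", 4096), Worker("dp1", 1024)]
chosen = [dispatch(workers, 2048).name for _ in range(2)]
```

Unlike round-robin dispatch, this choice depends on the current load of each process, so a worker that is already busy is skipped in favor of one with more headroom.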

I have developed a Triton-based implementation of [Native Sparse Attention](https://arxiv.org/pdf/2502.11089), available on [GitHub](https://github.com/yukavio/nsa), to optimize long-context attention computation. I now want to migrate this implementation to Flash Attention v3 to improve...

This PR adds an implementation of the Compressed Attention and Selected Attention components of [Native Sparse Attention](https://arxiv.org/pdf/2502.11089). The hyperparameters of the selected and compressed attention kernels are set for good performance on...