Chen Zhang
Trying another implementation of #12655
As kv_caches is no longer needed by the model, we can now remove it from the model runner and drop the complex bind_kv_cache function. However, as self.kv_caches is still used by tpu...
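A minimal sketch of that direction, with illustrative names (`AttentionLayer` and `init_kv_cache` are not vLLM's API): each attention layer owns its cache tensor directly, so the runner needs neither a `self.kv_caches` list nor a separate binding pass:

```python
from dataclasses import dataclass

import torch


@dataclass
class AttentionLayer:
    layer_name: str
    # Owned by the layer itself, not collected in a runner-level list.
    kv_cache: torch.Tensor | None = None


def init_kv_cache(layers: dict[str, AttentionLayer],
                  num_blocks: int, block_size: int,
                  num_kv_heads: int, head_size: int) -> None:
    """Allocate one KV cache tensor per layer and attach it directly."""
    for layer in layers.values():
        # Common layout: index 0 is the key cache, index 1 the value cache.
        layer.kv_cache = torch.zeros(
            2, num_blocks, block_size, num_kv_heads, head_size)
```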
Built on top of https://github.com/vllm-project/vllm/pull/14079; should be merged after it. This PR supports "real" sliding window in v1: 1. Support dropping blocks outside the sliding window. 2. For prefix caching, only...
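A minimal sketch of point 1, assuming a hypothetical helper: any block whose last token falls before `num_computed_tokens - sliding_window` can no longer be attended to and is safe to free:

```python
def blocks_outside_window(num_computed_tokens: int,
                          sliding_window: int,
                          block_size: int) -> range:
    """Indices of blocks fully outside the sliding window (freeable)."""
    first_useful_token = max(num_computed_tokens - sliding_window, 0)
    # A block is droppable only if its *last* token precedes the window start.
    return range(first_useful_token // block_size)


# Example: 10 computed tokens, window 4, block_size 2 -> tokens 0-5 are out
# of the window, so blocks 0-2 (covering tokens 0-5) can be dropped.
assert list(blocks_outside_window(10, 4, 2)) == [0, 1, 2]
```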
WIP https://github.com/vllm-project/vllm/issues/11382
Should be merged after https://github.com/vllm-project/vllm/pull/17398. To prepare for the hybrid allocator, this PR moves logic that needs to run for each specialized manager from KVCacheManager to SpecializedManager. As the `SpecializedManager` not...
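A sketch of the refactoring shape, with assumed signatures (only the class names come from the PR description): per-attention-type decisions, such as which blocks can be skipped, move into `SpecializedManager` subclasses so KVCacheManager only orchestrates:

```python
from abc import ABC, abstractmethod


class SpecializedManager(ABC):
    """Per-attention-type logic, one subclass per kv cache group type."""

    def __init__(self, block_size: int) -> None:
        self.block_size = block_size

    @abstractmethod
    def remove_skipped_blocks(self, block_ids: list[int],
                              num_computed_tokens: int) -> list[int]:
        """Return ids of blocks this attention type no longer needs."""


class FullAttentionManager(SpecializedManager):
    def remove_skipped_blocks(self, block_ids, num_computed_tokens):
        return []  # full attention attends to everything; nothing is skipped


class SlidingWindowManager(SpecializedManager):
    def __init__(self, block_size: int, sliding_window: int) -> None:
        super().__init__(block_size)
        self.sliding_window = sliding_window

    def remove_skipped_blocks(self, block_ids, num_computed_tokens):
        first_useful = max(num_computed_tokens - self.sliding_window, 0)
        # Blocks entirely before the window start are no longer needed.
        return block_ids[:first_useful // self.block_size]
```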
Should be merged after https://github.com/vllm-project/vllm/pull/17193. This PR changes ForwardContext.attn_metadata from a global object to dict[layer_name, AttentionMetadata], to prepare for the hybrid allocator, which allocates different block tables to sliding window layers...
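A minimal sketch of the data-structure change, with stand-in types (`Any` substitutes for `AttentionMetadata`): the forward context maps each layer name to its own metadata, so layers in different kv cache groups can read different block tables:

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ForwardContext:
    # Before: a single AttentionMetadata shared by every layer.
    # After: one entry per attention layer, keyed by layer name.
    attn_metadata: dict[str, Any]


def attention_forward(ctx: ForwardContext, layer_name: str) -> Any:
    # Each layer looks up its own metadata, so a sliding window layer can
    # carry a different block table than a full attention layer.
    return ctx.attn_metadata[layer_name]
```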
https://github.com/vllm-project/vllm/pull/17137 drops the last matched block to support EAGLE. This strategy is not correct for sliding window layers. When the sliding window size is 4 and block_size is 2, we need...
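A small worked check of the failure mode, using a hypothetical helper: with window 4 and block_size 2, the attention window of the latest token always covers the last matched block, so that block cannot simply be dropped for sliding window layers:

```python
def window_block_span(pos: int, sliding_window: int, block_size: int) -> range:
    """Block indices touched by the attention window of the token at `pos`."""
    first = max(pos - sliding_window + 1, 0)
    return range(first // block_size, pos // block_size + 1)


# Token 7 with window 4 attends to tokens 4..7, i.e. blocks 2 and 3: the
# very last block is still inside the window, so dropping it loses tokens
# the next query must attend to.
assert list(window_block_span(7, 4, 2)) == [2, 3]
```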
In the future hybrid allocator, the KVCacheManager output would be list[list[KVCacheBlock]], which is much more complex than the current list[KVCacheBlock]. To hide the complexity, this PR introduces `KVCacheBlocks` to save...
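A sketch of the wrapper idea under assumed fields: callers hold an opaque `KVCacheBlocks` value, so the manager can later switch the inner representation to the nested hybrid-allocator form without touching every call site:

```python
from dataclasses import dataclass


@dataclass
class KVCacheBlock:
    block_id: int


@dataclass
class KVCacheBlocks:
    # Today effectively a flat list; with the hybrid allocator this can
    # become a nested per-group structure without changing the public type.
    blocks: list[KVCacheBlock]

    def get_block_ids(self) -> list[int]:
        return [b.block_id for b in self.blocks]

    def __add__(self, other: "KVCacheBlocks") -> "KVCacheBlocks":
        return KVCacheBlocks(self.blocks + other.blocks)
```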
Should be merged after https://github.com/vllm-project/vllm/pull/17394. The hybrid allocator will need to build attention metadata for each kv cache group, because different kv cache groups may have a different attention type and block_table. To...
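A minimal sketch of per-group metadata building, with hypothetical names throughout: one builder per kv cache group produces that group's metadata, which is then fanned out to the group's layers, matching the `dict[layer_name, AttentionMetadata]` shape above:

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class GroupMetadataBuilder:
    attn_type: str  # e.g. "full" or "sliding_window"

    def build(self, block_table: list[list[int]]) -> dict[str, Any]:
        # Real builders would produce AttentionMetadata; a dict stands in.
        return {"attn_type": self.attn_type, "block_table": block_table}


def build_per_layer_metadata(
    groups: list[tuple[GroupMetadataBuilder, list[str], list[list[int]]]],
) -> dict[str, Any]:
    """Build one metadata object per kv cache group, then fan it out."""
    attn_metadata: dict[str, Any] = {}
    for builder, layer_names, block_table in groups:
        metadata = builder.build(block_table)
        for name in layer_names:
            attn_metadata[name] = metadata
    return attn_metadata
```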