TensorRT-LLM
[Feature Request]: support for vAttention style paging for attention
System Info
Who can help?
@ncomly-nvidia https://arxiv.org/abs/2405.04437
There is currently a PR for this on vLLM as well. Could it also be added here, to support dynamic memory management for the non-paged kernel? Thanks.
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)