TensorRT-LLM


[Feature Request]: Support vAttention-style paging for attention

Open thecheekygeek opened this issue 1 year ago • 0 comments

System Info

Who can help?

@ncomly-nvidia https://arxiv.org/abs/2405.04437

vAttention (paper linked above) is also currently a PR on vLLM. Could it be supported here as well, so that the non-paged attention kernel can use dynamic memory management for the KV cache? Thanks
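To make the request concrete, here is a toy Python sketch of the idea as described in the paper: each request's KV cache gets one contiguous virtual range sized for the maximum sequence length, and physical pages are mapped behind it only as tokens arrive. The kernel can then index token `i` at offset `i` with no block table. This is only a simulation. `PhysicalPagePool`, `VirtualKVBuffer`, `append_tokens`, and `free_request` are hypothetical names, not TensorRT-LLM or vLLM APIs, and the page size is an arbitrary assumption. The paper does the real mapping with the CUDA virtual memory management driver calls (`cuMemAddressReserve`, `cuMemCreate`, `cuMemMap`), which this sketch does not invoke.

```python
from dataclasses import dataclass, field


class PhysicalPagePool:
    """Hypothetical stand-in for the pool of physical GPU memory pages."""

    def __init__(self, num_pages: int):
        self.free = list(range(num_pages))

    def alloc(self) -> int:
        if not self.free:
            raise MemoryError("out of physical KV pages")
        return self.free.pop()

    def release(self, pages) -> None:
        self.free.extend(pages)


@dataclass
class VirtualKVBuffer:
    """One request's KV cache: contiguous virtual range, pages mapped lazily."""

    max_tokens: int        # size of the virtual reservation, in tokens
    tokens_per_page: int   # assumed page granularity, in tokens
    num_tokens: int = 0
    mapped_pages: list = field(default_factory=list)

    def pages_needed(self, num_tokens: int) -> int:
        # Ceiling division: pages required to back `num_tokens` tokens.
        return -(-num_tokens // self.tokens_per_page)


def append_tokens(buf: VirtualKVBuffer, pool: PhysicalPagePool, n: int) -> None:
    """Grow the request by n tokens, mapping new physical pages only if needed."""
    new_total = buf.num_tokens + n
    if new_total > buf.max_tokens:
        raise ValueError("request exceeds its virtual reservation")
    extra = buf.pages_needed(new_total) - len(buf.mapped_pages)
    if extra > len(pool.free):
        raise MemoryError("out of physical KV pages")
    for _ in range(extra):
        buf.mapped_pages.append(pool.alloc())
    buf.num_tokens = new_total


def free_request(buf: VirtualKVBuffer, pool: PhysicalPagePool) -> None:
    """Unmap all pages and return them to the pool; the virtual range can be reused."""
    pool.release(buf.mapped_pages)
    buf.mapped_pages = []
    buf.num_tokens = 0
```

The point of the design is that memory grows on demand, as with paged attention, while the attention kernel still sees a plain contiguous buffer, so an existing non-paged kernel would not need to be rewritten.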

Information

[ ] The official example scripts
[ ] My own modified scripts

Tasks

[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)

Reproduction

Expected behavior

Actual behavior

Additional notes

Jul 13 '24 06:07 thecheekygeek