[Feature Request]: Support for vAttention-style paging for attention

Open thecheekygeek opened this issue 1 year ago • 0 comments

System Info

Who can help?

@ncomly-nvidia

vAttention paper: https://arxiv.org/abs/2405.04437

vAttention is also currently an open PR on vLLM. Could it be supported here as well, so that the non-paged attention kernel can use dynamic memory management? Thanks.
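To make the request concrete, here is a toy sketch of the idea as I understand it from the paper: the KV cache keeps one contiguous virtual range per request, and physical pages are mapped into it only as the sequence grows, so a non-paged kernel can still index the buffer contiguously. The class `VirtualKVBuffer` and its parameters are hypothetical names for illustration only; this is a pure-Python simulation, not TensorRT-LLM or vLLM code. An actual implementation would presumably rely on the CUDA driver's virtual memory management calls (`cuMemAddressReserve`, `cuMemCreate`, `cuMemMap`) rather than counters.

```python
class VirtualKVBuffer:
    """Toy model (hypothetical): a contiguous virtual range whose
    physical pages are mapped on demand as tokens are appended."""

    def __init__(self, max_tokens: int, page_tokens: int):
        self.max_tokens = max_tokens    # size of the reserved virtual range, in tokens
        self.page_tokens = page_tokens  # tokens covered by one physical page
        self.mapped_pages = 0           # physical pages currently backing the range
        self.length = 0                 # tokens written so far

    def append(self, n_tokens: int) -> int:
        """Grow the sequence; return how many new pages had to be mapped."""
        new_length = self.length + n_tokens
        if new_length > self.max_tokens:
            raise MemoryError("request exceeds the reserved virtual range")
        needed = -(-new_length // self.page_tokens)  # ceiling division
        newly_mapped = max(0, needed - self.mapped_pages)
        self.mapped_pages += newly_mapped
        self.length = new_length
        return newly_mapped

    def release(self) -> int:
        """Finish the request; return how many pages were unmapped."""
        freed = self.mapped_pages
        self.mapped_pages = 0
        self.length = 0
        return freed


buf = VirtualKVBuffer(max_tokens=4096, page_tokens=256)
first = buf.append(100)   # prompt tokens: the first page gets mapped
second = buf.append(200)  # crosses a page boundary: one more page
third = buf.append(212)   # fills the second page exactly: nothing new
```

The point of the sketch is that addresses inside the buffer never change as it grows, which is what would let the existing non-paged kernel run unmodified while memory is still allocated incrementally.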

Information

  • [ ] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

Expected behavior

actual behavior

additional notes

thecheekygeek • Jul 13 '24 06:07