Burkhard Ringlein
Any update on this? Does it depend on `libraw` releasing the GH6 support as open source (see [here](https://www.libraw.org/node/2710) and [here](https://www.libraw.org/comment/6626#comment-6626))? Thanks!
> However the downsides on `vllm serve` is that every time it sees new `triton.next_power_of_2(max_input_len)`, triton starts autotuning to select the best config. Hi @maleksan85, yes that is indeed the...
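To make the re-tuning behavior concrete, here is a minimal sketch of why each new power-of-2 input-length bucket triggers a fresh autotuning sweep. The `next_power_of_2` helper mirrors the semantics of `triton.next_power_of_2`; the cache key and `launch` function are illustrative assumptions, not vLLM's or Triton's actual implementation.

```python
# Illustrative sketch: autotune results are cached per constexpr value,
# so every previously unseen power-of-2 bucket means a new tuning sweep.

def next_power_of_2(n: int) -> int:
    """Smallest power of two >= n (same semantics as triton.next_power_of_2)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()

autotune_cache = {}  # bucket -> best config (stand-in for Triton's cache)

def launch(max_input_len: int) -> str:
    key = next_power_of_2(max_input_len)
    if key not in autotune_cache:
        # This is where an expensive autotuning sweep would run.
        autotune_cache[key] = f"best-config-for-{key}"
    return autotune_cache[key]

launch(100)  # bucket 128 -> triggers autotuning
launch(120)  # bucket 128 -> cache hit, no re-tune
launch(300)  # bucket 512 -> new bucket, autotunes again
```

Only two sweeps run for the three requests above, because the second request falls into an already-tuned bucket.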
I did some benchmarks for different numbers of concurrent users on A100 and H100 for this PR, using `benchmarks/benchmark_serving.py` as described by @tdoublep above:  
Just to document: this PR enables a similar performance improvement (4.1x) on MI250 as well.
Thanks both of you for your feedback! @ThomasRaoux : > it is hard to tell without seeing the code Yes, of course, we are currently working through our internal processes...
We were able to open source our implementation of a dejavu mechanism for triton earlier this week: https://github.com/IBM/triton-dejavu ( :tada: ) The implementation is very similar to the discussed above,...
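As a rough illustration of the dejavu idea, the sketch below persists autotune results to disk so a restarted process can reuse them instead of re-running the tuning sweep. All names here (`load_cache`, `get_best_config`, the cache file path) are hypothetical and do not reflect the actual triton-dejavu API.

```python
import json
import os
import tempfile

# Hedged sketch: persist the autotune cache across restarts.
CACHE_FILE = os.path.join(tempfile.gettempdir(), "dejavu_cache_demo.json")

def load_cache() -> dict:
    """Load previously tuned configs from disk, if any."""
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE) as f:
            return json.load(f)
    return {}

def save_cache(cache: dict) -> None:
    with open(CACHE_FILE, "w") as f:
        json.dump(cache, f)

def get_best_config(cache: dict, key: str) -> str:
    """Return the cached config for `key`, tuning (and persisting) on a miss."""
    if key not in cache:
        cache[key] = f"tuned-config-for-{key}"  # stands in for a tuning sweep
        save_cache(cache)
    return cache[key]

cache = load_cache()
cfg = get_best_config(cache, "kernel@bucket-128")
# After a restart, load_cache() returns the persisted entry, so no sweep runs.
```

The design point is that tuning cost is paid once per kernel/bucket combination across process lifetimes rather than once per server start.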
@ThomasRaoux did you have a chance to look at it?