Burkhard Ringlein

7 comments by Burkhard Ringlein

Any update on this? Does it depend on `libraw` releasing GH6 support as open source (see [here](https://www.libraw.org/node/2710) and [here](https://www.libraw.org/comment/6626#comment-6626))? Thanks!

> However the downsides on `vllm serve` is that every time it sees new `triton.next_power_of_2(max_input_len)`, triton starts autotuning to select the best config.

Hi @maleksan85, yes that is indeed the...
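To illustrate the behavior described in the quote, here is a minimal pure-Python sketch. It is not vLLM or Triton code. It assumes the autotuner caches one chosen config per key and that the key is the next power of two of `max_input_len`. The names `next_power_of_2` (a local stand-in for `triton.next_power_of_2`), `FakeAutotuner`, and the placeholder config are hypothetical.

```python
def next_power_of_2(n: int) -> int:
    """Smallest power of two >= n (local stand-in for triton.next_power_of_2)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


class FakeAutotuner:
    """Hypothetical model of an autotuner that caches one config per key."""

    def __init__(self):
        self.cache = {}       # key -> chosen config
        self.tuning_runs = 0  # how often the expensive search ran

    def run(self, max_input_len: int) -> dict:
        key = next_power_of_2(max_input_len)
        if key not in self.cache:
            # A real autotuner would benchmark candidate configs here.
            self.tuning_runs += 1
            self.cache[key] = {"BLOCK_SIZE": min(key, 128)}  # placeholder
        return self.cache[key]


tuner = FakeAutotuner()
for length in [100, 120, 128, 129, 200, 1000, 1024]:
    tuner.run(length)

print(tuner.tuning_runs)    # 3
print(sorted(tuner.cache))  # [128, 256, 1024]
```

Seven requests fall into only three buckets, so the expensive search runs three times. In a long-running server, every previously unseen bucket would still trigger a new search, which is the cost the quote points at.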

I ran some benchmarks with different numbers of concurrent users on A100 and H100 for this PR, using `benchmarks/benchmark_serving.py` as described by @tdoublep above:

![image](https://github.com/user-attachments/assets/e16971a0-f463-471f-b548-0cd72b00918e)

![image](https://github.com/user-attachments/assets/fc94eb13-ef6b-4681-b241-7535a9f7a150)

Just to document: this PR enables a similar performance improvement on MI250 as well (4.1x).

![image](https://github.com/user-attachments/assets/35fab507-97a5-4319-b077-d8870700cdb1)

Thanks both of you for your feedback!

@ThomasRaoux:

> it is hard to tell without seeing the code

Yes, of course, we are currently working through our internal processes...

We were able to open source our implementation of a dejavu mechanism for triton earlier this week: https://github.com/IBM/triton-dejavu ( :tada: ) The implementation is very similar to what was discussed above,...
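As a rough illustration of the general idea of reusing autotuning results across runs, the sketch below persists a key-to-config mapping in a JSON file. It assumes that "dejavu" here means restoring previously found configs instead of re-tuning. This is a toy, not the triton-dejavu API: `load_or_tune`, `expensive_tune`, and the cache file name are hypothetical.

```python
import json
import os
import tempfile


def expensive_tune(key: int) -> dict:
    """Hypothetical stand-in for a costly autotuning search."""
    return {"BLOCK_SIZE": min(key, 128)}


def load_or_tune(cache_path: str, key: int) -> tuple[dict, bool]:
    """Return (config, was_cached); tune and persist on a cache miss."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)
    if str(key) in cache:  # JSON object keys are strings
        return cache[str(key)], True
    config = expensive_tune(key)
    cache[str(key)] = config
    with open(cache_path, "w") as f:
        json.dump(cache, f)
    return config, False


cache_file = os.path.join(tempfile.mkdtemp(), "autotune_cache.json")
first = load_or_tune(cache_file, 256)   # miss: tunes and writes the file
second = load_or_tune(cache_file, 256)  # hit: config is read back from disk
print(first, second)
```

The second call skips the search because the result already exists on disk, so a restarted process would not pay the tuning cost again for that key.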

@ThomasRaoux did you have a chance to look at it?