Void comments

Results: 28 comments by Void

[Quantization] Long latency for generating first token

> Thus, I assume DRAM of the int4 first inference should be reduced and TENSO should be around 65% as well as fp16. But there's only 10% reduction in memory...

[Quantization] Long latency for generating first token

> do u still have further issue or question now?

Thanks, no question.

feat: allreduce and fusion kernel development

/bot run --disable-fail-fast

feat: allreduce and fusion kernel development

/bot run --disable-fail-fast

feat: allreduce and fusion kernel development

/bot run

feat: allreduce and fusion kernel development

/bot run --disable-fail-fast

feat: allreduce and fusion kernel development

/bot run

feat: allreduce and fusion kernel development

/bot run --disable-fail-fast

feat: allreduce and fusion kernel development

/bot run --disable-fail-fast

feat: allreduce and fusion kernel development

/bot reuse-pipeline