Void
> Thus, I assume DRAMA of the int4 first inference should be reduced and TENSO should be around 65% as well as fp16. But there's only 10% reduction in memory...
> Do you still have any further issues or questions?

Thanks, no question.
/bot run --disable-fail-fast
/bot reuse-pipeline