Zejia LIN
I've figured this out! It's because the `!` (logical NOT) operator has higher precedence than `instanceof`. Instead of removing the `if` statement, simply add a pair of parentheses wherever `if (!elem._parent instanceof type.UMLInterface) `...
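A minimal sketch of the precedence issue described above. The `UMLInterface` and `UMLClass` classes and the `elem` object here are hypothetical stand-ins, not the real StarUML types; only the operator behaviour is the point.

```javascript
// Hypothetical stand-ins for the real classes; only operator precedence matters here.
class UMLInterface {}
class UMLClass {}

const elem = { _parent: new UMLClass() };

// Parsed as (!elem._parent) instanceof UMLInterface.
// !elem._parent is a boolean primitive, and a primitive is never an
// instance of a class, so this expression is false for any _parent.
const withoutParens = !elem._parent instanceof UMLInterface;

// Parentheses make instanceof run first, then negate its result.
const withParens = !(elem._parent instanceof UMLInterface);

console.log(withoutParens); // false
console.log(withParens);    // true
```

Because the unparenthesized form is false for every input, the body of such an `if` never runs, which matches the symptom of the check appearing to have no effect.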
Thanks! I am using an A100, so both CUTLASS 2.x and 3.x are suitable for me. The data types of A and B are `int8` with `int32` accumulation, C and D are...
Thank you for the detailed reply. I'll try it later.
I am sorry for the late response. I found that I was not able to resolve it with reasonable effort, so I am closing this issue.
@zhaochenyang20 I am sorry, I misunderstood OpenAI's API. The [standard API](https://platform.openai.com/docs/api-reference/chat/create#chat-create-stream_options) defines only an `include_usage` field in `stream_options`, which returns the token usage statistics in an additional streaming...
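As a rough sketch of the `include_usage` behaviour mentioned above: the request sets `stream_options.include_usage`, and the client then looks for the one streamed chunk whose `usage` field is populated. The model name, the `extractUsage` helper, and the sample chunks are all made up for illustration; they are not taken from SGLang or from a real response.

```javascript
// Request body for a streaming chat completion. The model name is a placeholder.
const requestBody = {
  model: "placeholder-model",
  messages: [{ role: "user", content: "Hello" }],
  stream: true,
  stream_options: { include_usage: true },
};

// Hypothetical helper: returns the usage object from the first chunk
// that carries one, or null if no chunk does.
function extractUsage(chunks) {
  for (const chunk of chunks) {
    if (chunk.usage) return chunk.usage;
  }
  return null;
}

// Illustrative chunks only (token counts are invented). Ordinary content
// chunks have usage set to null; the extra usage chunk has empty choices.
const sampleChunks = [
  { choices: [{ delta: { content: "Hi" } }], usage: null },
  { choices: [], usage: { prompt_tokens: 5, completion_tokens: 1, total_tokens: 6 } },
];

console.log(extractUsage(sampleChunks));
```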
@zhaochenyang20 SGLang already implements the first one.
Sorry for the late response. Your figures align with my observations. Below ~40 Streaming Multiprocessors (SMs), I believe the memory bandwidth is not yet saturated, which explains the rapid rise in performance....