Han Guo

16 comments by Han Guo

Thanks for confirming this! Unfortunately, I'm a bit swamped by an upcoming deadline, so I don't think I could create a PR in the short term :/

Hi, I have a loosely related question about the vectorized Epilogue. What are the general rules of thumb/guidelines for configuring the `SmemLayout`, as well as the tiled copy between Smem...

Thanks for the answer! I noticed a similar decision is made in the [Hopper implementation](https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/gemm/kernel/tile_scheduler_params.h#L448) of StreamK. If I understood you correctly, most of these choices are...

Thanks for reaching out! Please check out a related question at https://github.com/HanGuo97/soft-Q-learning-for-text-generation/issues/2. Somewhat related, we also have a [follow-up work](https://github.com/mingkaid/rl-prompt) with better documentation.

I believe element-wise functions are supported, but I was wondering whether element-wise multiplication with another tensor is supported. AFAIK, element-wise multiplication with a scalar or a vector is supported, but...

Thanks for the quick response! A few quick questions: 1. Is such fusion "profitable" (element-wise activation, and another element-wise multiplication with a different tensor)? I'd imagine this is a somewhat...

Thanks for bringing that up @radi-cho and @casper-hansen! I agree that 2-bit is a bit too "aggressive" to be useful in practice. That being said, many of the ongoing research...

Just want to make sure I understand the question. Are you talking about _algorithms_ for, say, 3-bit quantization, or _fused implementations_ of it?

Thanks! Do you have suggestions on where we could get started? I was thinking about a few potential ways for integration: 1. The easiest point of integration is at the...