Han Guo comments

Results: 16 comments by Han Guo

`hf-bitsandbytes-integration.md` Incorrect Dequantization

Thanks for confirming this! Unfortunately, I'm a bit swamped by an upcoming deadline, so I don't think I can create a PR in the short term :/

[QST] Epilogue Swizzle

Hi, I have a loosely related question about the vectorized Epilogue. What are the general rules of thumb/guidelines for configuring the `SmemLayout`, as well as the tiled copy between Smem...

[QST] StreamK ReductionStrategy: "Atomic" or "Mixed"

Thanks for the answer! I noticed a similar decision is made in the [Hopper implementation](https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/gemm/kernel/tile_scheduler_params.h#L448) of StreamK. If I understand what you said correctly, most of these choices are...

Subject: Request for vocab.3000 file and Complete Dataset for Attack Experiments in multinli.py

Thanks for reaching out! Please check out a related question at https://github.com/HanGuo97/soft-Q-learning-for-text-generation/issues/2. Somewhat related, we also have a [follow-up work](https://github.com/mingkaid/rl-prompt) with better documentation.

[QST] GEMM Epilogue Fusion: Element-wise Ops and Two-Tensor Element-wise Multiplication

I'm looking at Ampere chips.

[QST] GEMM Epilogue Fusion: Element-wise Ops and Two-Tensor Element-wise Multiplication

I believe element-wise functions are supported, but I was wondering whether element-wise multiplication with another tensor is supported. AFAIK, element-wise multiplication with another scalar or another vector is supported, but...

[QST] GEMM Epilogue Fusion: Element-wise Ops and Two-Tensor Element-wise Multiplication

Thanks for the quick response! A few quick questions: 1. Is such fusion "profitable" (element-wise activation, and another element-wise multiplication with a different tensor)? I'd imagine this is a somewhat...
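To pin down what such a fusion would compute, here is a minimal unfused reference in Python, assuming the epilogue is `D = act(alpha * (A @ B)) * C`, where `act` is an element-wise activation and `C` is a second tensor with the same shape as the output. The function names (`matmul`, `relu`, `epilogue_reference`) and the `alpha` scaling are illustrative assumptions, not CUTLASS APIs.

```python
from typing import Callable, List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive GEMM: returns the M x N accumulator A @ B."""
    k = len(b)
    assert all(len(row) == k for row in a), "inner dimensions must match"
    n_cols = len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n_cols)]
            for row in a]

def relu(x: float) -> float:
    """Example element-wise activation (assumed for illustration)."""
    return x if x > 0 else 0.0

def epilogue_reference(a: Matrix, b: Matrix, c: Matrix,
                       act: Callable[[float], float],
                       alpha: float = 1.0) -> Matrix:
    """Hypothetical reference for D = act(alpha * (A @ B)) * C.

    In a fused kernel, the per-element work below would run on the
    accumulator tile before D is stored, so act(A @ B) never has to be
    written out and read back by a separate kernel.
    """
    acc = matmul(a, b)
    assert len(c) == len(acc) and all(len(cr) == len(ar) for cr, ar in zip(c, acc)), \
        "C must have the same shape as the output"
    return [[act(alpha * acc[i][j]) * c[i][j] for j in range(len(acc[0]))]
            for i in range(len(acc))]
```

Run unfused, this is a GEMM that writes the M×N intermediate to global memory, followed by a separate element-wise kernel that reads it back together with `C`. Fusing the activation and the multiplication into the epilogue removes that round trip of the intermediate, although the epilogue still has to load `C`, so the saving is roughly one write and one read of an M×N tensor.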

Support 3-bit and 2-bit quantization with the FLUTE kernel.

Thanks for bringing that up @radi-cho and @casper-hansen! I agree that 2-bit is a bit too "aggressive" to be useful in practice. That being said, much of the ongoing research...

Support 3-bit and 2-bit quantization with the FLUTE kernel.

Just want to make sure I understand the question. Are you talking about _algorithms_ for, say, 3-bit quantization, or _fused implementations_ of it?
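For the algorithmic side of the question, below is a minimal sketch of plain round-to-nearest, asymmetric, per-group quantization to `bits` bits (2-bit gives 4 levels, 3-bit gives 8). This is an illustrative assumption, not FLUTE's quantization scheme, and the function names are hypothetical. The last function shows that uniform dequantization can also be written as a lookup into a table of `2**bits` entries, the general form a lookup-table based kernel would consume; non-uniform schemes would simply fill the table differently.

```python
from typing import List, Tuple

def quantize_group(w: List[float], bits: int) -> Tuple[List[int], float, float]:
    """Round-to-nearest asymmetric quantization of one weight group.

    Returns integer codes in [0, 2**bits - 1], plus the scale and the
    offset (group minimum) needed to dequantize.
    """
    qmax = (1 << bits) - 1                # 3 for 2-bit, 7 for 3-bit
    lo, hi = min(w), max(w)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    codes = [min(qmax, max(0, round((x - lo) / scale))) for x in w]
    return codes, scale, lo

def dequantize_group(codes: List[int], scale: float, lo: float) -> List[float]:
    """Affine dequantization: w_hat = lo + code * scale."""
    return [lo + q * scale for q in codes]

def dequantize_via_lut(codes: List[int], scale: float, lo: float,
                       bits: int) -> List[float]:
    """Same result as dequantize_group, expressed as a table lookup.

    The table holds all 2**bits representable values for this group.
    """
    table = [lo + q * scale for q in range(1 << bits)]
    return [table[q] for q in codes]
```

With round-to-nearest, each reconstructed weight is within `scale / 2` of the original, which is why fewer bits (a larger `scale` for the same range) make the approximation coarser.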

FLUTE Integration for Fast Inference

Thanks! Do you have suggestions on where we could get started? I was thinking about a few potential ways for integration: 1. The easiest point of integration is at the...