KIUCHO
Results: 2 comments by KIUCHO
Still can't use `replace_with_kernel_inject=True` and `dtype=torch.int8` at the same time. Is there any progress?
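For context, a minimal sketch of the kind of call this comment appears to refer to, assuming a Hugging Face causal LM is being served through DeepSpeed inference (the model name is a placeholder chosen for illustration):

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

# Placeholder checkpoint; any causal LM supported by kernel injection would do.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")

# Combining kernel injection with int8 inference -- the combination
# the comment above reports as still not working together.
engine = deepspeed.init_inference(
    model,
    dtype=torch.int8,
    replace_with_kernel_inject=True,
)
```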
For real quantization, I found that it reduces memory but is much slower than FP16 inference (pseudo quantization), even when the quantized modules have been changed to WQLinear modules, which need...
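A minimal sketch, using only plain PyTorch utilities, of how the memory/latency comparison described above could be measured; `model` is assumed to be a Hugging Face-style generator (either FP16 or with its Linear layers already replaced by WQLinear), and the helper name is hypothetical:

```python
import time
import torch

def profile_generate(model, input_ids, n_iters=10, max_new_tokens=64):
    """Measure peak GPU memory (GB) and average latency (s) of generation."""
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(n_iters):
        with torch.no_grad():
            model.generate(input_ids, max_new_tokens=max_new_tokens)
    torch.cuda.synchronize()
    latency = (time.time() - start) / n_iters
    peak_mem_gb = torch.cuda.max_memory_allocated() / 1024**3
    return latency, peak_mem_gb

# Usage (illustrative): run once on the FP16 model and once on the
# real-quantized model, then compare the returned latency and memory.
```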