KIUCHO
Results: 2 comments by KIUCHO
Still can't use `replace_with_kernel_inject=True` and `dtype=torch.int8` at the same time. Is there any progress?
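For context, a minimal sketch of the kind of call this comment appears to refer to, assuming a Hugging Face causal LM is being served through DeepSpeed inference (the model name is a placeholder chosen for illustration):

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

# Placeholder checkpoint; any causal LM supported by kernel injection would do.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")

# Combining kernel injection with int8 inference -- the combination
# the comment above reports as still not working together.
engine = deepspeed.init_inference(
    model,
    dtype=torch.int8,
    replace_with_kernel_inject=True,
)
```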
For real quantization, I found that it reduces memory but is much slower than FP16 inference (pseudo quantization), even when the quantized modules have been changed to WQLinear modules, which need...
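A minimal sketch, using only plain PyTorch utilities, of how the memory/latency comparison described above could be measured; `model` is assumed to be a Hugging Face-style generator (either FP16 or with its Linear layers already replaced by WQLinear), and the helper name is hypothetical:

```python
import time
import torch

def profile_generate(model, input_ids, n_iters=10, max_new_tokens=64):
    """Measure peak GPU memory (GB) and average latency (s) of generation."""
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(n_iters):
        with torch.no_grad():
            model.generate(input_ids, max_new_tokens=max_new_tokens)
    torch.cuda.synchronize()
    latency = (time.time() - start) / n_iters
    peak_mem_gb = torch.cuda.max_memory_allocated() / 1024**3
    return latency, peak_mem_gb

# Usage (illustrative): run once on the FP16 model and once on the
# real-quantized model, then compare the returned latency and memory.
```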