Vasiliy Kuznetsov comments

Results: 111 comments of Vasiliy Kuznetsov

Benchmark quantization

What's the context here: which benchmark suite is this, and what is the goal? In general, more benchmarking of quantized models would be really valuable; let me know how our...

Benchmark quantization

Sounds great. I'd recommend starting with `MobileNetV2`, since it is easy and we already benchmark it internally, so we can compare data. We can use FX graph mode quant on that...

Benchmark quantization

cc @HDCharles who is interested in testing this out for mobilenetv2

Benchmark quantization

@HDCharles, just so nothing is blocked, I'd recommend sticking to training and inference only (no calibration) in the first PR. Then, if we decide that calibration is OK...

Benchmark quantization

I'm a little worried about scope creep here before we actually know that this data is reliable. Would it make sense to just do the simplest possible thing first (just...

Benchmark quantization

> @vkuzo I agree with that, iiuc you're referring to (c) in my list?

yeah, that sounds great to me. If I had to rank a, b and c by...

Benchmark quantization

> @HDCharles, please confirm if pretrained quantized models have to be calibrated for benchmarking, or if something like #417 would suffice for benchmarking. Thank you!

as long as the model...

FP8 rowwise scaling

this is great! API looks good, I'll defer to others for the cutlass part.

[Question] Difference in MXLinear vs MXInferenceLinear grouping direction

Hi @Abhijit-2592,

In `MXInferenceLinear`, the relevant code snippet is:

```
new_mod.weight_mx = MXTensor.to_mx(
    mod.weight.t().contiguous(), elem_dtype, block_size=block_size
).t()
```

Source: https://github.com/pytorch/ao/blob/26e790df61e23b2ba340c36b84eb9940fec100bb/torchao/prototype/mx_formats/mx_linear.py#L86

There are two calls to `t()` in this snippet...
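To illustrate why a transpose before and after a block-wise quantization call can matter, here is a minimal pure-Python sketch. It is not torchao code: `block_scales` is a hypothetical stand-in that assumes blocks are formed along the last dimension and uses the max absolute value per block as a toy "scale". The weight values and shapes are made up for the example.

```python
def block_scales(rows, block_size):
    """Toy stand-in for block-wise scaling (hypothetical helper, not torchao):
    returns max(|x|) for each block taken along the LAST dimension."""
    return [
        [max(abs(v) for v in row[i:i + block_size])
         for i in range(0, len(row), block_size)]
        for row in rows
    ]


def transpose(rows):
    """Transpose a 2D list of lists."""
    return [list(col) for col in zip(*rows)]


# Hypothetical 4x2 weight, values chosen only for illustration.
weight = [
    [1.0, -8.0],
    [2.0, 7.0],
    [-3.0, 6.0],
    [4.0, -5.0],
]
block_size = 2

# Grouping directly: each block runs along a row of `weight`.
direct = block_scales(weight, block_size)

# Transpose, group, transpose back: each block now runs down a column
# of `weight`, even though the result is indexed like `weight` again.
via_transpose = transpose(block_scales(transpose(weight), block_size))

print(direct)         # one scale per row of `weight`
print(via_transpose)  # one scale per (row-block, column) of `weight`
```

Under that assumption, the two results differ in both shape and values, which shows that the pair of transposes changes the direction in which elements share a scale while leaving the logical orientation of the tensor unchanged.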

[Question] Difference in MXLinear vs MXInferenceLinear grouping direction

thanks, will take a look later this week