supriyar comments

Results: 18 comments by supriyar

Does torch.export preserve the quantize_per_tensor/dequantize_per_tensor ops?

@jerryzh168 we have a way to preserve these ops in export, right?

[BUG] Float8Linear does not work with torch.inference_mode

cc @vkuzo, @drisspg

[Feature Request] Fused fp8 matmul kernel (quant + dequant + matmul)

cc @danielvegamyhre

What kind of layers are optimized by torchao on a RTX 4090?

torchao quantizes Linear layers. However, depending on batch size and layer shape, you may see different levels of performance improvement from different techniques. E.g., weight-only works best for bs=1 while dynamic...
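To make "weight-only" concrete, here is a minimal pure-Python sketch of symmetric int8 weight-only quantization for a single linear layer. It is an illustration only, not torchao's implementation: the helper names `quantize_weight_int8` and `linear_weight_only` are hypothetical, and the per-output-row scale is an assumption chosen for simplicity. The weights are stored as int8 values plus a float scale, while the activations stay in floating point.

```python
def quantize_weight_int8(weight):
    """Symmetric int8 quantization with one scale per output row (assumed scheme)."""
    q_rows, scales = [], []
    for row in weight:
        max_abs = max(abs(w) for w in row)
        # Map the largest magnitude in the row to 127; avoid dividing by zero.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def linear_weight_only(x, q_rows, scales, bias=None):
    """Compute y = W x + b, dequantizing each int8 weight on the fly."""
    out = []
    for i, (q_row, scale) in enumerate(zip(q_rows, scales)):
        acc = sum(xi * (q * scale) for xi, q in zip(x, q_row))
        out.append(acc + (bias[i] if bias is not None else 0.0))
    return out


# Toy layer: 2 output features, 3 input features, batch size 1.
weight = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.5]]
x = [1.0, 2.0, 3.0]

q_rows, scales = quantize_weight_int8(weight)
y_quant = linear_weight_only(x, q_rows, scales)
y_ref = [sum(xi * w for xi, w in zip(x, row)) for row in weight]
```

Comparing `y_quant` with `y_ref` shows the rounding error introduced by storing the weights in 8 bits. Each weight is off by at most half a scale step, so the error stays small for this toy input.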

Autoquant fails on CPU with CPU packages

Hi @ynimmaga, nice to see you here - I believe we met last year at the PTC poster sessions and afterwards to discuss how to use PyTorch quantization with OpenVINO....

Autoquant fails on CPU with CPU packages

@ynimmaga what kind of use cases do you have in mind? We won't have the bandwidth to support OpenVINO specifically, but if that's something your team would like to contribute...

[Quantization + FSDP] Support `quantize_()` for DTensor

@jerryzh168 @kwen2501 is this addressed now by the quantize + distributed inference composability work?

What is the difference between WeightNormSparsifier and torch.nn.utils.prune.l1_unstructured ?

cc @jcaip

Does torchao support FP8 Grouped GEMM?

cc @HDCharles who has been looking into MoE quantization and grouped gemm recently

Future plans for MXFP8 development

I believe https://github.com/pytorch/ao/issues/2147 has some details, though it is likely not the full list. cc @danielvegamyhre @vkuzo