HDCharles comments

29 comments by HDCharles

Benchmark quantization

@wconstab, for the quantization benchmarking, I'm wondering what a desirable 'scope' would be. The most natural type of quantization to benchmark is QAT, which has both training and evaluation. Then...
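QAT covers both training and evaluation because the model is trained with "fake quantization" in the forward pass: values are rounded onto the integer grid and immediately mapped back to floats, so the network learns to tolerate the rounding error. The following is a minimal pure-Python sketch of per-tensor affine fake quantization, assuming an int8 range of [-128, 127]. The function name and parameters are illustrative, not a PyTorch API. In real QAT, gradients are typically passed through the rounding step with a straight-through estimator.

```python
def fake_quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Quantize each value to the integer grid, clamp, then dequantize.

    Illustrative sketch only (hypothetical helper, not torch's implementation).
    """
    out = []
    for v in values:
        q = round(v / scale) + zero_point      # quantize to an integer code
        q = max(qmin, min(qmax, q))            # clamp to the int8 range
        out.append((q - zero_point) * scale)   # dequantize back to float
    return out


# 2.0 falls outside the representable range (127 * 0.01 = 1.27) and is clipped.
print(fake_quantize([0.31, -1.2, 0.05, 2.0], scale=0.01, zero_point=0))
```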

Benchmark quantization

Well, here is an initial PR: https://github.com/pytorch/benchmark/pull/323 (this one is doing C and A).

What is `torch.ops.aten._convert_weight_to_int4pack`?

You should use the nightly version of torch, or at least the recent 2.2 branch cut. It's a newish op that was added for int4 support, so older releases don't have it.
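Rather than parsing version strings, a script can check directly whether the installed torch build exposes the op. This is a sketch under the assumption that looking up an unknown op on `torch.ops.aten` raises either `AttributeError` or `RuntimeError` (which one you see depends on the torch version); the helper name is hypothetical.

```python
def has_int4pack_op():
    """Return True if this torch build exposes aten._convert_weight_to_int4pack.

    Hypothetical helper: returns False when torch is missing or the op lookup fails.
    """
    try:
        import torch
    except ImportError:
        return False
    try:
        getattr(torch.ops.aten, "_convert_weight_to_int4pack")
    except (AttributeError, RuntimeError):
        # Unknown ops surface as one of these, depending on the torch version.
        return False
    return True


if __name__ == "__main__":
    print("int4pack op available:", has_int4pack_op())
```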

[BUG] - segmentation fault occur when follow the tutorial

Looks like @ftian1, @holly1238, and @yqhu wrote/landed the tutorial; can one of you take a look at this? The PyTorch quantization oncall is listed for this issue, but the tutorial...

Bandwidth achieved for INT8 is much smaller than FP16

The quantization overhead is to blame, at least for the numbers in the README. You're doing the same amount of computation in the matmul, but you also have to decompress the...
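To make the overhead concrete, here is a simplified pure-Python sketch of an int8 weight-only matrix-vector product, assuming symmetric per-row quantization (an assumption; the README's actual kernels are fused GPU code and are not reproduced here). The multiply-accumulates match a plain float matvec, but every weight must also be converted and rescaled before use, which is extra work that FP16 does not pay. Function names are hypothetical.

```python
def quantize_rows(weight):
    """Symmetric per-row int8 quantization: returns (integer rows, per-row scales)."""
    q_rows, scales = [], []
    for row in weight:
        max_abs = max(abs(w) for w in row) or 1.0  # avoid a zero scale
        scale = max_abs / 127
        q_rows.append([round(w / scale) for w in row])
        scales.append(scale)
    return q_rows, scales


def int8_weight_only_matvec(q_rows, scales, x):
    """Compute y = W @ x, dequantizing each int8 weight on the fly.

    The accumulate loop is the same as for float weights; the extra
    (q * scale) per element is the decompression overhead.
    """
    y = []
    for q_row, scale in zip(q_rows, scales):
        acc = 0.0
        for q, xi in zip(q_row, x):
            acc += (q * scale) * xi  # dequantize, then multiply-accumulate
        y.append(acc)
    return y


W = [[0.5, -1.0], [2.0, 0.25]]
x = [1.0, 2.0]
q_rows, scales = quantize_rows(W)
print(int8_weight_only_matvec(q_rows, scales, x))  # close to the exact [-1.5, 2.5]
```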

> One question about this @HDCharles. The SpinQuant repo has a dependency on the [CUDA fast Hadamard transform](https://github.com/Dao-AILab/fast-hadamard-transform) package for doing the actual Hadamard transform. Would it be acceptable to...
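For reference, the operation that package accelerates is the (fast) Walsh-Hadamard transform. Below is a plain-Python sketch of the standard O(n log n) butterfly algorithm for power-of-two lengths, useful mainly as a readable reference or for checking results. It does not reproduce the CUDA package's API, and the `normalize` flag is an illustrative choice: scaling by 1/sqrt(n) makes the transform orthonormal and its own inverse.

```python
import math


def fwht(x, normalize=True):
    """Fast Walsh-Hadamard transform of a sequence whose length is a power of 2."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    a = list(x)
    h = 1
    while h < n:
        # Butterfly step: combine pairs of elements that are h apart.
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                u, v = a[j], a[j + h]
                a[j], a[j + h] = u + v, u - v
        h *= 2
    if normalize:
        s = 1.0 / math.sqrt(n)
        a = [v * s for v in a]
    return a


print(fwht([1, 0, 0, 0], normalize=False))  # [1, 1, 1, 1]
```

Because the normalized transform is its own inverse, applying it twice returns the original vector (up to floating-point error), which is a handy sanity check against a GPU implementation.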

[WIP] Activation Aware Weight Quantization (AWQ)

This shouldn't be in generate.py; it should be in eval so we can actually see the accuracy impact.

Should we require a specific version of `lm_eval` to simplify `torchao/_models/_eval.py`?

We don't really have lm_eval as a dependency, so I don't know if pinning it is really the solution here. If you wanted to submit a PR getting rid of...
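Since lm_eval is not a declared dependency, one option is to treat it as optional and detect it at runtime instead of pinning it. A minimal standard-library sketch follows; the helper name is hypothetical, it assumes the installed distribution is named `lm_eval`, and it does not show how `torchao/_models/_eval.py` actually handles version differences.

```python
import importlib.metadata
import importlib.util


def lm_eval_version():
    """Return the installed lm_eval version, "unknown", or None if absent."""
    if importlib.util.find_spec("lm_eval") is None:
        return None  # not importable: callers can skip eval gracefully
    try:
        return importlib.metadata.version("lm_eval")
    except importlib.metadata.PackageNotFoundError:
        # Importable (e.g. a source checkout on sys.path) but no install metadata.
        return "unknown"


if __name__ == "__main__":
    version = lm_eval_version()
    if version is None:
        print("lm_eval not installed; skipping evaluation")
    else:
        print("found lm_eval", version)
```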

[wip] SpinQuant

Hey, this is looking nice so far. Long term, we probably want to make these tensor subclasses so that serialization is easier. That way, rather than having to...