Geonmin Kim

Results: 6 comments by Geonmin Kim

After building the wheel and installing it in editable mode, I still get the same error:
```
import tensorrt_llm.bindings as tllm
ModuleNotFoundError: No module named 'tensorrt_llm.bindings'
```
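For reference, here is a minimal check I run to see whether the compiled bindings made it into the installed package at all (the file layout and extension name are assumptions; they depend on how the wheel was built):

```python
# Minimal sketch: look for the compiled bindings extension inside the
# installed tensorrt_llm package. Artifact names/locations are assumptions
# and depend on the build configuration.
import importlib.util
import pathlib

spec = importlib.util.find_spec("tensorrt_llm")
if spec is None or spec.origin is None:
    print("tensorrt_llm itself is not importable")
else:
    pkg_dir = pathlib.Path(spec.origin).parent
    candidates = list(pkg_dir.glob("bindings*")) + list(pkg_dir.glob("bindings/*"))
    if candidates:
        print("found bindings artifacts:", [p.name for p in candidates])
    else:
        print("no bindings extension found in", pkg_dir,
              "- the C++ bindings were probably not built into this wheel")
```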

@sriyachakravarthy Of course. Please see the discussion of a similar issue in https://github.com/Nota-NetsPresso/shortened-llm/issues/16.

Oh, I noticed that optimum-quanto offers HQQ, a calibration-data-free quantization algorithm, and achieves [fairly good perplexity with Shared-llama-1.3B](https://github.com/huggingface/optimum-quanto/pull/128#issue-2200332038). Is HQQ officially supported in optimum-quanto?
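For context, this is the generic weight-only flow from the optimum-quanto README that I have been using; whether an HQQ-style optimizer is applied under the hood here is exactly what I am asking (the model name below is just a placeholder):

```python
# Sketch of the documented optimum-quanto weight-only quantization flow
# (quantize + freeze). This is NOT necessarily HQQ; the model name is a
# placeholder for illustration.
from transformers import AutoModelForCausalLM
from optimum.quanto import quantize, freeze, qint4

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # placeholder model
quantize(model, weights=qint4)  # data-free weight quantization to int4
freeze(model)                   # materialize the quantized weights
```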

Thanks for the detailed explanation.

> although it includes packing/unpacking code compatible with AWQ weights (so it should be fairly easy to write a conversion script).

Could you inform us...

With [this commit you mentioned](https://github.com/VainF/Torch-Pruning/commit/9d1f7401a8559b253e8184a20acfed11a9975b9f), I still see the error message for `--pruning_ratio=0.1`: `ValueError: hidden_size must be divisible by num_heads (got hidden_size: 3224 and num_heads: 28)`. Here is the...
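A quick arithmetic check on the numbers in the error message shows why the config validation fails; the idea that the pruner should target a head-aligned width is my assumption:

```python
# The pruned hidden_size = 3224 is not a multiple of num_heads = 28,
# so the HF config check fails (the per-head dim would be non-integer).
hidden_size, num_heads = 3224, 28
print(hidden_size % num_heads)   # 4   -> not divisible
print(hidden_size / num_heads)   # 115.14... -> invalid head_dim

# Nearest head-aligned widths (assuming a valid target must be a
# multiple of num_heads):
print((hidden_size // num_heads) * num_heads)      # 3220
print((hidden_size // num_heads + 1) * num_heads)  # 3248
```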

There seems to be a QAT script for Llama-2-7B: https://github.com/quic/aimet/blob/develop/Examples/torch/quantization/llm_qat_kd/finetune_llm_qat_kd.py I wonder how much the quantization loss decreases with this recipe.