Harahan

Results: 11 comments by Harahan

The following process requires modifying some code to change the default settings in TensorRT-LLM. To help users use our tool more conveniently, we are rushing out an official doc page...

There's no ``get_act_qparams()`` in ``rtn.py``. You can print the bit-width of each linear layer to check the code. PS: This function hasn't been updated for a long time. If you confirm there's...
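
A minimal sketch of such a check, assuming a PyTorch model; the attribute names ``weight_quantizer`` and ``bit`` are hypothetical placeholders for illustration, not llmc's verified API:

```python
import torch.nn as nn

def print_linear_bitwidths(model: nn.Module) -> None:
    """Print the weight bit-width recorded on each linear layer."""
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            # `weight_quantizer` / `bit` are assumed names; adapt them to
            # whatever your quantized model actually exposes.
            quantizer = getattr(module, "weight_quantizer", None)
            bit = getattr(quantizer, "bit", None) if quantizer is not None else None
            print(f"{name}: bit={bit if bit is not None else 'not quantized'}")
```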

It depends on whether we encounter such a need ourselves or it turns out to be useful in our research. So, not sure.

I'm sorry, but we do not have enough time to do this. If you are sure there's a bug, post the log/evidence and reopen the issue.

I've reopened the issue. Since we currently don't have a requirement for static quantization, the bug may not be fixed for a long time. You'd best try other settings.

You need to use symmetric mode for ``awq`` if you want to do inference with lightllm. Additionally, remove the ``act`` (activation quantization) part of the config.
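
For reference, a sketch of what the relevant quant section could look like, written as a Python dict for illustration (llmc configs are YAML in practice, and these key names are assumptions based on typical configs, not a verified schema):

```python
# Hypothetical llmc quant config fragment; key names are assumptions.
quant = {
    "method": "Awq",
    "weight": {
        "bit": 4,
        "symmetric": True,  # lightllm's kernel expects symmetric weights
    },
    # No "act" entry here: activation quantization is removed entirely.
}
```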

You must still use per-group quantization with a group size of 128 in llmc to fit the backend kernel.
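
Sketched as a Python dict with assumed field names (real llmc configs are YAML; this is illustrative, not a verified schema):

```python
# Hypothetical weight-quantization settings; field names are assumptions.
weight = {
    "bit": 4,
    "symmetric": True,
    "granularity": "per_group",
    "group_size": 128,  # the backend kernel is built for 128-element groups
}
```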

Did you get the error with lightllm's quantization mode after fixing the llmc config? If you are not using the quantization kernel, the weight clipping in ``awq`` makes this result reasonable.
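
As a toy illustration (not llmc's actual clipping code) of why clipped weights already change floating-point outputs before any quantization kernel is involved:

```python
import torch

torch.manual_seed(0)
w = torch.randn(64, 64)   # original weight
x = torch.randn(8, 64)    # some activations

# Clip each output channel to 90% of its max magnitude, a stand-in for the
# per-channel clipping search AWQ performs (0.9 is an arbitrary ratio here).
max_val = w.abs().amax(dim=1, keepdim=True) * 0.9
w_clipped = torch.clamp(w, -max_val, max_val)

# Even with no quantization kernel, clipped weights give different outputs.
diff = (x @ w.T - x @ w_clipped.T).abs().max().item()
print(f"max output deviation from clipping alone: {diff:.4f}")
```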