Harahan

Results: 11 comments by Harahan

The following process requires modifying some code to change the default settings in TensorRT-LLM. To help users use our tool more conveniently, we are rushing out an official doc page...

There's no ``get_act_qparams()`` in ``rtn.py``. You can print the bit-width of each linear layer to check the code. PS: This function hasn't been updated for a long time. If you confirm there's...
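
A minimal sketch of such a check, assuming a PyTorch model; the attribute names ``weight_quantizer`` and ``bit`` are hypothetical placeholders for illustration, not llmc's verified API:

```python
import torch.nn as nn

def print_linear_bitwidths(model: nn.Module) -> None:
    """Print the weight bit-width recorded on each linear layer."""
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            # `weight_quantizer` / `bit` are assumed names; adapt them to
            # whatever your quantized model actually exposes.
            quantizer = getattr(module, "weight_quantizer", None)
            bit = getattr(quantizer, "bit", None) if quantizer is not None else None
            print(f"{name}: bit={bit if bit is not None else 'not quantized'}")
```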

It depends on whether we encounter such a need ourselves or it turns out to be useful in our research. So, not sure.

I'm sorry, but we do not have enough time to do this. If you are sure there's a bug, post the log/evidence and reopen the issue.

I've reopened the issue. Since we currently don't have a requirement for static quantization, the bug may not be fixed for a long time. You'd best try other settings.

You need to use symmetric mode for ``awq`` if you want to do inference with lightllm. Additionally, remove the ``act`` (activation quantization) part of the config.
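
For reference, a sketch of what the relevant quant section could look like, written as a Python dict for illustration (llmc configs are YAML in practice, and these key names are assumptions based on typical configs, not a verified schema):

```python
# Hypothetical llmc quant config fragment; key names are assumptions.
quant = {
    "method": "Awq",
    "weight": {
        "bit": 4,
        "symmetric": True,  # lightllm's kernel expects symmetric weights
    },
    # No "act" entry here: activation quantization is removed entirely.
}
```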

You must still use per-group quantization with a group size of 128 in llmc to fit the backend kernel.
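
Sketched as a Python dict with assumed field names (real llmc configs are YAML; this is illustrative, not a verified schema):

```python
# Hypothetical weight-quantization settings; field names are assumptions.
weight = {
    "bit": 4,
    "symmetric": True,
    "granularity": "per_group",
    "group_size": 128,  # the backend kernel is built for 128-element groups
}
```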

Did you get the error with lightllm's quantization mode after fixing the llmc config? If you are not using the quantization kernel, the weight clipping in ``awq`` makes this result reasonable.
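
As a toy illustration (not llmc's actual clipping code) of why clipped weights already change floating-point outputs before any quantization kernel is involved:

```python
import torch

torch.manual_seed(0)
w = torch.randn(64, 64)   # original weight
x = torch.randn(8, 64)    # some activations

# Clip each output channel to 90% of its max magnitude, a stand-in for the
# per-channel clipping search AWQ performs (0.9 is an arbitrary ratio here).
max_val = w.abs().amax(dim=1, keepdim=True) * 0.9
w_clipped = torch.clamp(w, -max_val, max_val)

# Even with no quantization kernel, clipped weights give different outputs.
diff = (x @ w.T - x @ w_clipped.T).abs().max().item()
print(f"max output deviation from clipping alone: {diff:.4f}")
```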