Hongbo Xu issues

Results: 5 issues of Hongbo Xu

AWQ-int4-quantization errors on Llama-2 13B based model with AMMO

version:
```
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
0.7.1
nvidia-ammo~=0.5.0
```
I'm currently trying to use AMMO to quantize my model with `awq_int4`. My custom model is **based on llama2-13B**, but...

triaged

[article] update for 04Inference/03Slim/01.introduction.md

Add an introduction to model compression.

[QST] GEMM Epilogue Fusion: Row-wise and Column-wise Multiplication

**What is your question?** Hi, I'd like to compute the following:
```
// inputs
// A [M, K] int8
// B [N, K] int4
// alphaCol [M, 1] fp32
//...
```

question

? - Needs Triage

Fail to build w4a8_awq on Llama 13b

### System Info
- ubuntu 20.04
- tensorrt 10.0.1
- tensorrt-cu12 10.0.1
- tensorrt-cu12-bindings 10.0.1
- tensorrt-cu12-libs 10.0.1
- tensorrt-llm 0.11.0.dev2024052100
- nvidia A100

### Who can help?
@Tracin @byshiue

### Information
- [X] The official example...

bug

triaged

stale

waiting for feedback

[QST] How to apply StreamK to hopper warp specialized GEMM

**What is your question?** I'm trying to apply `StreamK` or `SplitK` to a Hopper warp-specialized GEMM. You can see the full code [here](https://github.com/NVIDIA/cutlass/blob/affd1b693dfc121c51118cbc8583dfd308227ca6/examples/67_hopper_fp8_warp_specialized_gemm_with_blockwise_scaling/67_hopper_fp8_warp_specialized_gemm_with_groupwise_scaling.cu#L168), and the `Gemm` is declared in...

question

? - Needs Triage

inactive-30d

inactive-90d