Hongbo Xu

Results 5 issues of Hongbo Xu

Version:
```
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
0.7.1
nvidia-ammo~=0.5.0
```
I'm currently trying to use AMMO to quantize my model with `awq_int4`. My custom model is **based on Llama2-13B**, but...

triaged

Add an introduction to model compression

**What is your question?** Hi, I'd like to compute the following:
```
// inputs
// A [M, K] int8
// B [N, K] int4
// alphaCol [M, 1] fp32
//...
```

question
? - Needs Triage

### System Info
- Ubuntu 20.04
- tensorrt 10.0.1
- tensorrt-cu12 10.0.1
- tensorrt-cu12-bindings 10.0.1
- tensorrt-cu12-libs 10.0.1
- tensorrt-llm 0.11.0.dev2024052100
- NVIDIA A100

### Who can help?
@Tracin @byshiue

### Information
- [X] The official example...

bug
triaged
stale
waiting for feedback

**What is your question?** I'm trying to apply `StreamK` or `SplitK` to a Hopper warp-specialized GEMM. You can see the full code [here](https://github.com/NVIDIA/cutlass/blob/affd1b693dfc121c51118cbc8583dfd308227ca6/examples/67_hopper_fp8_warp_specialized_gemm_with_blockwise_scaling/67_hopper_fp8_warp_specialized_gemm_with_groupwise_scaling.cu#L168) and the `Gemm` is declared in...

question
? - Needs Triage
inactive-30d
inactive-90d