Hongbo Xu

33 comments by Hongbo Xu

I got the same error; setting `inject_fused_mlp=False` and `inject_fused_attention=False` works for me.

```python
from auto_gptq import AutoGPTQForCausalLM

model = AutoGPTQForCausalLM.from_quantized(
    quantized_model_dir,
    device="cuda:0",
    use_triton=False,
    inject_fused_mlp=False,
    inject_fused_attention=False,
)
```

After quantization, I built the model:

```bash
python build.py --model_dir /target/model/hf_model_v15 \
    --quant_ckpt_path /target/model/quantized_int4-awq/llama_tp1_rank0.npz \
    --dtype float16 \
    --remove_input_padding \
    --use_gpt_attention_plugin float16 \
    --enable_context_fmha \
    --use_gemm_plugin float16 \
    --use_weight_only \
    ...
```

> Hello, have you solved this issue? I also encountered the same issue.

I have solved this. You should modify the function `load_from_awq_llama` in `weight.py`.

On mobile the screen is small, so the font cannot be displayed in full; a simple workaround is to show only one or two characters at a time.

> > Hi @alexsamardzic, thanks for working on this. Just wanted to clarify, will this kernel support int4 grouped per channel weight quantization + int8 per token dynamic activation quantization?...
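For reference, here is a minimal sketch of what that combination means numerically: grouped per-channel scales for the int4 weights, plus a fresh per-token scale for the int8 activations computed at runtime. This only illustrates the quantization scheme being asked about; it is not code from the PR, and the helper names and group size are my own.

```python
import torch

def quantize_weight_int4_grouped(w: torch.Tensor, group_size: int = 128):
    # w: (out_features, in_features); symmetric int4 range is [-8, 7].
    out_f, in_f = w.shape
    wg = w.reshape(out_f, in_f // group_size, group_size)
    scale = wg.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 7.0
    q = torch.clamp(torch.round(wg / scale), -8, 7).to(torch.int8)
    return q.reshape(out_f, in_f), scale.squeeze(-1)   # one scale per (channel, group)

def quantize_activation_int8_per_token(x: torch.Tensor):
    # x: (tokens, in_features); one scale per token (row), recomputed on every call,
    # which is what "dynamic" per-token activation quantization refers to.
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
    return q, scale
```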

> > How can I integrate this PR with PyTorch? Are there any example codes available? @alexsamardzic > > The primary motivation for this PR is to have this...

> > I'm a beginner with CUTLASS, and I have no idea how to use my own constructed s4/s8 data to run this GEMM. Could you please provide an example code...

> > I have two s4 values packed in a single byte (uint8). Do I need to manually unpack the uint8 data to get s4 data before the GEMM? > > No,...
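Since packing comes up repeatedly here, below is a small NumPy sketch of how two signed 4-bit values fit into one uint8 and how to recover them for a CPU reference check. The nibble order (low nibble holds the even-index value) is an assumption and may not match the layout the kernel expects.

```python
import numpy as np

def pack_s4_pairs(vals: np.ndarray) -> np.ndarray:
    # vals: signed 4-bit values in [-8, 7] stored as int8, even length.
    v = vals.astype(np.uint8) & 0x0F              # keep the two's-complement nibble
    return (v[0::2] | (v[1::2] << 4)).astype(np.uint8)

def unpack_s4_pairs(packed: np.ndarray) -> np.ndarray:
    lo = (packed & 0x0F).astype(np.int8)
    hi = ((packed >> 4) & 0x0F).astype(np.int8)
    # Sign-extend each nibble from 4 bits back to 8 bits.
    lo = np.where(lo > 7, lo - 16, lo).astype(np.int8)
    hi = np.where(hi > 7, hi - 16, hi).astype(np.int8)
    out = np.empty(packed.size * 2, dtype=np.int8)
    out[0::2], out[1::2] = lo, hi
    return out

vals = np.array([-8, 7, 3, -1], dtype=np.int8)
assert np.array_equal(unpack_s4_pairs(pack_s4_pairs(vals)), vals)
```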

> > Assuming that A is int8 and (M, K), B is int4 and (K, N), after GEMM `C = A·B`, C will be (M, N). Now, I have...
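To double-check shapes and results independently of the kernel, the same product can be formed on the CPU from the unpacked int4 values. This is only a reference computation under assumed sizes, not CUTLASS code, and the int32 accumulator is an assumption about what the kernel accumulates in.

```python
import numpy as np

M, N, K = 16, 32, 64
A = np.random.randint(-128, 128, size=(M, K), dtype=np.int8)   # int8 activations
B_s4 = np.random.randint(-8, 8, size=(K, N), dtype=np.int8)    # unpacked int4 weights

# CPU reference: accumulate in int32 to avoid overflow; the result is (M, N).
C_ref = A.astype(np.int32) @ B_s4.astype(np.int32)
assert C_ref.shape == (M, N)
```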

> > Thanks, I’m trying this, but it’s not going well currently. To make it clearer, what I want to do is exactly the following: > > ``` > >...