xiangxinhello

27 comments by xiangxinhello

> OK, if you have a strong desire to use FP16, I'll continue to look at this issue; if not, it will have a lower priority.

Hi @sunnyqgg. The...

![sample-500](https://github.com/showlab/Tune-A-Video/assets/169245314/7b9ff89d-5081-48fd-9949-fcf385f27cb2) I think the reconstruction quality is poor; is there anything that can be fixed or improved? The main problem is that the reconstruction is slightly distorted. Looking forward...

![2024-05-22 10-32-43屏幕截图](https://github.com/Zj-BinXia/SSL/assets/169245314/6a95f20b-f5d6-40d6-816a-2983f5d2f559) ![2024-05-22 10-36-47屏幕截图](https://github.com/Zj-BinXia/SSL/assets/169245314/7294b4b6-c224-4d71-a33f-375844bc1e3d) I tested with your pretrained weights but did not see any visualization results.

> Hi, @xiangxinhello, could you help to provide your /tmp/Qwen/7B/config.json file?

Hi @Kefeng-Duan, here it is:

```json
{
  "architectures": ["Qwen2ForCausalLM"],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": ...
```

> @nv-guomingz for vis

Hi @nv-guomingz:

```json
{
  "architectures": ["Qwen2ForCausalLM"],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": ...
```

> I think the modification you applied to `def set_smooth_quant_plugins` doesn't make sense here, because you converted the model with the quantization mode set to weights-only (W8A16), and `set_smooth_quant_plugins` is a...

> It depends case by case. For int8 weight-only, the matrix multiplication using fp16 is the expected behaviour, right?

I want the matrix multiplication to use int8, but trt-build...

> Try the commands below with the latest trtllm again. My local testing passes.
>
> ```bash
> python convert_checkpoint.py \
>     --model_dir /workspace/mnt/storage/trt/Qwen1.5-7B-Chat/ \
>     --output_dir ./tllm_checkpoint_1gpu_sq_test \
>     --dtype float16 \
>     --smoothquant 0.5
> ```
>
> ...
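The truncated quote presumably continues with the engine-build step. As a minimal sketch of the typical follow-up (the engine output directory here is illustrative, not from the original comment):

```bash
# Build an engine from the SmoothQuant checkpoint produced above.
# Keeping --gemm_plugin enabled (not None) matters for the plugin
# selection discussed in the next comment.
trtllm-build --checkpoint_dir ./tllm_checkpoint_1gpu_sq_test \
             --output_dir ./trt_engines/qwen1.5-7b-sq \
             --gemm_plugin float16
```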

> IIRC, TRT-LLM will call the SmoothQuant plugin when you didn't set --gemm_plugin to None and generated the checkpoint with the --smoothquant knob. Check [this](https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/quantization/quantize.py#L278) for details.

So my personal habit...
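For reference, the gating condition in the linked quantize.py comes down to whether the checkpoint's quantization mode carries both activation and weight quantization. A minimal sketch, assuming the `QuantMode` helpers exposed by recent TensorRT-LLM releases:

```python
from tensorrt_llm.quantization import QuantMode

# A SmoothQuant checkpoint quantizes both activations and weights,
# which is what steers the build toward the smooth-quant GEMM plugins.
sq_mode = QuantMode.use_smooth_quant(per_token=True, per_channel=True)
print(sq_mode.has_act_and_weight_quant())  # True -> smooth-quant plugin path

# A weight-only (W8A16) checkpoint does not, so the plain fp16 GEMM is used,
# matching the behaviour discussed above.
wo_mode = QuantMode.use_weight_only(use_int4_weights=False)
print(wo_mode.has_act_and_weight_quant())  # False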

@jklj077 Could you clarify what the qwen2-gptq quantization actually quantizes? Does it only quantize the weights?
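For context, GPTQ is a weight-only scheme (typically W4A16): weights are stored in low precision while activations stay in fp16 and weights are dequantized on the fly for the GEMMs. A minimal sketch, assuming the Hugging Face transformers GPTQ integration and an assumed checkpoint name:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Checkpoint name is assumed for illustration.
model_id = "Qwen/Qwen2-7B-Instruct-GPTQ-Int4"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The quantization_config stored in config.json records the weight-only
# settings (e.g. bits and group_size); activations are not quantized.
print(model.config.quantization_config)
```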