zhaochaochao

Results: 46 comments by zhaochaochao

This is the English-language file generated during data preparation and then translated with Google Translate; the generation order differs between runs. See the data-preparation steps in the link below. https://github.com/zhaocc1106/machine_learn/tree/master/NeuralNetworks-tensorflow/RNN/quick_draw
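Since the generated category file's order can differ between runs, one way to make it deterministic is to sort the category names before writing the file. A minimal sketch, assuming the categories come from `.npy` files in a data directory (the layout and suffix are assumptions, not taken from the linked repo):

```python
import os

def collect_categories(data_dir):
    """Collect category names in a deterministic order.

    os.listdir() order is filesystem-dependent, so we sort the names
    before writing the English category file; this keeps the file
    stable across runs (and across translation passes).
    """
    names = [f[:-len(".npy")] for f in os.listdir(data_dir) if f.endswith(".npy")]
    return sorted(names)
```

With a stable order, the Google-translated file lines up with the generated English file on every run.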

> Hi, @zhaocc1106 , could you update to the latest trtllm version? The same issue occurs with v0.11.0: ``` trtllm-build --checkpoint_dir /data/docker_ceph/llm/Qwen2-7B-Instruct/tllm_checkpoint_4gpu_tp4/ --output_dir /data/docker_ceph/llm/Qwen2-7B-Instruct/trt_engines/fp16_4gpu/ --gemm_plugin float16 --context_fmha disable --use_custom_all_reduce disable --max_batch_size...

> @zhaocc1106 Could you double check that you have successfully rebuilt and reinstalled v0.11.0? I think we have removed the '--use_custom_all_reduce' knob from the build flow, and you will get an...

> @zhaocc1106 could you try to enable --context_fmha? When I use v0.11.0 with 4 x 2080Ti GPUs, max_batch_size 1, max_input_len 32k, with the following command: ``` trtllm-build --checkpoint_dir /tmp/Qwen2-7B-Instruct/tllm_checkpoint_4gpu_tp4/ \ --output_dir /tmp/Qwen2-7B-Instruct/trt_engines/fp16_4gpu/...

> [@zhaocc1106](https://github.com/zhaocc1106) right, we don't support 2080Ti. Alright, closing the issue.

Any updates?

I found reduced accuracy and output errors when passing two pics to Qwen2-VL-7B, but a single pic is OK. Also, I found the performance is lower than vLLM. message: ``` min_pixels =...
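The `min_pixels` setting mentioned above controls how Qwen2-VL's preprocessor resizes images before the vision tower sees them. The helper below is a hedged sketch of that kind of pixel-budget clamping, modeled on the public Qwen2-VL preprocessor; the function name, `factor=28`, and the default bounds are assumptions, not taken from this thread:

```python
import math

def smart_resize(height, width, factor=28,
                 min_pixels=256 * 28 * 28, max_pixels=1280 * 28 * 28):
    """Round image dims to multiples of `factor` and clamp total pixels.

    Keeps the aspect ratio approximately intact while guaranteeing
    min_pixels <= h * w <= max_pixels, so the number of vision tokens
    stays within a fixed budget.
    """
    h = round(height / factor) * factor
    w = round(width / factor) * factor
    if h * w > max_pixels:
        beta = math.sqrt((height * width) / max_pixels)
        h = math.floor(height / beta / factor) * factor
        w = math.floor(width / beta / factor) * factor
    elif h * w < min_pixels:
        beta = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * beta / factor) * factor
        w = math.ceil(width * beta / factor) * factor
    return h, w
```

Raising `min_pixels` gives each image more vision tokens (often better accuracy, lower throughput); lowering `max_pixels` does the opposite, which matters when batching two pics.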

> I found reduced accuracy and output errors when passing two pics to Qwen2-VL-7B, but a single pic is OK. Also, I found the performance is lower than vLLM. > > message:...

> Hi, please use the latest code which is public today, and for multi-batch accuracy please change the attention_mask_vit in tensorrt_llm/runtime/multimodal_model_runner.py, > > ``` > attention_mask_vit = torch.full([1, seq_length,...
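The truncated snippet above builds `attention_mask_vit` with `torch.full`. A plausible reading, based on how Qwen2-VL's vision tower restricts attention to tokens of the same image, is a block-diagonal additive mask: positions outside an image's own block stay at `-inf`. The helper below is a sketch under that assumption (the `cu_seqlens` cumulative-length convention is assumed, not quoted from the patch):

```python
import torch

def build_vit_attention_mask(seq_length, cu_seqlens, dtype=torch.float32):
    """Build a [1, seq_length, seq_length] additive attention mask.

    `cu_seqlens` holds cumulative token counts per image, e.g. [0, 2, 4]
    for two images of 2 tokens each. Tokens attend only within their own
    image's block; all other positions keep the -inf fill value, which
    zeroes them out after softmax.
    """
    mask = torch.full([1, seq_length, seq_length], float("-inf"), dtype=dtype)
    for i in range(1, len(cu_seqlens)):
        start, end = cu_seqlens[i - 1], cu_seqlens[i]
        mask[..., start:end, start:end] = 0
    return mask
```

With a mask shaped this way, a two-pic batch cannot leak attention between the images, which matches the multi-batch accuracy fix being described.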