coder4nlp comments

Results: 23 comments of coder4nlp

The inference time of qwen2.5-vl is very slow.

@kzjeef This is my test result. Based on the experimental results, dashinfer still lags behind vllm in QPS. Could you please tell me what I should do to...

The inference time of qwen2.5-vl is very slow.

@kzjeef Strangely enough, dashinfer seems to be unstable. It is running much more slowly today than yesterday, and I have no idea why.

The inference time of qwen2.5-vl is very slow.

@kzjeef When using Qwen2.5-VL-3B-Instruct with --enable-prefix-cache, an error occurs.
```
File "dash-infer/multimodal/dashinfer_vlm/api_server/server.py", line 684, in main
    init()
File "dashinfer_vlm/api_server/server.py", line 143, in init
    vl_engine = QwenVl(
File "dash-infer/multimodal/dashinfer_vlm/vl_inference/runtime/qwen_vl.py", line 231,...
```