Tommy Yang

5 comments by Tommy Yang

I'm facing a similar issue when running inference with the Qwen-72B model. The build parameters used for TensorRT are:

```bash
python build.py --hf_model_dir ./Qwen-72B-chat/ \
    --dtype float16 \
    --remove_input_padding \
    --use_gpt_attention_plugin float16 \
    ...
```

Same issue here. As a temporary workaround in the streaming scenario, I changed the parsing logic in qwen2d5_parser.py to wait until `` is present before parsing, and made api_server simply `continue` when the tool message is None. That resolved the problem for now.
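A minimal sketch of the workaround described above — this is not the actual lmdeploy code, and the delimiter names (`<tool_call>` / `</tool_call>`) are assumptions, since the real marker is elided in the comment. The idea: in streaming mode, buffer incoming chunks and only attempt to parse a tool call once its closing marker has fully arrived, instead of parsing partial output.

```python
# Hypothetical sketch (assumed delimiters, not the real qwen2d5_parser.py logic):
# defer parsing until the closing tool-call marker exists in the buffer.

TOOL_CALL_START = "<tool_call>"   # assumed start marker
TOOL_CALL_END = "</tool_call>"    # assumed end marker; the actual one is elided above


def stream_tool_calls(chunks):
    """Yield tool-call payloads only after the end marker has been seen."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Wait for the closing marker before parsing, so a half-streamed
        # tool call never reaches the parser.
        while TOOL_CALL_END in buffer:
            payload, _, buffer = buffer.partition(TOOL_CALL_END)
            start = payload.rfind(TOOL_CALL_START)
            if start != -1:
                yield payload[start + len(TOOL_CALL_START):].strip()


# The marker arrives split across two streamed chunks:
calls = list(stream_tool_calls(
    ['<tool_call>{"name": "get_w', 'eather"}</tool_call>']
))
print(calls)  # ['{"name": "get_weather"}']
```

The second half of the workaround (skipping the request in api_server when the tool message is None) would just be an early `continue` in the server's message loop before this parsing step runs.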

Hi @RunningLeon, I just submitted a PR for this issue; please review.

> > @RunningLeon Is it fixed when you supported interns1 reasoning parser?
>
> The first problem in the below should be fixed.

@ywx217 hi, as for the second one, ...