DefTruth
> > I'm also curious how input_embeds can be passed in directly. I'm not sure what your specific need for passing input_embeds directly is, or whether it is the same as mine. That said, InternVL2 can be run with trt-llm by building the prompt as pre + img + post. The token ids are fixed before the request goes into trt-llm; when it actually enters the trt-llm decoder engine, they are passed in together with the image's visual_feature. Inside the engine the input_ids are embedded and then concatenated with the visual_feature, so this is doable.
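To make that mechanism concrete, here is a minimal PyTorch-style sketch of the pre + img + post splicing. It is not the TensorRT-LLM decoder-engine API; the shapes, token ids, and the `embed_tokens` / `visual_feature` names are illustrative only.

```python
# Conceptual sketch only (plain PyTorch, not the TensorRT-LLM decoder-engine API).
# The "pre + img + post" prompt is tokenized ahead of time; the text token ids are
# embedded and then concatenated with the image's visual_feature along the
# sequence dimension. Shapes and ids are illustrative, not InternVL2's real ones.
import torch
import torch.nn as nn

hidden_size = 64          # real models use a much larger hidden size
vocab_size = 32000
embed_tokens = nn.Embedding(vocab_size, hidden_size)   # decoder's token embedding

# token ids for the text before / after the image, fixed before entering the engine
pre_ids = torch.tensor([[1, 306, 29901]])
post_ids = torch.tensor([[13, 22550, 29901]])

# visual features from the vision encoder: [batch, num_image_tokens, hidden_size]
visual_feature = torch.randn(1, 256, hidden_size)

# embed the text ids, then splice the image features between pre and post segments
inputs_embeds = torch.cat(
    [embed_tokens(pre_ids), visual_feature, embed_tokens(post_ids)], dim=1
)
print(inputs_embeds.shape)  # [1, pre_len + 256 + post_len, hidden_size]
```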
> > @Oldpan When running internvl2-2B, inference always generates up to max_token tokens. Why is that? > > My guess is that end_id isn't set correctly. I guess your guess is right.
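On the end_id guess: a minimal sketch of what "setting end_id" amounts to, using a Hugging Face tokenizer for the lookup. The commented-out `runner.generate` call is only a placeholder for whatever runtime is in use, not an exact TensorRT-LLM signature.

```python
# If end_id is unset or wrong, the decoder never sees a stop condition and every
# request runs until max_new_tokens. Look the stop id up from the model's tokenizer.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("OpenGVLab/InternVL2-2B", trust_remote_code=True)
end_id = tok.eos_token_id
pad_id = tok.pad_token_id if tok.pad_token_id is not None else end_id

print("end_id:", end_id, "pad_id:", pad_id)
# outputs = runner.generate(input_ids, end_id=end_id, pad_id=pad_id, max_new_tokens=512)
```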
> > [@youkaichao](https://github.com/youkaichao) after upgrading nvidia-nccl to v2.25.1, it reports that torch v2.5.1 requires nvidia-nccl v2.21; it seems v2.25.1 is incompatible with torch v2.5.1, which is the version used by vllm? > > you...
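A quick way to confirm the mismatch described above is to compare the NCCL version torch was built against with the installed nvidia-nccl-cu12 wheel. A minimal sketch, assuming a CUDA build of torch and that the wheel is installed under that name:

```python
from importlib.metadata import PackageNotFoundError, version

import torch

print("torch:", torch.__version__)
print("NCCL torch was built with:", torch.cuda.nccl.version())  # e.g. (2, 21, 5)

try:
    print("installed nvidia-nccl-cu12 wheel:", version("nvidia-nccl-cu12"))  # e.g. 2.25.1
except PackageNotFoundError:
    print("nvidia-nccl-cu12 wheel not installed")
```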
Same error. I think we will hit this error whenever the prefix cache hit rate is > 0. https://github.com/vllm-project/vllm/issues/14009
> IMA (illegal memory access): currently, prefix-cache-hit requests enter the chunked prefill code path (context prefill) but do not prepare it properly. Reusing the chunked prefill workspace can work around it. @ZhongYingMatrix Any PR...
Oh, thanks~ Maybe you could send a PR to fix this error!
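Until a fix lands, one user-side mitigation is to turn prefix caching off so requests never take the broken chunked-prefill path. A minimal sketch, assuming vLLM's `enable_prefix_caching` engine argument; the model name is a placeholder.

```python
from vllm import LLM

# Disabling the prefix cache keeps the hit rate at 0, sidestepping the crash
# described above at the cost of recomputing shared prefixes.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder, use your own model
    enable_prefix_caching=False,
)
print(llm.generate("Hello")[0].outputs[0].text)
```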
also see: https://github.com/NVIDIA/TensorRT-Model-Optimizer/issues/38
> can you please provide reproduction instructions if possible? @LucasWilkinson The error is the same as #14069