DefTruth
> > I'm also curious how input_embeds can be passed in directly. I'm not sure what your specific need for passing input_embeds directly is, or whether it is the same as mine. That said, InternVL2 can be run with trt-llm by building the prompt as pre + img + post. The token ids are fixed before the request goes into trt-llm; when it actually enters the trt-llm decoder engine, they are passed in together with the image's visual_feature. Inside the engine the input_ids are embedded and then concatenated with the visual_feature, so this is doable.
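To make that mechanism concrete, here is a minimal PyTorch-style sketch of the pre + img + post splicing. It is not the TensorRT-LLM decoder-engine API; the shapes, token ids, and the `embed_tokens` / `visual_feature` names are illustrative only.

```python
# Conceptual sketch only (plain PyTorch, not the TensorRT-LLM decoder-engine API).
# The "pre + img + post" prompt is tokenized ahead of time; the text token ids are
# embedded and then concatenated with the image's visual_feature along the
# sequence dimension. Shapes and ids are illustrative, not InternVL2's real ones.
import torch
import torch.nn as nn

hidden_size = 64          # real models use a much larger hidden size
vocab_size = 32000
embed_tokens = nn.Embedding(vocab_size, hidden_size)   # decoder's token embedding

# token ids for the text before / after the image, fixed before entering the engine
pre_ids = torch.tensor([[1, 306, 29901]])
post_ids = torch.tensor([[13, 22550, 29901]])

# visual features from the vision encoder: [batch, num_image_tokens, hidden_size]
visual_feature = torch.randn(1, 256, hidden_size)

# embed the text ids, then splice the image features between pre and post segments
inputs_embeds = torch.cat(
    [embed_tokens(pre_ids), visual_feature, embed_tokens(post_ids)], dim=1
)
print(inputs_embeds.shape)  # [1, pre_len + 256 + post_len, hidden_size]
```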
> > @Oldpan When running internvl2-2B, inference always generates up to max_token tokens. Why is that? > > My guess is that end_id isn't set correctly. I guess your guess is right.
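On the end_id guess: a minimal sketch of what "setting end_id" amounts to, using a Hugging Face tokenizer for the lookup. The commented-out `runner.generate` call is only a placeholder for whatever runtime is in use, not an exact TensorRT-LLM signature.

```python
# If end_id is unset or wrong, the decoder never sees a stop condition and every
# request runs until max_new_tokens. Look the stop id up from the model's tokenizer.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("OpenGVLab/InternVL2-2B", trust_remote_code=True)
end_id = tok.eos_token_id
pad_id = tok.pad_token_id if tok.pad_token_id is not None else end_id

print("end_id:", end_id, "pad_id:", pad_id)
# outputs = runner.generate(input_ids, end_id=end_id, pad_id=pad_id, max_new_tokens=512)
```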
> > [@youkaichao](https://github.com/youkaichao) after upgrading nvidia-nccl to v2.25.1, it reports that torch v2.5.1 requires nvidia-nccl v2.21; it seems v2.25.1 is incompatible with torch v2.5.1, which is the version used by vllm? > > you...
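A quick way to confirm the mismatch described above is to compare the NCCL version torch was built against with the installed nvidia-nccl-cu12 wheel. A minimal sketch, assuming a CUDA build of torch and that the wheel is installed under that name:

```python
from importlib.metadata import PackageNotFoundError, version

import torch

print("torch:", torch.__version__)
print("NCCL torch was built with:", torch.cuda.nccl.version())  # e.g. (2, 21, 5)

try:
    print("installed nvidia-nccl-cu12 wheel:", version("nvidia-nccl-cu12"))  # e.g. 2.25.1
except PackageNotFoundError:
    print("nvidia-nccl-cu12 wheel not installed")
```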
Same error. I think we will hit this error whenever the prefix cache hit rate is > 0. https://github.com/vllm-project/vllm/issues/14009
> IMA (illegal memory access): currently, prefix-cache-hit requests enter the chunked prefill code path (context prefill) but do not prepare it properly. Reusing the chunked prefill workspace can work around it. @ZhongYingMatrix Any PR...
Oh, thanks~ Maybe you could send a PR to fix this error!
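Until a fix lands, one user-side mitigation is to turn prefix caching off so requests never take the broken chunked-prefill path. A minimal sketch, assuming vLLM's `enable_prefix_caching` engine argument; the model name is a placeholder.

```python
from vllm import LLM

# Disabling the prefix cache keeps the hit rate at 0, sidestepping the crash
# described above at the cost of recomputing shared prefixes.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder, use your own model
    enable_prefix_caching=False,
)
print(llm.generate("Hello")[0].outputs[0].text)
```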
also see: https://github.com/NVIDIA/TensorRT-Model-Optimizer/issues/38
> can you please provide reproduction instructions if possible? @LucasWilkinson The error is the same as #14069