Estella comments

Results: 5 comments of Estella

[BUG] <unauthorized: authentication required>

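> Problem solved. It was caused by the NTP service being out of sync; after syncing NTP, the problem was resolved.

I synced NTP and the problem still exists. I am also on CentOS 7.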

Does TensorRT-LLM support passing input_embeds directly?

@Oldpan When I run internvl2-2B, inference always generates until the max_token count is reached. Why is that?

Does TensorRT-LLM support passing input_embeds directly?

end_id = 2 is correct. In my testing, running run.py with --stop_words "" resolves the issue.

Please make it clear in the install guide it doesn't work for sm_75 GPUs yet

> @Amrabdelhamed611 the first error was not reported by flashinfer, that might be related to flashattn package and flashinfer doesn't depend on that.
>
> Regarding the second issue, check...

Question about convert Qwen2-7B

I got the same error. Environment: nvcr.io/nvidia/tritonserver:24.03-py3, tensorrt-llm 0.10.0, Qwen2-1.5B-instruct.