oldpan
Thanks for the great work! I'll give it a try.
Sorry about that. I didn't consider the IPython case; I only used it in normal Python programs. Maybe you can fix it, or fork this project and finish it on your own....
Did your code run without errors, and it just can't write the .txt file? I'm not sure whether your path string is correct. Can you tell me more...
Hi. That's because we report some numbers in MB, and 1 MB is 1000**2 B (or 1024**2 B for 1 MiB). You can take a look at [this](https://oldpan.me/archives/pytorch-gpu-memory-usage-track)~
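To make the two conventions concrete, here is a minimal sketch of the conversion (the helper name `bytes_to_mb` is my own, not from the linked post):

```python
def bytes_to_mb(num_bytes, binary=False):
    """Convert a byte count to decimal MB (1000**2 B) or binary MiB (1024**2 B)."""
    divisor = 1024 ** 2 if binary else 1000 ** 2
    return num_bytes / divisor

# The same byte count reads differently under the two conventions:
print(bytes_to_mb(8388608, binary=True))  # 8.0 (MiB)
print(bytes_to_mb(8388608))               # 8.388608 (decimal MB)
```

The ~4.8% gap between the two units is a common source of confusion when comparing reported GPU memory figures.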
Mm, thanks for the reply. That's indeed the case. If time permits, using the TensorRT API directly is the best approach.
I think the reason is that you are using Python 2.7, where 2/4 evaluates to 0 (it is 0.5 in Python 3.6). ... By the way, the code is a mess...
@lvhan028 @lzhangzz Thanks for the reply. In nougat, the features output by the encoder are passed into the decoder together with the initial input_ids. Inside the decoder it works like this: there are two KV caches and two attention modules.
I'm also curious how input_embeds can be passed in directly. I'm not sure what your specific requirement for passing input_embeds directly is, or whether it's the same as mine. That said, InternVL2 can be run with trt-llm by assembling the prompt as pre + img + post. The token ids are fixed before anything is fed to trt-llm; when the decoder engine actually runs, they are passed in together with the image's visual_feature, the input_ids are embedded inside the engine and then concatenated with visual_feature. This is doable.
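The embed-then-splice step described above can be sketched roughly as follows. This is a NumPy toy, not the trt-llm API; all names and sizes here are assumptions for illustration:

```python
import numpy as np

# Hypothetical sizes, for illustration only
vocab_size, hidden = 1000, 64
rng = np.random.default_rng(0)
embed_table = rng.standard_normal((vocab_size, hidden))  # stands in for the decoder's embedding

# Prompt token ids assembled as pre + <img placeholder> + post,
# fixed before anything is handed to the decoder engine
pre_ids, post_ids = np.array([1, 5]), np.array([7, 9])
visual_feature = rng.standard_normal((16, hidden))  # output of the vision encoder

# Inside the decoder engine: embed the ids, then splice the visual
# features in at the placeholder position between pre and post
inputs_embeds = np.concatenate(
    [embed_table[pre_ids], visual_feature, embed_table[post_ids]], axis=0
)
print(inputs_embeds.shape)  # (20, 64): 2 pre + 16 image + 2 post tokens
```

The point is that only token ids and visual_feature cross the engine boundary; the concatenation into a single embedding sequence happens inside the decoder.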
> @Oldpan When I get internvl2-2B running, inference always outputs max_token tokens. Why is that?

My guess is that end_id isn't set correctly.
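Why a wrong end_id produces exactly this symptom can be shown with a toy generation loop (the function names and the fake model are mine, not trt-llm code):

```python
def generate(next_token_fn, end_id, max_tokens):
    """Greedy decode loop: stop early only when the emitted token equals end_id."""
    out = []
    for _ in range(max_tokens):
        tok = next_token_fn(out)
        if tok == end_id:  # wrong end_id -> this never fires -> runs to max_tokens
            break
        out.append(tok)
    return out

def fake_model(out):
    """A stand-in model that emits 3 content tokens, then its real EOS id (2) forever."""
    seq = [11, 12, 13]
    return seq[len(out)] if len(out) < len(seq) else 2

print(generate(fake_model, end_id=2, max_tokens=10))       # [11, 12, 13] — stops at EOS
print(len(generate(fake_model, end_id=0, max_tokens=10)))  # 10 — wrong end_id, runs to the cap
```

So if the generation length always equals max_token, the first thing to check is that the end_id (and pad_id) passed to the runtime match the tokenizer's actual EOS token id.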