LMM

Results: 6 issues of LMM

My model has multiple inputs with different shapes (2-D, 3-D, and 4-D). Does this QAT workflow support that? If so, how should the single forward pass before saving the model be written? The official example gives: predict = net.forward(F.placeholder([1, 3, 224, 224], F.NC4HW4)). How should I write this for my multi-input case? I tried passing the inputs one at a time in the net's input order, but execution failed (with no error raised), and the output printed a long series of arrays. Any advice would be appreciated.

Hello, and thanks for sharing your work. I would like to reproduce it, but the download links for all four models (both v1 and v2) are currently broken. Have the addresses been updated? Could you provide the correct links?

Hi! Thanks for your repo. Could you share the BLEU score this repo achieves on the WMT14 dataset?

CPU: x86_64, GPU: NVIDIA H20, CUDA version: 12.4, TensorRT-LLM version: 0.14.0. I followed https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/qwen/README.md to run the Qwen2 0.5B model. The results I obtained are as follows: * case0 with code: *...

**Description** I am using CUDA 12.2 and would like to know the highest version of TensorRT-LLM that I can install. Could you please provide compatibility information? **My Environment:** -...

question
not a bug