Haisong Ding

Results: 14 comments by Haisong Ding

It would be great to include the **context length** info in the VRAM reports in the README for reference.

Any plans to release the dataset?

I also ran into the same problem on 3090 GPUs; it has been bugging me for days.

I tried setting the backbone to use FP16 and the encoder-decoder part to use FP32. The results roughly match those of the FP32 engine. But it is not as fast as the...

> How can you set different precisions for different parts when creating the engine? Can you show me a code example? @HaisongDing

For example, in the [detectron2 example](https://github.com/NVIDIA/TensorRT/blob/5f422623e7f5bdc593b781695cbddda99124c9b8/samples/python/detectron2/build_engine.py#L169), adding something like...
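For reference, here is a minimal sketch of that per-layer precision approach with the TensorRT Python API. The layer-name prefixes (`/encoder`, `/decoder`) are hypothetical; inspect your own network's layer names to decide which layers to pin to FP32.

```python
import tensorrt as trt

def constrain_precision(network: trt.INetworkDefinition,
                        config: trt.IBuilderConfig) -> None:
    """Enable FP16 globally, but pin selected layers to FP32."""
    config.set_flag(trt.BuilderFlag.FP16)
    # Make TensorRT honor the per-layer precisions set below.
    config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        # Hypothetical prefixes: pin encoder/decoder layers to FP32
        # while the backbone stays in FP16.
        if layer.name.startswith(("/encoder", "/decoder")):
            layer.precision = trt.float32
            for j in range(layer.num_outputs):
                layer.set_output_type(j, trt.float32)
```

Call this on the network and builder config before `builder.build_serialized_network(...)`; `OBEY_PRECISION_CONSTRAINTS` forces the constraints even when FP16 would be faster.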

@monsterlyg Update torch to >=1.13.1 to use opset 17 when exporting to ONNX, and update TensorRT to 8.6.1 to use INormalization layers.
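For anyone hitting this: opset 17 is the first ONNX opset with a native `LayerNormalization` op, which TensorRT 8.6 imports as an `INormalization` layer instead of a subgraph of primitives. A minimal export sketch, where the toy module below is just a placeholder containing a LayerNorm:

```python
import torch
import torch.nn as nn

# Toy module standing in for the real model; it only needs a LayerNorm
# so the opset-17 export path is exercised.
class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.norm = nn.LayerNorm(256)

    def forward(self, x):
        return self.norm(x)

model = Toy().eval()
dummy = torch.randn(1, 100, 256)  # placeholder input shape

# opset_version=17 is required for LayerNorm to export as the native
# ONNX LayerNormalization op.
torch.onnx.export(model, dummy, "toy.onnx", opset_version=17)
```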

> I tried setting the backbone to use FP16 and the encoder-decoder part to use FP32. The results roughly match those of the FP32 engine. But it is not as fast as...

I only converted a customized Grounding-DINO model, and the BERT part is pre-computed in my setup, so only the backbone and encoder-decoder are converted to TensorRT. On my customized dataset,...

> @Broyojo This is a great question. There is no required "chat format" in the same sense as LLaMA2, where you needed to format your prompt with instruct...

Can anyone post the throughput of TensorRT LLaMA v3 models on popular GPUs? Many thanks.