Erin
Hi @atyshka, the TensorRT Model Optimizer team is aware of this and similar requests. We've started planning to publish quantized checkpoints and the exported models on the HuggingFace model hub. If you...
Hi @atyshka, we have a few llama models like https://huggingface.co/nvidia/Llama-3.1-405B-Instruct-FP8 uploaded, and we're uploading more (e.g., Medusa checkpoint). Legal clearance took a while.
Hi @BugsBuggy, we did a reference run using a non-TRT-LLM deployment framework with the same Mixtral-8x7B checkpoints and configs (sampling config, max_output_len, etc.) and observed the same repetitive answers as...
Hi @xiangxinhello, I tried again w/ `tensorrt-llm 0.11.0` with Mixtral 8x7B, `top_k=0` (the minimal value; it should be 0 rather than -1), and `top_p=1`, and it doesn't produce repetitive answers....
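To make the sampling-config convention above concrete, here is a minimal, self-contained sketch of top-k / top-p (nucleus) truncation in plain Python. This is an illustration of the semantics only, not TensorRT-LLM's actual implementation: `top_k=0` disables the top-k filter and `top_p=1.0` disables the nucleus filter, so greedy repetition is avoided only by the model itself, not by truncation.

```python
import math

def filter_logits(logits, top_k=0, top_p=1.0):
    """Illustrative top-k / top-p truncation (not TensorRT-LLM's code).

    top_k=0 disables top-k filtering; top_p=1.0 disables nucleus
    filtering, matching the convention discussed in the thread.
    Returns a dict {token_index: renormalized probability}.
    """
    # Softmax the logits into probabilities (numerically stable form).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token indices sorted by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    keep = set(order)
    if top_k > 0:
        # Top-k: keep only the k most likely tokens.
        keep &= set(order[:top_k])
    if top_p < 1.0:
        # Top-p: keep the smallest prefix whose cumulative mass >= top_p.
        cum, nucleus = 0.0, []
        for i in order:
            nucleus.append(i)
            cum += probs[i]
            if cum >= top_p:
                break
        keep &= set(nucleus)

    # Renormalize over the surviving tokens.
    mass = sum(probs[i] for i in keep)
    return {i: probs[i] / mass for i in keep}
```

With `top_k=0, top_p=1.0` every token survives; with `top_k=2` only the two most likely tokens remain and their probabilities are renormalized to sum to 1.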
This has been merged. Thanks.
To verify the fix, let's see whether we can pass CI consistently, which will include the Ray stages w/ RPC. But this might be tricky since CI itself is quite...