Dhruv Mullick

Results: 24 comments by Dhruv Mullick

Seeing the same issue. @byshiue were you able to check?

Keeping the geekIT vertically longer, using either a Symbol or smaller text, might be good. Similarly, placing the Done/NotDone box at the right extreme might look nice, as there's...

Likewise. Imposing constraints on beam search (like HF's decoding strategies) would be invaluable.
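
For reference, the HF-style constrained beam search I have in mind looks roughly like this (a minimal sketch with a placeholder model and prompt, not TRT-LLM code):

```python
# Sketch of HF transformers constrained beam search; "gpt2" and the prompt
# are placeholders, not anything from the TRT-LLM discussion.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Force the beams to include this phrase somewhere in the generated text.
force_words_ids = [tokenizer("New York", add_special_tokens=False).input_ids]

inputs = tokenizer("The best city to visit is", return_tensors="pt")
outputs = model.generate(
    **inputs,
    num_beams=5,                      # constrained generation requires beam search
    force_words_ids=force_words_ids,  # lexical constraint on the output
    max_new_tokens=20,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```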

Sure, will take this up. @omri374, can you give me write access for the PR?

We certainly need this functionality. With vLLM supporting [constrained decoding](https://outlines-dev.github.io/outlines/reference/models/vllm/), the lack of it could be a dealbreaker for some TRT-LLM users. Is this on the roadmap by any chance? (pinging @ncomly-nvidia in case...
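
For context, this is roughly what constrained decoding with outlines on top of vLLM looks like, based on the linked docs. The model name and regex are placeholders, and the outlines API may differ between versions:

```python
# Minimal sketch of outlines + vLLM constrained decoding (placeholder model/prompt).
import outlines

model = outlines.models.vllm("mistralai/Mistral-7B-Instruct-v0.2")

# Constrain the output to match a simple pattern, e.g. a signed integer.
generator = outlines.generate.regex(model, r"[-+]?\d+")
answer = generator("What is 12 * 12? Answer with just the number: ")
print(answer)
```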

Facing a similar issue: https://github.com/triton-inference-server/tensorrtllm_backend/issues/577. There's no use_custom_all_reduce build option now either, so I'm not sure how to resolve this.

@byshiue, is it possible to disable it, though? I'm facing similar problems with tp>1: https://github.com/triton-inference-server/tensorrtllm_backend/issues/577

Even tried without quantization, following the steps given in the [official examples](https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/llama/README.md):

```
python convert_checkpoint.py --model_dir meta_llama_3_8B_instruct \
    --output_dir /tmp/tllm_checkpoint_2gpu_tp2 \
    --dtype bfloat16 \
    --tp_size 2

trtllm-build --checkpoint_dir /tmp/tllm_checkpoint_2gpu_tp2 \
...
```

I tried the official image nvcr.io/nvidia/tritonserver:24.08-trtllm-python-py3, which was released two days ago, and built the TRT engines from it. The problem remains, though, even with reduce_fusion enabled. Logs below:

```
...
```

@imihic, after spending a week on this, I pivoted to vLLM.