Anurag Mukkara
> @amukkara can you sign all of your commits in this PR (starting at [c1b448e](https://github.com/NVIDIA/cuCollections/pull/350/commits/c1b448e12c4e7f82b9fd653fc5dafb981118df85))? The easiest way to do this is an interactive rebase, pick->edit every commit that is...
`input_embeds` cannot be accessed directly. `prompt_table` should be used to pass visual features as input. The specific position of visual features within the prompt changes from one model to another....
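For context, a minimal sketch of passing visual features this way with the Python runtime, assuming the `ModelRunner.generate` keyword names used in `examples/multimodal` (the engine path and tensors below are placeholders, not a verified recipe):

```python
import torch
from tensorrt_llm.runtime import ModelRunner

# Placeholder inputs: real ids must include the fake ids that point into
# prompt_table, and the features come from the model's vision encoder.
visual_features = torch.randn(576, 4096)     # [num_visual_tokens, hidden_size]
batch_input_ids = [torch.tensor([1, 2, 3])]  # placeholder token ids

runner = ModelRunner.from_dir(engine_dir="llm_engine_dir")  # placeholder path
outputs = runner.generate(
    batch_input_ids=batch_input_ids,
    prompt_table=visual_features,  # visual features injected via the prompt table
    max_new_tokens=64,
)
```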
@Bam4d The logits processor runs on only one MPI rank in the latest version. You can use `LogitsPostProcessorConfig(replicate=True)` if you want the processor to run on all tensor parallel ranks.
There is no need for a logits transformation on the other ranks in client code. The logits callback will be invoked only on the first tensor parallel rank. A broadcast will be performed in...
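For reference, a minimal sketch of the `replicate=True` setup, assuming the executor bindings' `LogitsPostProcessorConfig`; the callback signature and `ExecutorConfig` wiring below are my best recollection of the API, so verify them against your TensorRT-LLM version:

```python
from typing import List, Optional
import torch
import tensorrt_llm.bindings.executor as trtllm

def ban_token_zero(req_id: int, logits: torch.Tensor,
                   ids: List[List[int]], stream_ptr: int,
                   client_id: Optional[int]) -> None:
    # Example processor: mask out token 0 at every decoding step.
    logits[..., 0] = float("-inf")

config = trtllm.ExecutorConfig(
    logits_post_processor_config=trtllm.LogitsPostProcessorConfig(
        processor_map={"ban_token_zero": ban_token_zero},
        replicate=True,  # run the processor on every tensor parallel rank
    ),
)
```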
@alexemme Can you try this on the latest preview package? I tested this on the latest package and the engine build runs without error for both `llava-v1.6-mistral-7b-hf` and `llava-v1.6-34b-hf`. An earlier version might have...
@Popsicle0-0 The `prompt_table` definition depends on the position of the special `<image>` tokens in the prompt, which is model-specific. The idea is to split the input ids into `[pre_text_ids, prompt_table_ids, post_text_ids]`. Some models skip...
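A minimal sketch of that split for a model with a single `<image>` token; `image_token_id`, `num_visual_tokens`, and `vocab_size` are hypothetical parameters here, and the real position and count are model-specific:

```python
from typing import List, Tuple

def split_input_ids(
    input_ids: List[int],
    image_token_id: int,    # hypothetical: id of the special <image> token
    num_visual_tokens: int, # rows in prompt_table for this image
    vocab_size: int,
) -> Tuple[List[int], List[int], List[int]]:
    pos = input_ids.index(image_token_id)  # position of the special token
    pre_text_ids = input_ids[:pos]
    # Fake ids >= vocab_size select rows of prompt_table instead of the
    # regular embedding table.
    prompt_table_ids = list(range(vocab_size, vocab_size + num_visual_tokens))
    post_text_ids = input_ids[pos + 1:]
    return pre_text_ids, prompt_table_ids, post_text_ids
```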
We support several popular multimodal models in `examples/multimodal/`. For these models, we pass the image embedding input to the LLM via the `prompt_table` argument (this extends the LLM's embedding table) and modify...
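Conceptually, ids at or beyond the vocabulary size select rows of `prompt_table` instead of the regular embedding table. A toy illustration of that lookup (the engine does this internally; the shapes here are made up):

```python
import torch

vocab_size, hidden_size = 32000, 4096
word_embeddings = torch.randn(vocab_size, hidden_size)  # LLM embedding table
prompt_table = torch.randn(576, hidden_size)            # e.g. 576 visual tokens

def embed(ids: torch.Tensor) -> torch.Tensor:
    # Ids >= vocab_size are "fake" ids that index into prompt_table.
    is_visual = ids >= vocab_size
    out = torch.empty(*ids.shape, hidden_size)
    out[~is_visual] = word_embeddings[ids[~is_visual]]
    out[is_visual] = prompt_table[ids[is_visual] - vocab_size]
    return out

embedded = embed(torch.tensor([1, 2, 32000, 32001, 3]))  # mixed text/visual ids
```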
@lss15151161 This example does not support using different prompts in a batch. Yes, the issue is that pad tokens will be added to the end of the shorter `post_prompt` when the prompts differ.
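A toy illustration of the misalignment (hypothetical token ids): right-padding the shorter `post_prompt` leaves pad tokens inside the prompt region rather than after it:

```python
pad_id = 0
post_prompt_a = [5, 6, 7, 8]  # longer post_prompt in the batch
post_prompt_b = [5, 6]        # shorter post_prompt

max_len = max(len(post_prompt_a), len(post_prompt_b))
padded_b = post_prompt_b + [pad_id] * (max_len - len(post_prompt_b))
print(padded_b)  # [5, 6, 0, 0] -- pads end up at the end of the prompt
```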
/bot reuse-pipeline