Anurag Mukkara
> @amukkara can you sign all of your commits in this PR (starting at [c1b448e](https://github.com/NVIDIA/cuCollections/pull/350/commits/c1b448e12c4e7f82b9fd653fc5dafb981118df85))? The easiest way to do this is an interactive rebase, pick->edit every commit that is...
`input_embeds` cannot be accessed directly. `prompt_table` should be used to pass visual features as input. The specific position of visual features within the prompt changes from one model to another....
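For context, a minimal sketch of passing visual features this way with the Python runtime, assuming the `ModelRunner.generate` keyword names used in `examples/multimodal` (the engine path and tensors below are placeholders, not a verified recipe):

```python
import torch
from tensorrt_llm.runtime import ModelRunner

# Placeholder inputs: real ids must include the fake ids that point into
# prompt_table, and the features come from the model's vision encoder.
visual_features = torch.randn(576, 4096)     # [num_visual_tokens, hidden_size]
batch_input_ids = [torch.tensor([1, 2, 3])]  # placeholder token ids

runner = ModelRunner.from_dir(engine_dir="llm_engine_dir")  # placeholder path
outputs = runner.generate(
    batch_input_ids=batch_input_ids,
    prompt_table=visual_features,  # visual features injected via the prompt table
    max_new_tokens=64,
)
```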
@Bam4d The logits processor runs on only one MPI rank in the latest version. You can use `LogitsPostProcessorConfig(replicate=True)` if you want the processor to run on all tensor parallel ranks.
There is no need for a logits transformation on the other ranks in client code. The logits callback will be invoked only on the first tensor parallel rank. A broadcast will be performed in...
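For reference, a minimal sketch of the `replicate=True` setup, assuming the executor bindings' `LogitsPostProcessorConfig`; the callback signature and `ExecutorConfig` wiring below are my best recollection of the API, so verify them against your TensorRT-LLM version:

```python
from typing import List, Optional
import torch
import tensorrt_llm.bindings.executor as trtllm

def ban_token_zero(req_id: int, logits: torch.Tensor,
                   ids: List[List[int]], stream_ptr: int,
                   client_id: Optional[int]) -> None:
    # Example processor: mask out token 0 at every decoding step.
    logits[..., 0] = float("-inf")

config = trtllm.ExecutorConfig(
    logits_post_processor_config=trtllm.LogitsPostProcessorConfig(
        processor_map={"ban_token_zero": ban_token_zero},
        replicate=True,  # run the processor on every tensor parallel rank
    ),
)
```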
@alexemme Can you try this on the latest preview package? I tested this on the latest package and the engine build runs without error for both `llava-v1.6-mistral-7b-hf` and `llava-v1.6-34b-hf`. An earlier version might have...
@Popsicle0-0 The `prompt_table` definition depends on the position of the special `<image>` tokens in the prompt, which is model-specific. The idea is to split the input ids into `[pre_text_ids, prompt_table_ids, post_text_ids]`. Some models skip...
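A minimal sketch of that split for a model with a single `<image>` token; `image_token_id`, `num_visual_tokens`, and `vocab_size` are hypothetical parameters here, and the real position and count are model-specific:

```python
from typing import List, Tuple

def split_input_ids(
    input_ids: List[int],
    image_token_id: int,    # hypothetical: id of the special <image> token
    num_visual_tokens: int, # rows in prompt_table for this image
    vocab_size: int,
) -> Tuple[List[int], List[int], List[int]]:
    pos = input_ids.index(image_token_id)  # position of the special token
    pre_text_ids = input_ids[:pos]
    # Fake ids >= vocab_size select rows of prompt_table instead of the
    # regular embedding table.
    prompt_table_ids = list(range(vocab_size, vocab_size + num_visual_tokens))
    post_text_ids = input_ids[pos + 1:]
    return pre_text_ids, prompt_table_ids, post_text_ids
```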
We support several popular multimodal models in `examples/multimodal/`. For these models, we pass the image embedding input to the LLM via the `prompt_table` argument (this extends the LLM's embedding table) and modify...
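Conceptually, ids at or beyond the vocabulary size select rows of `prompt_table` instead of the regular embedding table. A toy illustration of that lookup (the engine does this internally; the shapes here are made up):

```python
import torch

vocab_size, hidden_size = 32000, 4096
word_embeddings = torch.randn(vocab_size, hidden_size)  # LLM embedding table
prompt_table = torch.randn(576, hidden_size)            # e.g. 576 visual tokens

def embed(ids: torch.Tensor) -> torch.Tensor:
    # Ids >= vocab_size are "fake" ids that index into prompt_table.
    is_visual = ids >= vocab_size
    out = torch.empty(*ids.shape, hidden_size)
    out[~is_visual] = word_embeddings[ids[~is_visual]]
    out[is_visual] = prompt_table[ids[is_visual] - vocab_size]
    return out

embedded = embed(torch.tensor([1, 2, 32000, 32001, 3]))  # mixed text/visual ids
```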
@lss15151161 This example does not support using different prompts in a batch. Yes, the issue is that pad tokens will be added to the end of the shorter `post_prompt` when the prompts differ.
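A toy illustration of the misalignment (hypothetical token ids): right-padding the shorter `post_prompt` leaves pad tokens inside the prompt region rather than after it:

```python
pad_id = 0
post_prompt_a = [5, 6, 7, 8]  # longer post_prompt in the batch
post_prompt_b = [5, 6]        # shorter post_prompt

max_len = max(len(post_prompt_a), len(post_prompt_b))
padded_b = post_prompt_b + [pad_id] * (max_len - len(post_prompt_b))
print(padded_b)  # [5, 6, 0, 0] -- pads end up at the end of the prompt
```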
/bot reuse-pipeline