Ouna-the-Dataweaver
Either I'm going insane, or with V1 the qwen 8b instruct LLM just breaks in fp8: around 25% of generations are pure gibberish, with the same running code and everything. Do...
I checked this out using `gh pr checkout 6869` on latest vllm, and it looks like there's a bug: input processing is broken. When I add `print(f'inputs {inputs}\n preprocessed_inputs {preprocessed_inputs} \n...
I know this is still a work in progress, but I tried it both before and after the recent merges, and both times I got errors of more or less the...
Oh, I found the mistake I made. Basically, this PR expects embeds as a tensor without a batch dimension, but transformers LLMs use batched input.

```python
print(f'embeddings shape: {embeds.shape}')
output = self.llama_model.generate(
    inputs_embeds=embeds,
    ...
)
...
```
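To illustrate the mismatch, here is a minimal sketch (the shapes and variable names are assumptions, not taken from the PR): transformers' `generate()` expects `inputs_embeds` of shape `(batch, seq_len, hidden)`, so an unbatched `(seq_len, hidden)` tensor needs a batch dimension prepended with `unsqueeze(0)`.

```python
import torch

# Hypothetical unbatched embeddings, e.g. as this PR produces them:
# shape (seq_len, hidden)
embeds = torch.randn(10, 4096)

# transformers' generate() wants (batch, seq_len, hidden),
# so prepend a batch dimension of size 1
batched = embeds.unsqueeze(0)
print(batched.shape)  # torch.Size([1, 10, 4096])
```

Passing `batched` instead of `embeds` to `generate(inputs_embeds=...)` resolves the shape mismatch in this sketch.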