Alex Chen
Oh, I have a similar issue here:

INFO 10-12 22:21:19 metrics.py:351] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 27.7 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs,...
> This is probably related: #9032
>
> The guided decoding is super slow, and seems to block up the engine so that it can't report its health status

Yes,...
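For context, a guided-decoding request of the kind discussed above can be sent to vLLM's OpenAI-compatible server roughly as follows. This is only a minimal sketch; the base URL, model name, and JSON schema are assumptions for illustration, not anyone's actual setup:

```python
# Minimal sketch of a guided-decoding request against a vLLM
# OpenAI-compatible server. The base_url, model name, and schema
# below are assumptions for illustration only.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Hypothetical JSON schema the output must conform to.
schema = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
}

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-70B-Instruct",
    messages=[{"role": "user", "content": "Reply with a short answer."}],
    extra_body={"guided_json": schema},  # vLLM-specific guided decoding option
)
print(response.choices[0].message.content)
```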
I'm running into a similar situation here: the smartness and accuracy of Groq's llama3-70b-8192 model is much better than my llama3:70b-instruct-fp16 powered by Ollama. I don't have any clue why. I...
> @alexchenyu How large are your prompts? Ours are around 3.5K.

My prompts are quite long, over 4K. I think maybe that's the reason, and after I switched to vLLM,...
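For anyone following along, here is a minimal sketch of running a long prompt through vLLM's offline Python API. The model name, GPU count, and context length are assumptions for illustration, not the original poster's configuration:

```python
# Minimal sketch of running a long (~4K-token) prompt with vLLM's
# offline Python API. Model name, tensor_parallel_size, and
# max_model_len are assumptions for illustration only.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3-70B-Instruct",
    tensor_parallel_size=4,   # assumed number of GPUs
    max_model_len=8192,       # leaves room for prompts over 4K tokens
)

params = SamplingParams(temperature=0.2, max_tokens=512)
outputs = llm.generate(["<your long ~4K-token prompt here>"], params)
print(outputs[0].outputs[0].text)
```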