Fanhai Lu

Results: 9 comments of Fanhai Lu

Thanks @imsujinpark! I hit the same issue; after switching to the release version (v0.109.0), I can connect to my VMs.

More logs after skipping zero-length outputs (only 2 of 300 had zero length):
output_len is zero for 238th request
output_len is zero for 288th request
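The zero-length check mentioned above can be sketched as follows. This is a minimal illustration only, assuming each benchmark result exposes an output length; the function and field names are hypothetical, not the actual benchmark script:

```python
# Hypothetical sketch: skip benchmark results whose output is empty,
# logging the index of each dropped request.
def filter_zero_outputs(output_lens):
    kept = []
    for i, output_len in enumerate(output_lens):
        if output_len == 0:
            # Mirrors the log format seen above; index is hypothetical.
            print(f"output_len is zero for {i}th request")
            continue
        kept.append(output_len)
    return kept

lens = [5, 0, 7, 0, 3]
kept = filter_zero_outputs(lens)
print(f"kept {len(kept)} of {len(lens)} results")
```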

> > Any reason to add text back, I suggested we keep both str and id in response in #40. The answer is " don't want to decode it to...

> * When the input is text, return both the text and the token IDs. Is it still streaming mode?
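A streaming response chunk carrying both fields, as discussed in the quoted comment, might look like the following. This is an illustrative sketch only; the field names (`text`, `token_ids`) and the token values are hypothetical, not the server's actual wire format:

```python
import json

# Hypothetical streaming chunk: each chunk carries both the decoded
# text piece and the raw token IDs, so clients can use either.
chunk = {"text": "Hello", "token_ids": [15496]}

# Serialize as one line of a streamed response.
line = json.dumps(chunk)
print(line)
```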

> * Optimized TPU duty cycle (largest gap < 4ms)
> * Optimized TTFT: dispatch prefill tasks ASAP w/o unnecessary blocking in CPU, keep backpressure to enforce insert ASAP, return...

Hi [richard](https://github.com/richardsliu), I tested llama-2 7B with run_server_with_ray.py (--batch_size=32). Instead of sending requests one by one, I used a benchmark script to send 200 requests and got 198 responses back....
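The benchmark pattern above, firing many requests concurrently and counting how many responses come back, can be sketched like this. It is a self-contained simulation, not the actual benchmark script: `send_request` is a hypothetical placeholder for the HTTP call to the server, and the simulated drop pattern is chosen only to reproduce a 198-of-200 outcome:

```python
import concurrent.futures

def send_request(i):
    # Placeholder for a real HTTP call to the serving endpoint.
    # Simulate a dropped response for requests 99 and 199.
    return i if i % 100 != 99 else None

def run_benchmark(num_requests, concurrency=32):
    # Fire requests concurrently, mirroring a batch_size=32 client.
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        responses = list(pool.map(send_request, range(num_requests)))
    ok = [r for r in responses if r is not None]
    print(f"sent {num_requests} requests, got {len(ok)} responses back")
    return len(ok)

run_benchmark(200)  # prints: sent 200 requests, got 198 responses back
```

Counting responses against requests this way is what surfaces silently dropped requests, which is exactly the 198-of-200 gap reported above.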

@qihqi @wang2yn84 Let's revisit this issue now. Having a regression test is critical for catching performance degradation. @sixiang-google Since the infra is ready, could you work on a regression test for...