valtab

6 comments by valtab

Linking the related GORM issue, which was resolved by the GORM author: "gorm v2.0 unit testing with sqlmock" — https://github.com/go-gorm/gorm/issues/3565

> > @void-main Hello, I found a bug: after many (thousands of) batch(20) inference requests, some batches may produce random output. But if the Triton service is restarted, it can infer...

+1

```python
triton_client.async_stream_infer(
    model_name=model_name,
    inputs=inputs,
    request_id=uuid.uuid4().hex,  # no issue if request_id is not specified here
    enable_empty_final_response=True,
    timeout=stream_timeout,
)
```

```bash
[TensorRT-LLM][INFO] Allocate 4034920448 bytes for k/v cache.
[TensorRT-LLM][INFO] Using 61568 total...
```
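For context, a minimal sketch of how that call is typically driven over Triton's gRPC streaming client. Everything beyond the `async_stream_infer` arguments shown above is illustrative: the model name `ensemble`, the input name `text_input`, and the callback are placeholder assumptions, not the reporter's actual code.

```python
import uuid

import numpy as np
import tritonclient.grpc as grpcclient


def callback(result, error):
    # Streamed responses (or errors) for in-flight requests arrive here.
    if error is not None:
        print("stream error:", error)
    else:
        print("got response for request id:", result.get_response().id)


client = grpcclient.InferenceServerClient(url="localhost:8001")
client.start_stream(callback=callback)

# Placeholder input; a real deployment builds inputs per its model config.
data = np.array([["hello"]], dtype=object)
inp = grpcclient.InferInput("text_input", list(data.shape), "BYTES")
inp.set_data_from_numpy(data)

client.async_stream_infer(
    model_name="ensemble",        # hypothetical model name
    inputs=[inp],
    request_id=uuid.uuid4().hex,  # per the report above, omitting this avoided the bug
    enable_empty_final_response=True,
)

client.stop_stream()  # close the stream once responses have been handled
```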

It seems this issue is still not resolved with V1 + APC + FA.

+1, we hit a similar issue in production at about 100 QPS: FastAPI + a Serve deployment (replica num=1), same scenario: AsyncGenerator + StreamingResponse.
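For reference, a minimal sketch of that scenario in plain FastAPI; the route and `token_stream` generator are placeholders standing in for the real generation code:

```python
import asyncio
from typing import AsyncGenerator

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


async def token_stream() -> AsyncGenerator[bytes, None]:
    # Stand-in for the real generation loop (e.g. tokens from an LLM engine).
    for i in range(10):
        yield f"token-{i}\n".encode()
        await asyncio.sleep(0.01)


@app.get("/generate")
async def generate() -> StreamingResponse:
    # Each request streams chunks from an async generator; at ~100 QPS many
    # generators are in flight concurrently on a single replica.
    return StreamingResponse(token_stream(), media_type="text/plain")
```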

Excuse me, is there a plan for a FlashInfer backend? Thanks~