valtab

6 comments by valtab

Linking the related GORM issue, which was resolved by the GORM author: "gorm v2.0 unit testing with sqlmock" — https://github.com/go-gorm/gorm/issues/3565

> > @void-main Hello, I found a bug: after many (thousands of) batch(20) inference requests, some batches may produce random output. But if the Triton service is restarted, it can infer...

+1

```python
triton_client.async_stream_infer(
    model_name=model_name,
    inputs=inputs,
    request_id=uuid.uuid4().hex,  # no issue if request_id is not specified here
    enable_empty_final_response=True,
    timeout=stream_timeout,
)
```

```bash
[TensorRT-LLM][INFO] Allocate 4034920448 bytes for k/v cache.
[TensorRT-LLM][INFO] Using 61568 total...
```
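For context, a minimal sketch of how that call is typically driven over Triton's gRPC streaming client. Everything beyond the `async_stream_infer` arguments shown above is illustrative: the model name `ensemble`, the input name `text_input`, and the callback are placeholder assumptions, not the reporter's actual code.

```python
import uuid

import numpy as np
import tritonclient.grpc as grpcclient


def callback(result, error):
    # Streamed responses (or errors) for in-flight requests arrive here.
    if error is not None:
        print("stream error:", error)
    else:
        print("got response for request id:", result.get_response().id)


client = grpcclient.InferenceServerClient(url="localhost:8001")
client.start_stream(callback=callback)

# Placeholder input; a real deployment builds inputs per its model config.
data = np.array([["hello"]], dtype=object)
inp = grpcclient.InferInput("text_input", list(data.shape), "BYTES")
inp.set_data_from_numpy(data)

client.async_stream_infer(
    model_name="ensemble",        # hypothetical model name
    inputs=[inp],
    request_id=uuid.uuid4().hex,  # per the report above, omitting this avoided the bug
    enable_empty_final_response=True,
)

client.stop_stream()  # close the stream once responses have been handled
```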

It seems this issue is still not resolved with V1 + APC + FA.

+1, we hit a similar issue in production at about 100 QPS: FastAPI + a Serve deployment (replica num=1), same scenario: AsyncGenerator + StreamingResponse.
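For reference, a minimal sketch of that scenario in plain FastAPI; the route and `token_stream` generator are placeholders standing in for the real generation code:

```python
import asyncio
from typing import AsyncGenerator

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


async def token_stream() -> AsyncGenerator[bytes, None]:
    # Stand-in for the real generation loop (e.g. tokens from an LLM engine).
    for i in range(10):
        yield f"token-{i}\n".encode()
        await asyncio.sleep(0.01)


@app.get("/generate")
async def generate() -> StreamingResponse:
    # Each request streams chunks from an async generator; at ~100 QPS many
    # generators are in flight concurrently on a single replica.
    return StreamingResponse(token_stream(), media_type="text/plain")
```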

Excuse me, is there a plan for a FlashInfer backend? Thanks~