Dong-Yong Lee issues

Results: 3 issues by Dong-Yong Lee

Temporary load balancing occurs after sync in canary strategy using traefik.

Checklist:
* [ ] I've included steps to reproduce the bug.
* [ ] I've included the version of Argo Rollouts.

**Describe the bug**
We are building a canary deployment...

bug

Support generation from input embedding

This PR implements text generation from embedding input (commonly known as inputs_embeds). It is related to https://github.com/vllm-project/vllm/issues/369 and https://github.com/vllm-project/vllm/issues/416. More to do - [x] Enhance test codes...

Guides or Tips for optimization for KV cache usage with inflight batcher

Hello, TensorRT-LLM team. I have been testing the performance of the int8_kv_cache + weight_only (int8) combination on the llama-2-7b model (tested with TensorRT-LLM release v0.7.1). The node contains 2 t4...

triaged