Lei

Results 2 issues of Lei

The model engine is built from Llama 3 70B with tensor parallelism tp=2 and pipeline parallelism pp=2, and is deployed with the following Triton launch script: python3 scripts/launch_triton_server.py --world_size 4 --model_repo=llama_ifb In this case,...

**Is your feature request related to a problem? Please describe.** A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] As per the quick...

Question
New feature