Lyu Han
**Describe the feature** As the title says, this feature requests developing a NetModule that wraps libtorch in the MMDeploy SDK, so that TorchScript model inference can be performed via the SDK API. **Motivation**...
**Describe the feature** Write a script to build mmdeploy on the Windows platform. **Motivation** Make the Windows build easier.
Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and more likely to receive feedback. If you do not understand...
Several issues (#506, #729, #727) request that FasterTransformer support Llama and Llama-2. Our project [LMDeploy](https://github.com/InternLM/lmdeploy), developed based on FasterTransformer, supports them and their derived models,...
After #1507, which addresses issue #1494, Python >= 3.10 is recommended.
# Background

We found that most LLM inference engines disable sampling when reporting inference performance. In real applications, however, sampling is almost always required. To provide a benchmark as close to real-world usage as possible, we opened this issue to report LMDeploy's performance **with sampling enabled**.

# Test models

1. llama2-7b
2. llama2-13b
3. internlm-20b
4. llama2-70b

# Test devices

1. A100, model compute precision: BF16 (FP16), W4A16, KV8
2. V100, model compute precision: FP16
3. 4090, model compute precision: W4A16
4. 3090, model compute precision: W4A16

...
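For context, "sampling" here means stochastic token selection (e.g. nucleus / top-p sampling with a temperature) rather than greedy argmax decoding. The following is a minimal illustrative sketch of the per-step logic an engine runs when sampling is enabled; it is not LMDeploy's actual implementation (which runs fused CUDA kernels), and the function name and signature are hypothetical:

```python
import math
import random

def top_p_sample(logits, top_p=0.9, temperature=1.0, rng=None):
    """Illustrative nucleus (top-p) sampling over raw logits.

    Returns the index of the sampled token.
    """
    rng = rng or random.Random()
    # softmax with temperature (shifted by max for numerical stability)
    m = max(logits)
    weights = [math.exp((l - m) / temperature) for l in logits]
    total = sum(weights)
    probs = [w / total for w in weights]
    # token ids sorted by probability, most probable first
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # keep the smallest prefix whose cumulative mass reaches top_p
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # renormalize over the kept set and draw one token
    r = rng.random() * mass
    acc = 0.0
    for i in kept:
        acc += probs[i]
        if r <= acc:
            return i
    return kept[-1]
```

The extra sort, cumulative sum, and random draw per generated token are exactly the overhead this benchmark measures relative to the sampling-disabled numbers most engines report.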