DefTruth
command:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python3 -m vllm.entrypoints.openai.api_server \
  --model Qwen1.5-72B-Chat \
  --tensor-parallel-size 8 \
  --max-model-len 8192 \
  --trust-remote-code \
  --disable-custom-all-reduce \
  --enable-prefix-caching \
  --tokenizer-mode slow \
  --enforce-eager \
  --gpu-memory-utilization 0.9...
```
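Once the server above is up, it exposes an OpenAI-compatible chat endpoint. A minimal sketch of a client request, assuming the default host/port `localhost:8000` (not shown in the command above) and using only the standard library:

```python
import json
import urllib.request

# Build a chat-completions payload; the "model" field must match the
# --model value passed to the api_server command above.
payload = {
    "model": "Qwen1.5-72B-Chat",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 64,
}
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# With a live server, send it and read the JSON response:
# resp = json.load(urllib.request.urlopen(req))
print(json.loads(req.data)["model"])  # → Qwen1.5-72B-Chat
```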
> If you can bisect to find the commit that leads to the degradation, that would be helpful. Otherwise, it is very difficult to answer a generic report of...
I tested on an L20; I'm not sure whether the device is the same as in CI.
With CUDA graph: 35.7 ms -> 36.7 ms; without CUDA graph: 39 ms -> 45 ms.
@RunningLeon May I ask why internlm2 needs its own convert_checkpoint.py instead of reusing llama's convert_checkpoint.py? internlm uses llama's convert_checkpoint.py directly.
> > @RunningLeon May I ask why internlm2 needs its own convert_checkpoint.py instead of reusing llama's convert_checkpoint.py? internlm uses llama's convert_checkpoint.py directly.
>
> hi, internlm2's W_qkv is fused into a single tensor, and some parameter names are not aligned with llama's, so llama's convert_checkpoint.py cannot be used directly.

Thank you for this explanation!
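The fused W_qkv mentioned above can be illustrated with a minimal NumPy sketch. The dimensions and the simple q|k|v row concatenation here are assumptions for illustration only; internlm2's real checkpoint interleaves heads (GQA layout), so the actual converter must follow the checkpoint's own ordering:

```python
import numpy as np

# Hypothetical toy dims: hidden=8, 2 query heads, 1 kv head (GQA), head_dim=4.
hidden, num_q, num_kv, head_dim = 8, 2, 1, 4

def split_fused_qkv(w_qkv, num_q, num_kv, head_dim):
    """Split a fused [(num_q + 2*num_kv) * head_dim, hidden] weight into q, k, v.

    Assumes plain q|k|v concatenation along the row axis; a real internlm2
    converter must account for the checkpoint's interleaved head layout.
    """
    q_rows = num_q * head_dim
    kv_rows = num_kv * head_dim
    q = w_qkv[:q_rows]
    k = w_qkv[q_rows:q_rows + kv_rows]
    v = w_qkv[q_rows + kv_rows:]
    return q, k, v

w = np.arange((num_q + 2 * num_kv) * head_dim * hidden,
              dtype=np.float32).reshape(-1, hidden)
q, k, v = split_fused_qkv(w, num_q, num_kv, head_dim)
print(q.shape, k.shape, v.shape)  # → (8, 8) (4, 8) (4, 8)
```

Splitting like this (plus renaming parameters to llama's scheme) is exactly the kind of work that motivates a separate conversion script.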
Feel free to follow our [PaddlePaddle/FastDeploy](https://github.com/PaddlePaddle/FastDeploy) repo; for a better deployment experience, please try FastDeploy 😎
> Please post the inference log and the visualized results of the model, not the compiled library.
> This GhostNet result looks abnormal.
First make the SSD code compatible with the latest develop branch, then open a new PR.