DefTruth

Results: 256 comments of DefTruth

command:

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python3 -m vllm.entrypoints.openai.api_server \
    --model Qwen1.5-72B-Chat \
    --tensor-parallel-size 8 \
    --max-model-len 8192 \
    --trust-remote-code \
    --disable-custom-all-reduce \
    --enable-prefix-caching \
    --tokenizer-mode slow \
    --enforce-eager \
    --gpu-memory-utilization 0.9...
```
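For context, a server launched like the one above exposes an OpenAI-compatible REST API. Below is a minimal client sketch using only the standard library; the base URL `http://localhost:8000` and the request fields are assumptions about a typical local deployment, not taken from the report above:

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build an OpenAI-compatible /v1/chat/completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def send_chat_request(payload: dict, base_url: str = "http://localhost:8000") -> dict:
    """POST the payload to the server; assumes the api_server above is listening."""
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only builds the payload here; sending requires the server to be running.
    payload = build_chat_request("Qwen1.5-72B-Chat", "Hello!")
    print(json.dumps(payload, indent=2))
```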

> If you can bisect to find the commit that leads to the degradation, that would be helpful. Otherwise, it is very difficult to answer a generic report of...

I have tested on an L20; I am not sure whether the device is the same as the one in CI.

With CUDA graph: 35.7ms -> 36.7ms; without CUDA graph: 39ms -> 45ms.

@RunningLeon Why does internlm2 need its own convert_checkpoint.py instead of reusing llama's convert_checkpoint.py? internlm uses llama's convert_checkpoint.py directly.

> > @RunningLeon Why does internlm2 need its own convert_checkpoint.py instead of reusing llama's convert_checkpoint.py? internlm uses llama's convert_checkpoint.py directly.
>
> Hi, in internlm2 the W_qkv weights are fused together, and some parameter names are not aligned with llama's, so llama's convert_checkpoint.py cannot be reused directly.

Thank you for this explanation!
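The fused-weight point above can be illustrated with a toy example. This is a minimal sketch of slicing a fused W_qkv matrix into separate q/k/v weights during checkpoint conversion; the shapes, the simple row-wise concatenation layout, and the grouped-query-style smaller k/v size are assumptions for illustration, not internlm2's actual checkpoint format:

```python
import numpy as np


def split_fused_qkv(w_qkv: np.ndarray, hidden: int, kv_hidden: int):
    """Split a fused weight of shape (hidden + 2*kv_hidden, in_dim), assumed to
    be the row-wise concatenation [W_q; W_k; W_v], into separate q/k/v weights."""
    assert w_qkv.shape[0] == hidden + 2 * kv_hidden, "unexpected fused shape"
    w_q = w_qkv[:hidden]
    w_k = w_qkv[hidden:hidden + kv_hidden]
    w_v = w_qkv[hidden + kv_hidden:]
    return w_q, w_k, w_v


# Toy sizes: hidden=8, kv_hidden=4 (k/v smaller, as with grouped-query attention).
w = np.arange(16 * 8, dtype=np.float32).reshape(16, 8)
q, k, v = split_fused_qkv(w, hidden=8, kv_hidden=4)
print(q.shape, k.shape, v.shape)  # (8, 8) (4, 8) (4, 8)
```

A converter that only handles separate q/k/v tensors (as in llama checkpoints) would need a step like this before its parameter-name mapping, which is one reason a dedicated script can be simpler than retrofitting the llama one.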

You can follow our [PaddlePaddle/FastDeploy](https://github.com/PaddlePaddle/FastDeploy) repo; for a better deployment experience, please try FastDeploy 😎

> Please post the inference log and the visualization of the model's results, not the compiled library.

> The ghostnet result here looks abnormal.

Please first make the SSD code compatible with the latest develop branch, then submit a new PR.