Sekri0 issues

Results: 5 issues of Sekri0

[Bug]: vLLM 0.5.5 with prefix caching causes CUDA error: illegal memory access

### Your current environment The output of `python collect_env.py` ```text Your output of `python collect_env.py` here ``` ### 🐛 Describe the bug --enable-prefix-caching causes a CUDA error: illegal memory access. According...

bug

Seqlen of Kernel Benchmark

I noticed that the seqlen dimension (variable M in engine/test.sh) in the kernel benchmark is very small. Does this mean that the test only considers the decode stage and ignores...

No reduction in model size

I used this command to quantize the llama2-7b-chat model, but the model size doesn't change:

CUDA_VISIBLE_DEVICES=0 python3 main.py \
  --model /mnt/home/model/llama2-7b-chat-hf \
  --epochs 20 --output_dir ./log/llama2-7b-w2a8 \
  --eval_ppl --wbits 2 --abits...

CUDA kernel of weight only quantization

As mentioned in the README: "Note that due to the limitations of AutoGPTQ kernels, the real quantization of weight-only quantization can only lead memory reduction, but with slower inference speed." I'm...

Error when running demo_onnx.py

Notice: In order to resolve issues more efficiently, please raise issues following the template and fill in the details. ## 🐛 Bug Running demo_onnx.py raises an error. demo_onnx.py: ![image](https://github.com/user-attachments/assets/1041952b-d44c-409d-91f0-20fa56ac235e) Error message: Traceback (most recent call last): File "/workspace/data/wcy/code/SenseVoice/demo_onnx.py", line...

bug