Sekri0

Results 5 issues of Sekri0

### Your current environment

The output of `python collect_env.py`

```text
Your output of `python collect_env.py` here
```

### 🐛 Describe the bug

`--enable-prefix-caching` causing CUDA error: illegal memory access. According...

bug

I noticed that the seqlen dimension (variable M in engine/test.sh) in the kernel benchmark is very small. Does this mean that the test only considers the decode stage and ignores...

I used this command to quantize the llama2-7b-chat model, but the model size doesn't change. CUDA_VISIBLE_DEVICES=0 python3 main.py \ --model /mnt/home/model/llama2-7b-chat-hf \ --epochs 20 --output_dir ./log/llama2-7b-w2a8 \ --eval_ppl --wbits 2 --abits...

As mentioned in the README, [Note that due to the limitations of AutoGPTQ kernels, the real quantization of weight-only quantization can only lead memory reduction, but with slower inference speed.] I'm...

Notice: In order to resolve issues more efficiently, please raise issues following the template and fill in the details. ## 🐛 Bug Running demo_onnx.py raises an error. demo_onnx.py: ![image](https://github.com/user-attachments/assets/1041952b-d44c-409d-91f0-20fa56ac235e) Error message: Traceback (most recent call last): File "/workspace/data/wcy/code/SenseVoice/demo_onnx.py", line...

bug