Hao Li


> Windows doesn't support `HSA_OVERRIDE_GFX_VERSION` and probably doesn't have its own equivalent. You would need to compile a Tensile library for gfx1103 for rocBLAS 5.7, or use Linux. https://github.com/ROCm/ROCm/discussions/2631#discussioncomment-7745585 mentioned...
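For reference, on Linux the override route is just an environment variable. A minimal sketch, assuming a ROCm build of llama.cpp and that the commonly cited `11.0.0` spoof value applies to gfx1103 (that value is an assumption, not verified here):

```
# Linux only: report the gfx1103 iGPU as gfx1100 so rocBLAS can load
# a prebuilt Tensile library (11.0.0 is a commonly cited guess, untested)
export HSA_OVERRIDE_GFX_VERSION=11.0.0
./main -m ./llama-2-7b-chat.Q4_0.gguf -p "introduce shanghai"
```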

I also tested llama-2-7b-chat.Q4_0.gguf; it only reached 0.11 tok/s on the 780M Radeon Graphics (gfx1103). Is there any way to get better performance?

`C:\code\llama.cpp\build\bin>.\main -m c:\code\llama-2-7b-chat.Q4_0.gguf -p "introduce shanghai" -n...`

Is there any way, or any plan, to get better performance? Thanks!
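One thing worth checking (a sketch rather than a verified fix): at 0.11 tok/s the model may be running entirely on the CPU, since llama.cpp only offloads layers to the GPU when `-ngl`/`--n-gpu-layers` is passed. The layer count 33 below is just an illustrative value for a 7B model:

```
.\main -m c:\code\llama-2-7b-chat.Q4_0.gguf -p "introduce shanghai" -ngl 33
```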

> I'm not sure, but GLM may use a customized tokenizer which is not supported yet.

https://github.com/mlc-ai/mlc-llm/pull/1313 mentioned bringing chatglm3 back, but I tried chatglm3-6b and it shows the same error.

@Ubospica thanks! I just tested the latest packages: mlc-ai-nightly-cu122 0.15.dev404 and mlc-llm-nightly-cu122 0.1.dev1382.

`mlc_llm gen_config ./dist/models/glm-4-9b-chat/ --quantization q4f16_1 --conv-template glm -o dist/glm-4-9b-chat-MLC/`

works now. But after compilation:

`mlc_llm compile ./dist/glm-4-9b-chat-MLC/mlc-chat-config.json --device cuda...`
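For context, the full MLC conversion pipeline assumed here is roughly the following. A sketch; the paths mirror the ones above, and the output `.so` path in step 3 is illustrative, not from the original commands:

```
# 1. Quantize the HF weights (q4f16_1, matching the config above)
mlc_llm convert_weight ./dist/models/glm-4-9b-chat/ \
    --quantization q4f16_1 -o dist/glm-4-9b-chat-MLC/

# 2. Generate mlc-chat-config.json with the glm conversation template
mlc_llm gen_config ./dist/models/glm-4-9b-chat/ \
    --quantization q4f16_1 --conv-template glm -o dist/glm-4-9b-chat-MLC/

# 3. Compile the model library for CUDA (output path is illustrative)
mlc_llm compile ./dist/glm-4-9b-chat-MLC/mlc-chat-config.json \
    --device cuda -o dist/libs/glm-4-9b-chat-q4f16_1-cuda.so
```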

@MasterJH5574 I have tried chatglm3-6b; it no longer shows the error "TVMError: Check failed: append_length > 0 (0 vs. 0): Append with length 0 is not allowed.", but the output...

@MasterJH5574 I also tried https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat. It seems the CLI below works with normal output:

`mlc_llm chat ./dist/Llama3-8B-Chinese-Chat-MLC/ --device "cuda" --model-lib ./dist/libs/Llama3-8B-Chinese-Chat/Llama3-8B-Chinese-Chat-q4f16_1-cuda.so`

>>> introduce shanghai
Shanghai, the "Pearl of the Orient," is...
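As a follow-up usage note (a sketch, assuming the same compiled artifacts as above): the same model can be exposed over the OpenAI-compatible REST API via `mlc_llm serve`, with 127.0.0.1:8000 being the default host and port:

```
# Serve the compiled model over an OpenAI-compatible REST API
mlc_llm serve ./dist/Llama3-8B-Chinese-Chat-MLC/ --device cuda \
    --model-lib ./dist/libs/Llama3-8B-Chinese-Chat/Llama3-8B-Chinese-Chat-q4f16_1-cuda.so

# In another shell: the same "introduce shanghai" prompt over HTTP
curl -X POST http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "./dist/Llama3-8B-Chinese-Chat-MLC/",
        "messages": [{"role": "user", "content": "introduce shanghai"}]
      }'
```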