montagetao
I found that when I add "-t" in /etc/memcached.conf and then restart the memcached service, I see a CPU bottleneck on NUMA node1 when loading the DB (R:W 0:1). I think it's the...
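For reference, a minimal sketch of the kind of change being described, assuming the Debian/Ubuntu-style /etc/memcached.conf where each option sits on its own line; the thread count shown is only an example value:

```sh
# /etc/memcached.conf (Debian/Ubuntu layout, one option per line)
# -t sets the number of memcached worker threads; 8 is illustrative only
-t 8

# after editing, restart the service so the new thread count takes effect:
#   sudo systemctl restart memcached
```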
```sh
# Enable the AVX512 and AMX instruction sets
cmake llama.cpp -B llama.cpp/build_amx \
    -DGGML_NATIVE=OFF \
    -DGGML_AVX512=ON \
    -DGGML_AVX512_BF16=ON \
    -DGGML_AVX512_VBMI=ON \
    -DGGML_AVX512_VNNI=ON \
    -DGGML_AMX_TILE=ON \
    -DGGML_AMX_INT8=ON \
    -DGGML_AMX_BF16=ON

# Build the executables
cmake --build llama.cpp/build_amx --config Release -j --clean-first...
```
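Once that build finishes, one way to sanity-check it is to run llama-bench from the new build directory and compare against a baseline build. This is only a sketch; the model path is a placeholder and the thread count should match your machine:

```sh
# Sketch: benchmark the AMX-enabled build (compare against a non-AMX build).
# -m model path (placeholder), -t CPU threads, -p prompt tokens, -n generated tokens
./llama.cpp/build_amx/bin/llama-bench \
    -m /path/to/model-q8_0.gguf \
    -t 32 -p 512 -n 128
```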
> [@montagetao](https://github.com/montagetao)
>
> Thanks for the example command. Did you find that it improved performance of llama.cpp on your newer Intel Xeon with AMX extensions? I'm benchmarking some newer...
> Just profiled llama.cpp for DeepSeek R1 with Q4_K_M and found it only uses AVX-VNNI, not the AMX instructions.

got it, ...
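One way to check this yourself is to profile a short run and look for AMX tile instructions (e.g. tileloadd / tdpbssd) in the hot GEMM kernels. A sketch assuming Linux perf is available; the binary and model paths are placeholders:

```sh
# Record a short inference run with the AMX build
perf record -g -- ./llama.cpp/build_amx/bin/llama-cli \
    -m /path/to/deepseek-r1-q4_k_m.gguf -p "hello" -n 32 -t 32

# Inspect the hottest symbols; annotate them and search the disassembly
# for AMX tile instructions (tileloadd, tdpbssd). Seeing only
# vpdpbusd-style AVX-VNNI ops matches the behaviour reported above.
perf report
```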
> [@oldmikeyang](https://github.com/oldmikeyang) [@montagetao](https://github.com/montagetao)
>
> I believe at least some AMX extensions do work, but only for certain tensor quantizations ([int8 quants exist, but not sure how to run on...
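If the int8 path is the one that reaches AMX, producing a Q8_0 model to test with looks roughly like this. A sketch assuming a recent llama.cpp build where the tool is named llama-quantize (older builds call it just "quantize"); both file paths are placeholders:

```sh
# Quantize an f16 GGUF model to Q8_0 (8-bit) for testing the int8 path
./llama.cpp/build_amx/bin/llama-quantize \
    /path/to/model-f16.gguf /path/to/model-q8_0.gguf Q8_0
```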
Thanks for your reply. I have invited my team members, but they can't log in; it asks for a password.
Thank you, I will try it later.