Request: ARM SME support (for Apple M4).
No need for the unofficial Apple AMX instruction set on M4; around 2 TFLOPS should be possible.
PRs welcome... do you have the hardware to test?
Not yet, waiting for a Mac mini M4.
I think this can be closed: there is zero chance of running your cpuid on an iPad, and normal computers are released a year later. AMX is NOT an ISA; it is a co-processor driven by prefixed instructions emitted from the main CPU, like the FPU on the 80386 or crypto accelerators nowadays. There is no public documentation outside of Accelerate's CBLAS using it.
try reading that again, it's about SME...
SME
Which is rumoured on some sites to be present....
You can develop and test using the Fixed Virtual Platform (FVP): https://github.com/apache/tvm/pull/16755 https://github.com/apache/tvm/pull/16749
GCC 11+ can compile it; the question is whether it is supported on a particular CPU.
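Whether a particular CPU supports SME can be checked at run time. The following is a minimal sketch, not OpenBLAS code: the function name cpu_has_sme is hypothetical, the macOS branch assumes the hw.optional.arm.FEAT_SME sysctl key is exposed by the OS, and the Linux branch assumes the HWCAP2_SME bit in AT_HWCAP2. On any other platform it simply reports no SME.

```c
#include <stddef.h>

#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#elif defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#endif

/* Hypothetical helper: returns 1 if the OS reports SME support, else 0. */
int cpu_has_sme(void) {
#if defined(__APPLE__)
    /* Assumption: macOS exposes this key on SME-capable Apple silicon. */
    int value = 0;
    size_t size = sizeof(value);
    if (sysctlbyname("hw.optional.arm.FEAT_SME", &value, &size, NULL, 0) != 0)
        return 0; /* key missing: treat as "no SME" */
    return value != 0;
#elif defined(__linux__) && defined(__aarch64__)
#ifndef HWCAP2_SME
#define HWCAP2_SME (1UL << 23) /* fallback for older libc headers */
#endif
    return (getauxval(AT_HWCAP2) & HWCAP2_SME) != 0;
#else
    return 0; /* non-Arm or unknown platform */
#endif
}
```

A caller would use the result to decide whether an SME kernel may be dispatched at all, for example `if (cpu_has_sme()) { /* select SME path */ }`.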
some implementation hints there: https://scalable.uni-jena.de/opt/sme/index.html
Hi, any news? have a Mac Mini m4 to test..
Bought an M4 mini myself recently but have not gotten around to doing much with it yet.
#5084 added SME for the "small matrix" SGEMM pathway but needs some small tweaks to connect the M4 cpu target to it
#5011 has a more general SME GEMM kernel but needs fixes for proper SYMM/TRMM support before it can be merged
Based on the SC24 workshop "Hello SME", https://github.com/llvm/llvm-project/issues/114987 and https://github.com/llvm/llvm-project/pull/95478, Apple M4 does not support SVE outside of streaming mode.
However, the concurrent [WIP] https://github.com/OpenMathLib/OpenBLAS/pull/5011 is built on top of KERNEL.ARMV8SVE, which results in an illegal instruction. Any good ideas for solving that? Would creating a new KERNEL.M4SME2 based on KERNEL.ARMV8 be a good idea?
Furthermore, I recently ran some tests on the differences between SME1 and SME2. Achieving the best performance is quite different on each. I don't know whether ACLE can fully utilize these resources.
Yes, M4 only does streaming SVE so you'd need at least some setup code to enter streaming mode and perhaps save some dual-use registers beforehand, or even work in a totally different set of registers than what the existing SVE code uses.
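As an illustration of the "setup code to enter streaming mode" idea, here is a hedged sketch that lets the compiler generate the mode switch through the ACLE __arm_locally_streaming keyword. The names dot_f32 and dot_streaming are hypothetical and this is not the OpenBLAS kernel. The SME branch is only compiled when the compiler defines __ARM_FEATURE_SME (for example with a +sme -march option), has not been verified on M4, and may still hit the cntd/cntw illegal-instruction problem reported in this thread with some compiler versions. Without SME it falls back to a scalar loop.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SME)
#include <arm_sme.h>

/* The compiler inserts smstart on entry and smstop on return, so the
   caller can stay in normal (non-streaming) mode. */
__arm_locally_streaming
float dot_streaming(const float *x, const float *y, size_t n) {
    svfloat32_t acc = svdup_n_f32(0.0f);
    for (uint64_t i = 0; i < (uint64_t)n; i += svcntw()) {
        /* Predicate masks off lanes past the end of the arrays. */
        svbool_t pg = svwhilelt_b32_u64(i, (uint64_t)n);
        acc = svmla_f32_m(pg, acc, svld1_f32(pg, x + i), svld1_f32(pg, y + i));
    }
    return svaddv_f32(svptrue_b32(), acc);
}
#endif

/* Hypothetical entry point: picks the SME path at compile time only. */
float dot_f32(const float *x, const float *y, size_t n) {
#if defined(__ARM_FEATURE_SME)
    return dot_streaming(x, y, n);
#else
    float sum = 0.0f; /* portable scalar fallback */
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
#endif
}
```

Because the choice is made at compile time, a binary built with SME enabled would still need a runtime capability check before calling dot_f32 on hardware that lacks SME.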
Both #5011 and #5084 introduced an ARMV9SME target for differentiation; it would also be possible to select kernel implementations (either at the KERNEL file level or within individual implementations) based on HAVE_SME or a similar define. As #5011 is a WIP only concerned with GEMM and related functions, it does not work outside its narrow scope.
The way forward - at least short-term - should be to split out M4 from the general "VORTEX" target into its own designation and enable the SME-based "small gemm" pathway for it. I hope to complete this very soon.
Hi, I would like to ask why I encountered the following error on M4pro:
Is it possible that my compiler does not recognize streaming flags?
Compilation: clang -g -O0 -march=armv9.2-a+sme+sme2 ./test_sme_acle.cc -o ./test_sme_acle
Clang version: Homebrew clang version 20.1.2 Target: arm64-apple-darwin24.3.0 Thread model: posix InstalledDir: /opt/homebrew/Cellar/llvm/20.1.2/bin
Looks like the same problem mentioned above. Try using disassemble --mixed to show the illegal instruction.
Thank you for your kind reply. I disassembled it in lldb and the illegal instruction turns out to be cntd!
That means the streaming flags make the compiler emit SVE instructions that are illegal outside streaming mode.
This is somewhat weird, because I can't manually set the streaming mode before main is called.
So I removed the streaming flags from main and moved the SVE code into another function, foo,
with the local streaming flags.
I also manually placed the call to foo between smstart and smstop.
After that, the code finally ran normally!
Congrats, I also tried to resolve similar issues. I ran into ADVL during unit tests.
Could we connect on WeChat? I've sent the code to your email.
the fast path for small matrix sgemm should be working on M4 with #5222 - but I'm now stuck on an illegal instruction error involving cntd/cntw myself, trying to get dot_kernel_sve working in streaming mode with the __arm_streaming attribute
Is it a bug in the LLVM compiler?