Which clang source provides the BF16 GEMM kernel intrinsic?
Which source path is used for the BF16 GEMM kernel intrinsic (vbfdotq_f32)?
- system clang path: /usr/lib/llvm-14/lib/clang/14.0.0/include/arm_neon.h
- Android NDK clang path: /android-ndk-r27c/toolchains/llvm/prebuilt/linux-x86_64/lib/clang/18/include/arm_neon.h
@WeiMa01 can you provide more details?
Maybe @swolchok knows?
https://github.com/search?q=repo%3Apytorch%2Fexecutorch%20vbfdotq_f32&type=code
Yes, the function f32_dot_bf16 calls vbfdotq_f32. However, I want to know which definition of vbfdotq_f32 is actually used:
- system clang path: /usr/lib/llvm-14/lib/clang/14.0.0/include/arm_neon.h `#ifdef __LITTLE_ENDIAN__ __ai float32x4_t vbfdotq_f32(float32x4_t __p0, bfloat16x8_t __p1, bfloat16x8_t __p2) { float32x4_t __ret; __ret = (float32x4_t) __builtin_neon_vbfdotq_v((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 41); return __ret; }`
- Android NDK clang path: /android-ndk-r27c/toolchains/llvm/prebuilt/linux-x86_64/lib/clang/18/include/arm_neon.h `#ifdef __LITTLE_ENDIAN__ __ai __attribute__((target("bf16"))) float32x4_t vbfdotq_f32(float32x4_t __p0, bfloat16x8_t __p1, bfloat16x8_t __p2) { float32x4_t __ret; __ret = (float32x4_t) __builtin_neon_vbfdotq_f32((int8x16_t)__p0, (int8x16_t)__p1, (int8x16_t)__p2, 41); return __ret; }`
The answer to this question depends on your exact compilation setup; you are essentially asking "what is the path to the file that #include <arm_neon.h> includes?"
What is the underlying problem you are trying to solve?
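To illustrate why the header path depends on the toolchain rather than on the calling code, here is a minimal sketch of a BF16 dot product that only uses `vbfdotq_f32` when the compiler in use reports BF16 support. The names `dot_bf16_sketch`, `bf16_bits_to_float`, and `HAVE_BF16_DOT` are hypothetical and are not taken from ExecuTorch; the inputs are assumed to be raw BF16 bit patterns stored as `uint16_t`. Whichever clang compiles this file supplies its own `arm_neon.h`; running that compiler with `-H -fsyntax-only` on the file should list the exact header paths it picked up.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Use the NEON BF16 path only if the active compiler/target advertises it.
 * The header that gets included is the one shipped with that compiler. */
#if defined(__aarch64__) && defined(__ARM_NEON) && defined(__ARM_FEATURE_BF16)
#include <arm_neon.h>
#define HAVE_BF16_DOT 1
#else
#define HAVE_BF16_DOT 0
#endif

/* Widen a raw BF16 bit pattern to float: BF16 is the top 16 bits of a float32. */
float bf16_bits_to_float(uint16_t bits) {
    uint32_t wide = (uint32_t)bits << 16;
    float out;
    memcpy(&out, &wide, sizeof out);
    return out;
}

/* Hypothetical dot product over n BF16 values (not the ExecuTorch kernel). */
float dot_bf16_sketch(const uint16_t *a, const uint16_t *b, size_t n) {
    float sum = 0.0f;
    size_t i = 0;
#if HAVE_BF16_DOT
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (; i + 8 <= n; i += 8) {
        /* Assumption: the uint16_t buffers hold valid BF16 bit patterns. */
        bfloat16x8_t va = vld1q_bf16((const bfloat16_t *)(a + i));
        bfloat16x8_t vb = vld1q_bf16((const bfloat16_t *)(b + i));
        /* Each float lane accumulates the products of two adjacent BF16 pairs. */
        acc = vbfdotq_f32(acc, va, vb);
    }
    sum = vaddvq_f32(acc); /* horizontal add of the four lanes */
#endif
    /* Scalar tail, and the whole loop when BF16 intrinsics are unavailable. */
    for (; i < n; ++i) {
        sum += bf16_bits_to_float(a[i]) * bf16_bits_to_float(b[i]);
    }
    return sum;
}
```

Calling `dot_bf16_sketch(a, b, n)` goes through the intrinsic only when the build targets AArch64 with BF16 enabled (for example via a suitable `-march` setting); otherwise it falls back to the scalar loop. Note that the hardware BF16 dot instruction may round differently from the scalar path, so results are not guaranteed to match bit for bit in general.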