
Add support for properly optimized Windows ARM64 builds with LLVM and MSVC

Open max-krasnyansky opened this issue 1 year ago • 12 comments

Currently, Windows ARM64 builds are not properly optimized, which results in low token rates on Windows ARM64 platforms such as the upcoming Snapdragon X-Elite & Plus.

This update adds / resolves the following:

  • Fixes MSVC & Clang warnings & errors in the logging code
  • Adds proper MatMul-INT8 support detection when building with MSVC for ARM64
  • Fixes errors in MatMul-INT8 when compiled with MSVC, which also fixes warnings with Clang, and improves MatMul-INT8 NEON intrinsics usage in general (the detection logic and the core intrinsic are sketched right after this list)
  • Adds CMake toolchain files for Windows ARM64 MSVC and LLVM builds. We're using the LLVM 16.x included in MS Visual Studio 2022
  • Updates the GitHub Actions build workflow to produce optimized Windows ARM64 builds. All Windows cmake build targets now explicitly say x64 or arm64
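
For context, the detection part comes down to compiler feature macros: Clang and GCC define the ACLE macro __ARM_FEATURE_MATMUL_INT8 when the target includes +i8mm, while MSVC does not emit ACLE feature macros. Here is a minimal sketch of the idea and of the core intrinsic (illustrative only, not the actual ggml code; USE_MATMUL_INT8 is a hypothetical opt-in define):

#include <arm_neon.h>

/* Clang/GCC set this ACLE macro when targeting +i8mm; MSVC targeting ARM64
   does not, so a hypothetical explicit opt-in define is checked instead. */
#if defined(__ARM_FEATURE_MATMUL_INT8) || \
    (defined(_MSC_VER) && defined(_M_ARM64) && defined(USE_MATMUL_INT8))
/* vmmlaq_s32 multiplies a 2x8 int8 matrix by an 8x2 int8 matrix and
   accumulates the 2x2 int32 result into acc. */
static inline int32x4_t mmla_i8(int32x4_t acc, int8x16_t a, int8x16_t b) {
    return vmmlaq_s32(acc, a, b);
}
#endif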

Here are some before/after token rates from a Snapdragon X-Elite-based laptop.

llama-v2-7B, q4_0, CPU backend, 6 threads

Prebuilt Release (master)   | prompt-eval: 34-35 t/s | eval:   4-6 t/s
This PR (MSVC)              | prompt-eval: 60-62 t/s | eval: 10-11 t/s
This PR (LLVM/Clang)        | prompt-eval: 70-72 t/s | eval: 20-21 t/s
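
(For reference, numbers like these can be collected with the bundled llama-bench tool; the model path and binary location below are illustrative:)

src\llama.cpp> build-arm64-windows-llvm-release\bin\llama-bench.exe -m models\llama-2-7b.Q4_0.gguf -t 6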

Here is how to build with LLVM/Clang using CMake Presets:

# from MS Visual Studio dev shell
src\llama.cpp> cmake --preset arm64-windows-llvm-release
...
src\llama.cpp> cmake --build build-arm64-windows-llvm-release
...
src\llama.cpp> cmake --install build-arm64-windows-llvm-release --prefix pkg-arm64-windows-llvm

Here is how to build with MSVC

# from MS Visual Studio dev shell
src\llama.cpp> cmake --preset arm64-windows-msvc-release
...
src\llama.cpp> cmake --build build-arm64-windows-msvc-release
...
src\llama.cpp> cmake --install build-arm64-windows-msvc-release --prefix pkg-arm64-windows-msvc

This all works with MS Visual Studio 2022 Community Edition. One just needs to enable all the native ARM64-related features and install the LLVM/Clang add-on. Hosted GitHub CI runners already include all of that.
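
For reference, the LLVM toolchain file boils down to something like this (a sketch; the exact contents and flags in the repo may differ):

set( CMAKE_SYSTEM_NAME Windows )
set( CMAKE_SYSTEM_PROCESSOR arm64 )

set( CMAKE_C_COMPILER clang )
set( CMAKE_CXX_COMPILER clang++ )
set( CMAKE_C_COMPILER_TARGET arm64-pc-windows-msvc )
set( CMAKE_CXX_COMPILER_TARGET arm64-pc-windows-msvc )

# armv8.7-a implies the dotprod and i8mm extensions used by the INT8 kernels
set( CMAKE_C_FLAGS_INIT "-march=armv8.7-a" )
set( CMAKE_CXX_FLAGS_INIT "-march=armv8.7-a" )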

max-krasnyansky avatar May 10 '24 02:05 max-krasnyansky

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 541 iterations 🚀

Details (for performance-related PRs only)
  • Concurrent users: 8, duration: 10m
  • HTTP request: avg=8614.13ms p(95)=20803.78ms fails=, finish reason: stop=489 truncated=52
  • Prompt processing (pp): avg=96.96tk/s p(95)=402.6tk/s
  • Token generation (tg): avg=71.42tk/s p(95)=47.97tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=win-arm64-build commit=ece01fc2e99570f240ecc9a65f3e4f3df216e827

[Benchmark charts for llama.cpp bench-server-baseline on Standard_NC4as_T4_v3 (duration=10m, 541 iterations): prompt_tokens_seconds, predicted_tokens_seconds, kv_cache_usage_ratio, requests_processing]

github-actions[bot] avatar May 10 '24 23:05 github-actions[bot]

OMG. Thank you for this. Do you think that the 8cx Gen 3 will benefit from these changes? Also, would supporting QNN for Windows be too complicated?

hmartinez82 avatar May 11 '24 10:05 hmartinez82

@ggerganov Thanks for fixing up q8_0_q8_0 (good eyes, it was a cut&paste error that I missed and CI didn't catch). Should be good to merge now. I have more updates coming for the readme, and further ARM64 optimizations, but I'm waiting to merge this basic build/fixes stuff first. Rebased / retested on top of the latest master.

max-krasnyansky avatar May 13 '24 04:05 max-krasnyansky

@max-krasnyansky Understood. Just for reference, I have an 8cx Gen 3. I was able to get matmul working by using -march=armv8.3-a+dotprod+i8mm. I did notice a jump in the prompt eval speed.

hmartinez82 avatar May 13 '24 22:05 hmartinez82

@hmartinez82

8cx Gen 3

Interesting. I didn't know int8 matmul works on the 8cx Gen 3. That's great! Can you please try running the armv8.7-a compiled binaries as is? It might just work, since we're technically not using other extensions (at least not explicitly). If that doesn't work (i.e. you get a segfault due to unsupported instructions), please try -march=armv8.4-a+dotprod+i8mm. If that works, we could use that instead of armv8.7-a as the common set.
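
For example, an override along these lines should work (exact invocation illustrative):

src\llama.cpp> cmake -B build-test -DCMAKE_C_FLAGS="-march=armv8.4-a+dotprod+i8mm" -DCMAKE_CXX_FLAGS="-march=armv8.4-a+dotprod+i8mm"
src\llama.cpp> cmake --build build-test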

max-krasnyansky avatar May 13 '24 22:05 max-krasnyansky

Well. I built it with -march=armv8.7-a and it worked with llama3, but not llama2 😑 When loading llama2, it crashes with:

llama_new_context_with_model: graph nodes  = 1030
llama_new_context_with_model: graph splits = 1
Illegal instruction

I'm trying -march=armv8.4-a+dotprod+i8mm now

hmartinez82 avatar May 13 '24 22:05 hmartinez82

@max-krasnyansky Ok, bad news. I should have stuck with llama2 while testing.

I don't understand why (this is completely out of my league), but llama3 works even when compiling with armv8.7-a. It crashes when using llama2 as the model, even with -march=armv8.3-a+dotprod+i8mm. If I remove +i8mm then llama2 works.

In other words, my lack of domain knowledge here led me to speak too soon. I couldn't imagine that different models would lead to different CPU instructions being used 😓

hmartinez82 avatar May 13 '24 23:05 hmartinez82

@hmartinez82 If you use the same quantization (q4_0) for both llama 2 and 3, then they would both use matmul-int8 (if enabled). It's probably crashing in some other code path / other instruction used by the compiler. Did you try -march=armv8.2-a+dotprod+i8mm? That'd also be good enough to get full rates on the X-Elite (with llama 2 and 3 in q4_0).

max-krasnyansky avatar May 13 '24 23:05 max-krasnyansky

Here's my llama2

llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 6.74 B
llm_load_print_meta: model size       = 3.56 GiB (4.54 BPW)
llm_load_print_meta: general.name     = LLaMA v2

and here's my llama3

llm_load_print_meta: model type       = 8B
llm_load_print_meta: model ftype      = Q4_K - Medium
llm_load_print_meta: model params     = 8.03 B
llm_load_print_meta: model size       = 4.58 GiB (4.89 BPW)
llm_load_print_meta: general.name     = Meta-Llama-3-8B-Instruct-imatrix

I'm going to download Q4_0 of llama3.

But anyway, -march=armv8.2-a+dotprod+i8mm still crashes. I guess this confirms it: the 8cx Gen 3 does not support +i8mm.

hmartinez82 avatar May 14 '24 00:05 hmartinez82

@ggerganov Any objections to merging this? Please let me know if you have any questions/suggestions.

max-krasnyansky avatar May 15 '24 03:05 max-krasnyansky

Could you add some documentation about how to use the CMakePresets.json file? A comment in the PR description is enough. If I understand correctly, this is not being used in any of the CI builds, but rather is meant to provide a set of presets for people building with MSVC. Is that correct?

slaren avatar May 15 '24 16:05 slaren

> Could you add some documentation about how to use the CMakePresets.json file? A comment in the PR description is enough. If I understand correctly, this is not being used in any of the CI builds, but rather is meant to provide a set of presets for people building with MSVC. Is that correct?

Ah. I'm going to add a full section to the readme on how to build native Windows ARM64. And yes, you are correct. I was going to use presets in the CI as well, but figured that, to start with, it's more consistent to just explicitly specify CMAKE_TOOLCHAIN and such. If you guys like the CMakePresets, I have them for ubuntu-x64, macos, etc.
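
The available presets can also be listed with the standard CMake CLI:

src\llama.cpp> cmake --list-presets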

Here is how to build with LLVM/Clang using CMake Presets:

# from MS Visual Studio dev shell
src\llama.cpp> cmake --preset arm64-windows-llvm-release
...
src\llama.cpp> cmake --build build-arm64-windows-llvm-release
...
src\llama.cpp> cmake --install build-arm64-windows-llvm-release --prefix pkg-arm64-windows-llvm

Here is how to build with MSVC

# from MS Visual Studio dev shell
src\llama.cpp> cmake --preset arm64-windows-msvc-release
...
src\llama.cpp> cmake --build build-arm64-windows-msvc-release
...
src\llama.cpp> cmake --install build-arm64-windows-msvc-release --prefix pkg-arm64-windows-msvc

This all works with MS Visual Studio 2022 Community Edition. One just needs to enable all the native ARM64-related features and install the LLVM/Clang add-on. Hosted GitHub CI runners already include all of that.

max-krasnyansky avatar May 15 '24 17:05 max-krasnyansky

@max-krasnyansky Now who's going to be the good Samaritan and add support for the 8cx NPU 😅 It has MATMUL support, I think.

hmartinez82 avatar May 15 '24 19:05 hmartinez82

@slaren Please don't forget to hit that merge button :) It would be good to avoid further rebases while all checks are passing. I want to retest the released binaries and will then submit the README and further updates.

max-krasnyansky avatar May 16 '24 02:05 max-krasnyansky

The CMakePresets.json file has been giving me issues. Visual Studio Code is available on all OSs, and this file is set up specifically for Windows. I'm now greeted with a prompt for it every time, and Visual Studio Code attempts to overwrite it, which creates conflicts. System-specific configurations should be separated or made inclusive.

teleprint-me avatar May 28 '24 00:05 teleprint-me

Yes, same here. It forces you to select one of the presets, right?

hmartinez82 avatar May 28 '24 01:05 hmartinez82

Yes, it does. Every time. Once I name it, it overwrites the file, and the branch is left with changes as a result.

teleprint-me avatar May 28 '24 01:05 teleprint-me

> The CMakePresets.json file has been giving me issues. Visual Studio Code is available on all OSs, and this file is set up specifically for Windows. I'm now greeted with a prompt for it every time, and Visual Studio Code attempts to overwrite it, which creates conflicts. System-specific configurations should be separated or made inclusive.

Odd. I don't use Visual Studio Code, but it seems to me like a settings issue. CMake Presets is a standard CMake feature that has nothing to do with IDEs / UIs:
https://cmake.org/cmake/help/latest/manual/cmake-presets.7.html
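(If it's the VS Code CMake Tools extension doing the prompting, I believe it has a cmake.useCMakePresets setting that can be set to "never".)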

max-krasnyansky avatar May 28 '24 04:05 max-krasnyansky

@max-krasnyansky These are usually auto-generated, but can be hand-crafted.

teleprint-me avatar May 28 '24 19:05 teleprint-me

> @max-krasnyansky These are usually auto-generated, but can be hand-crafted.

Please see the CMake documentation link I included above.

And yes, the things you listed are Windows-specific; that's the whole point, we added a native Windows ARM64 build ;-) I will submit additional ubuntu, android, and macos presets later.

max-krasnyansky avatar May 28 '24 19:05 max-krasnyansky

I did read it. It doesn't change the fact that these settings are system-specific. This file should be ignored.

teleprint-me avatar May 28 '24 21:05 teleprint-me

I am not sure that we need to make changes to accommodate what seems to be a buggy or misconfigured VS Code extension. FWIW, I use VS Code, but not the cmake extension, because I always found it more annoying than useful.

slaren avatar May 28 '24 21:05 slaren

@slaren I have to concur with you. The CMake extension should not force us to use the presets just because they happen to be in the file system.

hmartinez82 avatar May 28 '24 21:05 hmartinez82

@slaren These are system-specific settings. They are settings geared towards ARM builds on Microsoft Windows. While the settings can be made inclusive, that doesn't change the current state of the file. I respect your opinion and input. I have nothing left to say or add to this discussion. I stand by what I've said.

teleprint-me avatar May 28 '24 22:05 teleprint-me