BitNet
Title: TL1/TL2 codegen fails for any configuration with bm=16 on Windows 11
When generating TL1/TL2 kernels with bm=16, every configuration fails, either (1) while running codegen_tl1.py / codegen_tl2.py, or (2) later during the CMake build (llama-bench fails to build).
This happens consistently for all BM/BK settings. Other block sizes (e.g., bm=32, bm=64, bm=128) work normally.
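To make the failing combinations easier to reproduce, here is a minimal sketch that only builds the command lines for a sweep and does not run anything. It assumes the codegen script accepts `--model`, `--BM`, `--BK`, and `--bm` arguments; the script path, model name, and BM/BK values below are hypothetical placeholders, not the exact settings from this report.

```python
import itertools


def build_commands(script, model, BM_values, BK_values, bm_values):
    """Return one argv list per (BM, BK, bm) combination.

    The flag names (--model, --BM, --BK, --bm) are an assumption about
    the codegen script's CLI; adjust them if your checkout differs.
    """
    commands = []
    for BM, BK, bm in itertools.product(BM_values, BK_values, bm_values):
        commands.append(
            ["python", script, "--model", model,
             "--BM", str(BM), "--BK", str(BK), "--bm", str(bm)]
        )
    return commands


if __name__ == "__main__":
    # Hypothetical placeholder values, chosen only for illustration.
    cmds = build_commands(
        script="utils/codegen_tl1.py",
        model="example-model",
        BM_values=[128, 256],
        BK_values=[64, 128],
        bm_values=[16, 32],
    )
    for cmd in cmds:
        print(" ".join(cmd))
```

Each printed line can be run by hand (or passed to `subprocess.run`) to see which combinations fail at the codegen step and which only fail later in the CMake build.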