mllm
[Bug] check_llamafile_sgemm error.
In Matmul.cpp, there are two check_llamafile_sgemm calls that decide whether to use the llamafile_sgemm kernel, which is suitable for a matmul in which one of the two matrices is transposed. For v_proj, where both matrices are transposed, the vec_dot kernel should be called instead. However, the second check_llamafile_sgemm rejects this case only by comparing seq to dim (i.e., if (ldc < m) return false;), and in models like BERT/MobileBERT seq can easily be larger than dim. In that case the second check passes and llamafile_sgemm is called.
Currently, my workaround is a global flag that is set before and reset after v_proj when defining the model structure. Maybe you could refactor the code so that the kernel choice is independent of seq len :)
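A minimal sketch of the problem and of one possible seq-independent alternative, assuming for illustration that in the v_proj call ldc tracks seq and m tracks dim. The names shape_guard, select_kernel and Kernel are hypothetical and are not mllm's actual API. The dispatcher assumes the rule stated above: llamafile_sgemm when exactly one operand is transposed, vec_dot when both are.

```cpp
#include <cassert>

// Hypothetical stand-in for the guard quoted above: it looks only at ldc vs m,
// so its answer depends on the runtime sequence length, not on the layout.
bool shape_guard(int m, int ldc) {
    if (ldc < m) return false;  // rejects llamafile_sgemm only when ldc < m
    return true;
}

// Hypothetical kernel identifiers for the sketch.
enum class Kernel { LlamafileSgemm, VecDot, Fallback };

// Hypothetical dispatcher keyed on the transpose flags instead of on shapes,
// so the decision is the same for every sequence length.
Kernel select_kernel(bool transpose_a, bool transpose_b) {
    if (transpose_a && transpose_b) return Kernel::VecDot;          // v_proj case
    if (transpose_a != transpose_b) return Kernel::LlamafileSgemm;  // exactly one transposed
    return Kernel::Fallback;  // neither transposed: not covered by this report
}
```

With dim = 512, shape_guard(512, 128) returns false but shape_guard(512, 2048) returns true, so the same both-transposed matmul is routed differently once seq exceeds dim. select_kernel(true, true) returns Kernel::VecDot regardless of seq, which removes the need for a global flag around v_proj.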