less_slow.cpp
Data Alignment may have an error?
The loop in f32_pairwise_accumulation runs f32s_in_cache_line_half_k * 2 times, while the other one runs only f32s_in_cache_line_half_k times.
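To make the concern concrete, here is a minimal sketch of the mismatch I mean. It is not the code from less_slow.cpp: the value 8 for f32s_in_cache_line_half_k and the two helper functions are assumptions made up for illustration. It only shows that when one loop has twice the trip count of the other, it does twice the additions, so their timings are not directly comparable.

```cpp
#include <array>
#include <cstddef>

// Assumed value for illustration only; the real constant lives in less_slow.cpp.
constexpr std::size_t f32s_in_cache_line_half_k = 8;

// Hypothetical stand-in for a loop that runs f32s_in_cache_line_half_k * 2 times.
// Returns the number of float additions it performed.
inline std::size_t additions_with_doubled_trip_count() {
    std::array<float, f32s_in_cache_line_half_k * 2> a {}, b {}, c {};
    std::size_t additions = 0;
    for (std::size_t i = 0; i != f32s_in_cache_line_half_k * 2; ++i) {
        c[i] = a[i] + b[i];
        ++additions;
    }
    return additions;
}

// Hypothetical stand-in for a loop that runs only f32s_in_cache_line_half_k times.
inline std::size_t additions_with_single_trip_count() {
    std::array<float, f32s_in_cache_line_half_k> a {}, b {}, c {};
    std::size_t additions = 0;
    for (std::size_t i = 0; i != f32s_in_cache_line_half_k; ++i) {
        c[i] = a[i] + b[i];
        ++additions;
    }
    return additions;
}
```

If the two benchmarks are meant to isolate the effect of alignment, I would expect both loops to use the same trip count, so that the only difference between them is where the data sits in memory.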