llama.cpp Unit test for quantization functions

Use ggml_internal_get_quantize_fn to loop through all quantization formats and run sanity checks on the implemented functions. The tests are run by ctest, but the test binary also accepts a few command-line parameters for more verbose output.
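To illustrate the kind of sanity check meant here, below is a minimal standalone sketch of a quantize/dequantize round-trip error measurement. It does not use the ggml API: the `toy_block_q4` layout, the `toy_*` function names, and the choice of mean absolute error as the metric are all assumptions made for illustration, not the code in this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_QK 32  /* values per block (assumed, mirrors the 32-value unit in the output below) */

/* Hypothetical 4-bit block: one float scale plus 32 packed nibbles.
   This is an illustration, not the ggml struct. */
typedef struct {
    float   d;                /* per-block scale */
    uint8_t qs[TOY_QK / 2];   /* two 4-bit values per byte */
} toy_block_q4;

/* Round half away from zero without needing libm. */
static int toy_round(float v) {
    return (int)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}

/* Quantize one block of TOY_QK floats to 4 bits with a shared scale. */
static void toy_quantize_block(const float *x, toy_block_q4 *y) {
    float amax = 0.0f;
    for (int i = 0; i < TOY_QK; i++) {
        const float a = x[i] < 0.0f ? -x[i] : x[i];
        if (a > amax) amax = a;
    }
    const float d  = amax / 7.0f;               /* map [-amax, amax] to [-7, 7] */
    const float id = d != 0.0f ? 1.0f / d : 0.0f;
    y->d = d;
    for (int i = 0; i < TOY_QK; i += 2) {
        const int q0 = toy_round(x[i]     * id) + 8;  /* stored in 1..15 */
        const int q1 = toy_round(x[i + 1] * id) + 8;
        y->qs[i / 2] = (uint8_t)(q0 | (q1 << 4));
    }
}

/* Inverse of toy_quantize_block. */
static void toy_dequantize_block(const toy_block_q4 *y, float *x) {
    for (int i = 0; i < TOY_QK; i += 2) {
        x[i]     = (float)((y->qs[i / 2] & 0x0F) - 8) * y->d;
        x[i + 1] = (float)((y->qs[i / 2] >> 4)   - 8) * y->d;
    }
}

/* Mean absolute round-trip error over n values; n must be a multiple of TOY_QK. */
static float toy_roundtrip_error(const float *x, size_t n) {
    assert(n > 0 && n % TOY_QK == 0);
    double sum = 0.0;
    for (size_t b = 0; b < n; b += TOY_QK) {
        toy_block_q4 q;
        float back[TOY_QK];
        toy_quantize_block(x + b, &q);
        toy_dequantize_block(&q, back);
        for (int i = 0; i < TOY_QK; i++) {
            const double diff = (double)x[b + i] - (double)back[i];
            sum += diff < 0.0 ? -diff : diff;
        }
    }
    return (float)(sum / (double)n);
}
```

A test would then compare the returned error against a per-format threshold and report "ok" or "FAILED"; the threshold itself is a tuning choice and is not shown here.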

This is a quick test with generated data, so the measurements are not a useful guide to perplexity, but they might surface issues like #876.

Also add a microbenchmark that times these functions directly without running the rest of the GGML graph.
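As a rough sketch of what timing a function directly looks like, the snippet below calls a dot-product function repeatedly and records the minimum and average time per call. It is an assumption-laden illustration: `bench_dot`, `bench_result`, and `plain_dot` are hypothetical names, and it uses the standard C `clock()` timer, whose resolution is much coarser than a cycle counter.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Signature of the function under test (hypothetical, float-only for simplicity). */
typedef float (*dot_fn)(const float *a, const float *b, size_t n);

typedef struct {
    double min_s;  /* fastest single call, in seconds */
    double avg_s;  /* average over all calls, in seconds */
} bench_result;

/* Plain scalar dot product used as the example workload. */
static float plain_dot(const float *a, const float *b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

/* Writing the result to a volatile keeps the compiler from dropping the call. */
static volatile float bench_sink;

/* Time `iterations` calls of fn and report the min and average per-call time. */
static bench_result bench_dot(dot_fn fn, const float *a, const float *b,
                              size_t n, int iterations) {
    assert(iterations > 0);
    bench_result r = { 0.0, 0.0 };
    double total = 0.0;
    for (int it = 0; it < iterations; it++) {
        const clock_t t0 = clock();
        bench_sink = fn(a, b, n);
        const clock_t t1 = clock();
        const double s = (double)(t1 - t0) / (double)CLOCKS_PER_SEC;
        if (it == 0 || s < r.min_s) {
            r.min_s = s;
        }
        total += s;
    }
    r.avg_s = total / (double)iterations;
    return r;
}
```

Reporting the minimum alongside the average helps separate the cost of the function from noise such as cache misses and scheduling, which is why the example output below shows both.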

There is some overlap with #653, but I think there is value in having both tests that run the full GGML graph and tests that target specific issues in the SIMD implementations.

Example output:

test-quantize-fns -v

 q4_0 absolute quantization error: ok (0.001466)
 q4_0 reference implementation error: ok (0.000000)
 q4_0 dot product error: ok (0.002492)
 q4_1 absolute quantization error: ok (0.001296)
 q4_1 reference implementation error: ok (0.000000)
 q4_1 dot product error: ok (0.012034)
0 tests failed

test-quantize-perf -3 --op vec_dot_q

q4_0
  vec_dot_q
    3200 values (0.01 MB)
      min cycles/32 vals   :      2.95
      avg cycles/32 vals   :      2.97
      float32 throughput   :     59.60 GB/s
      quantized throughput :      9.31 GB/s
    64000 values (0.24 MB)
      min cycles/32 vals   :      2.54
      avg cycles/32 vals   :      3.89
      float32 throughput   :     45.85 GB/s
      quantized throughput :      7.16 GB/s
    640000 values (2.44 MB)
      min cycles/32 vals   :      2.52
      avg cycles/32 vals   :      2.77
      float32 throughput   :     64.26 GB/s
      quantized throughput :     10.04 GB/s

q4_1
  vec_dot_q
    3200 values (0.01 MB)
      min cycles/32 vals   :      5.44
      avg cycles/32 vals   :      5.48
      float32 throughput   :     29.80 GB/s
      quantized throughput :      5.59 GB/s
    64000 values (0.24 MB)
      min cycles/32 vals   :      5.21
      avg cycles/32 vals   :      6.79
      float32 throughput   :     26.20 GB/s
      quantized throughput :      4.91 GB/s
    640000 values (2.44 MB)
      min cycles/32 vals   :      5.05
      avg cycles/32 vals   :      5.06
      float32 throughput   :     35.32 GB/s
      quantized throughput :      6.62 GB/s
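A note on reading the two throughput lines: in the output above their ratio is constant per format (about 6.4 for q4_0 and about 5.33 for q4_1), which is consistent with both being derived from the same timing and scaled by bytes per value. The sketch below reproduces that ratio under the assumption that a q4_0 block occupies 20 bytes (one float scale plus 16 bytes of nibbles) and a q4_1 block 24 bytes (two floats plus 16 bytes) per 32 values; these sizes and the helper names are assumptions, not taken from the benchmark source.

```c
#include <assert.h>
#include <stddef.h>

#define QK 32  /* values per quantization block (assumed) */

/* Bytes occupied by n_values stored as float32. */
static size_t float32_bytes(size_t n_values) {
    return n_values * sizeof(float);
}

/* Bytes occupied by n_values stored in blocks of block_bytes per QK values. */
static size_t quantized_bytes(size_t n_values, size_t block_bytes) {
    assert(n_values % QK == 0);
    return n_values / QK * block_bytes;
}

/* Expected ratio of float32 throughput to quantized throughput for one timing. */
static double throughput_ratio(size_t block_bytes) {
    assert(block_bytes > 0);
    return (double)float32_bytes(QK) / (double)block_bytes;
}
```

With these assumed sizes, `throughput_ratio(20)` is 6.4 and `throughput_ratio(24)` is about 5.33, matching 59.60 / 9.31 and 29.80 / 5.59 in the output above.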

Apr 13 '23 22:04 unbounded

Please fix build failures.

If the build failures turn out to be more problematic, we can extract the first commit and submit it as a separate pull request, which can be reviewed and merged pretty quickly.

Then we can rebase this branch/PR and try to figure out why the quantization tests fail.

Apr 14 '23 17:04 prusnak

I went ahead and extracted the first commit (including my suggestion from above https://github.com/ggerganov/llama.cpp/pull/953#discussion_r1167001496) as Pull Request https://github.com/ggerganov/llama.cpp/pull/970

Apr 14 '23 17:04 prusnak

You might remove test-quantize.c; that was my rather lazy attempt at a unit test.

Apr 14 '23 17:04 sw

https://github.com/ggerganov/llama.cpp/pull/970 has been merged.

Please rebase the branch on top of the current master:

git checkout master
git pull
git checkout quantize-tests
git rebase master
git push --force

Alternatively, you can rebase interactively with git rebase -i master and drop the first commit.

Apr 14 '23 18:04 prusnak

Somehow I've lost track of this PR, sorry.

What is left to be done before merging? I see a comment by @sw that does not seem to have been addressed yet.

Apr 21 '23 18:04 ggerganov