unbounded

Results: 23 comments by unbounded

Did some performance testing on M1 and updated the q4_2c branch. Now I see a performance win here as well:

q4_2:
```
llama_print_timings: prompt eval time = 592.93 ms /...
```

Some miscellaneous performance observations: I saw no performance difference using 64-byte aligned loads with AVX-512. Prefetching instructions give no benefit at all on M1 processors. On other CPUs they helped...

> My guess is that a locally adaptive variable bit rate would require a major change to ggml. The easiest way I can think of would be to vary allocation...