Image encoding gets slower the lower the quantization
I'm running the clip-vit-base-patch32_ggml model on my Intel Mac, and it looks like the lower the quantization, the slower image encoding gets. I tried the main clip-vit-base-patch32_ggml-model-f32.gguf model as well as the q8_0 and q4_0 variants.
These are the average encode times I get for a batch of 4 images:

clip-vit-base-patch32_ggml-model-f32.gguf: 272.21 ms
clip-vit-base-patch32_ggml-model-f16.gguf: 665.07 ms
clip-vit-base-patch32_ggml-model-q8_0.gguf: 333.96 ms
clip-vit-base-patch32_ggml-model-q5_1.gguf: 322.71 ms
clip-vit-base-patch32_ggml-model-q5_0.gguf: 354.86 ms
clip-vit-base-patch32_ggml-model-q4_1.gguf: 330.20 ms
clip-vit-base-patch32_ggml-model-q4_0.gguf: 539.32 ms
f16 looks like an outlier, taking the most time of all.
But going from f32 (272.21 ms) to q8_0 (333.96 ms) to q5_0 (354.86 ms) to q4_0 (539.32 ms), encode time gets steadily worse. The _1 variants fare better than their _0 counterparts, though.
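For anyone who wants to reproduce these numbers with their own setup, here is a minimal timing-harness sketch. It assumes you have some callable that runs the image encoder (e.g. a binding around clip.cpp's encode call); `encode_fn` is a hypothetical stand-in, not an actual clip.cpp API.

```python
import time

def avg_batch_encode_ms(encode_fn, batches, warmup=1):
    """Average wall-clock time per batch, in milliseconds.

    encode_fn: any callable taking one batch of images (hypothetical
               stand-in for the model's image-encode entry point).
    batches:   a list of image batches to encode.
    warmup:    number of initial runs excluded from timing, so one-off
               setup costs don't skew the average.
    """
    for batch in batches[:warmup]:
        encode_fn(batch)  # warm-up, not timed

    start = time.perf_counter()
    for batch in batches:
        encode_fn(batch)
    elapsed = time.perf_counter() - start

    return elapsed / len(batches) * 1000.0
```

Swapping in the different .gguf models behind `encode_fn` and comparing the returned averages should show the same trend (or not) on other hardware.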
Does anyone know if this is expected, or is there something wrong?
I'm having the same issue; it takes a very long time to encode images. I'm getting an average of 830 ms for my q5_0 model on a rather old (2019, i7) Mac.
Any information regarding this would be much appreciated :)