Image encoding gets slower the lower the quantization
I'm running the clip-vit-base-patch32_ggml model on my Intel Mac, and it looks like the lower the quantization, the slower image encoding gets. I tried the main clip-vit-base-patch32_ggml-model-f32.gguf model as well as the q8_0 and q4_0 variants.
These are the average encode times I get for a batch of 4 images:

clip-vit-base-patch32_ggml-model-f32.gguf: 272.21 ms
clip-vit-base-patch32_ggml-model-f16.gguf: 665.07 ms
clip-vit-base-patch32_ggml-model-q8_0.gguf: 333.96 ms
clip-vit-base-patch32_ggml-model-q5_1.gguf: 322.71 ms
clip-vit-base-patch32_ggml-model-q5_0.gguf: 354.86 ms
clip-vit-base-patch32_ggml-model-q4_1.gguf: 330.20 ms
clip-vit-base-patch32_ggml-model-q4_0.gguf: 539.32 ms
f16 looks like an outlier, taking the most time of all.
But going from f32 (272.21 ms) to q8_0 (333.96 ms) to q5_0 (354.86 ms) to q4_0 (539.32 ms), encode time gets steadily worse. The _1 variants fare better than their _0 counterparts, though.
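For anyone who wants to reproduce these numbers with their own setup, here is a minimal timing-harness sketch. It assumes you have some callable that runs the image encoder (e.g. a binding around clip.cpp's encode call); `encode_fn` is a hypothetical stand-in, not an actual clip.cpp API.

```python
import time

def avg_batch_encode_ms(encode_fn, batches, warmup=1):
    """Average wall-clock time per batch, in milliseconds.

    encode_fn: any callable taking one batch of images (hypothetical
               stand-in for the model's image-encode entry point).
    batches:   a list of image batches to encode.
    warmup:    number of initial runs excluded from timing, so one-off
               setup costs don't skew the average.
    """
    for batch in batches[:warmup]:
        encode_fn(batch)  # warm-up, not timed

    start = time.perf_counter()
    for batch in batches:
        encode_fn(batch)
    elapsed = time.perf_counter() - start

    return elapsed / len(batches) * 1000.0
```

Swapping in the different .gguf models behind `encode_fn` and comparing the returned averages should show the same trend (or not) on other hardware.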
Does anyone know if this is expected, or is there something wrong?
I'm having the same issue; it takes a very long time to encode images. I'm getting an average of 830 ms for my q5_0 model on a rather old (2019, i7) Mac.
Any information regarding this would be much appreciated :)