upipe
more optimization for ubuf_pic and upipe_set_color
This time I managed to get things right. Again, "add a basic benchmark for ubuf_pic_clear" is not supposed to be merged.
It compiles to what I expect on GCC as far back as 4.8: a hot loop of 4 instructions (movdqu, add, cmp, and a conditional jump).
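For readers who have not opened the diff, here is a minimal sketch of the kind of loop that tends to compile down to that shape. It is not the code from this patch set: `fill_pattern16` is a hypothetical helper, and the assumption is that the fill value has already been expanded into a 16-byte pattern. Whether the compiler emits exactly movdqu, add, cmp and a jump depends on the compiler version and flags.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of upipe: fill `size` bytes at `dst` by
 * repeating a 16-byte pattern. */
static void fill_pattern16(uint8_t *dst, size_t size, const uint8_t pattern[16])
{
    /* Copy the pattern to a local so the compiler knows it cannot alias
     * `dst` and can keep it in a single vector register. */
    uint8_t local[16];
    memcpy(local, pattern, sizeof(local));

    /* End of the part that is a whole number of 16-byte blocks. */
    uint8_t *end = dst + (size & ~(size_t)15);

    /* Hot loop: one unaligned 16-byte store, one pointer increment, one
     * compare and one conditional jump per iteration once optimised. */
    for (; dst != end; dst += 16)
        memcpy(dst, local, sizeof(local));

    /* Tail: the remaining 0 to 15 bytes continue the pattern. */
    memcpy(dst, local, size & 15);
}
```

The fixed-size `memcpy` calls are there so the sketch stays portable C; on x86 an optimising compiler normally turns each one into a single unaligned vector store.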
The performance increase is much more modest than the original improvements: from ~6500 to ~7000 calls per second on an AMD Ryzen 7 3700X desktop, from ~2600 to ~2900 on an Intel Xeon E3-1245 v5 server, and from ~2000 to ~2200 on an Intel Xeon E3-1265L v3 server with just 1 memory channel populated.
@nto, if your previous measurements were done on an x86 system, would you like to look at this patch set?