compiler_rt: update memcpy to compare usizes at a time
cc @mikdusan noticed this improving stage2 compilation performance of stage3 by slightly more than 2x on macOS when using our own implementation
- libSystem.memcpy: 299.05 seconds
- memcpy.zig: 996.01 seconds
- memcpy_usize.zig: 443.22 seconds
further notes:
- `memmove` can be modified to call this function
- `memcpy` can be modified to use this same optimization
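
For readers skimming the thread, here is a minimal sketch of what "usizes at a time" means for a copy routine. It is written in C rather than Zig and is not the code from this PR: `copy_words` is a hypothetical name, `size_t` stands in for `usize`, and the fixed-size `memcpy` calls are only a portable way to express one unaligned word load and store (a freestanding compiler_rt would presumably do the word access directly).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the PR's implementation: copy n bytes from src
 * to dest, moving one size_t-sized word per iteration and finishing the
 * remaining tail one byte at a time. The regions must not overlap. */
static void *copy_words(void *dest, const void *src, size_t n) {
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* Bulk of the copy: one machine word per loop iteration. */
    while (n >= sizeof(size_t)) {
        size_t word;
        memcpy(&word, s, sizeof word); /* unaligned-safe word load */
        memcpy(d, &word, sizeof word); /* unaligned-safe word store */
        d += sizeof word;
        s += sizeof word;
        n -= sizeof word;
    }

    /* Tail: fewer than sizeof(size_t) bytes remain. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dest;
}
```

The word loop cuts the iteration count by a factor of `sizeof(size_t)` compared with a byte loop, which is presumably where most of the gap between memcpy.zig and memcpy_usize.zig in the timings comes from.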
Couldn't this use SIMD to go in chunks of 16/32 and copy that way? Should be straightforward with @Vector
Uhm, both the title and the code comment mention "comparing" usizes - that should read "copying", right?
Did the first comment also mean to mention memcmp instead of memcpy?
Although 4 people have already looked at this, so maybe I'm bugging.
no @rohlem you're right