copy and copyto!
Hi,
I would like to have a copy and a copyto!. I guess I can build them from scale and scale!, but is that efficient?
Is that what you have in mind?
Best regards
scale(!) with a scalar of 1, or with One() from VectorInterface, should be essentially as efficient, though that of course depends on the implementation for the specific vector type.
copy versus scale is hard to benchmark, because allocating objects leads to wildly fluctuating timings. For scale! versus copy! and copyto!, I find the following for two different cases with standard vectors:
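As a minimal sketch of what that could look like, copy-like helpers can be layered on top of scale and scale!. The names vcopy and vcopy! are hypothetical and not part of VectorInterface; the sketch also assumes that the out-of-place scale returns a freshly allocated vector for the type in question, which depends on that type's implementation.

```julia
using VectorInterface

# Hypothetical helper names, not provided by VectorInterface itself.
# Out-of-place: assumed to return a new vector equal to x.
vcopy(x) = scale(x, One())
# In-place: overwrite y with the contents of x and return y.
vcopy!(y, x) = scale!(y, x, One())

a = randn(5)
b = vcopy(a)        # new vector with the same entries as a
c = similar(a)      # uninitialized destination of matching size
vcopy!(c, a)        # c now holds the entries of a
```

Using One() rather than the literal 1 lets an implementation dispatch on the scalar type and skip the multiplication, although whether it does so is up to the vector type.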
# one big vector
julia> a = randn(100_000_000);
julia> b = randn(100_000_000);
julia> @benchmark copy!($b, $a)
BenchmarkTools.Trial: 454 samples with 1 evaluation per sample.
Range (min … max): 10.680 ms … 12.345 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 10.892 ms ┊ GC (median): 0.00%
Time (mean ± σ): 11.016 ms ± 255.453 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▆▂▆▇█▄
▄▂▂▄▃▅▄████████▇▄▅▅▃▂▃▃▄▃▄▂▂▃▄▃▄▅▄▃▃▃▃▂▃▄▄▃▄▃▃▃▃▃▅▅▃▃▁▃▄▃▃▁▂ ▃
10.7 ms Histogram: frequency by time 11.6 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark copyto!($b, $a)
BenchmarkTools.Trial: 459 samples with 1 evaluation per sample.
Range (min … max): 10.761 ms … 11.168 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 10.905 ms ┊ GC (median): 0.00%
Time (mean ± σ): 10.908 ms ± 45.304 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▁▆█▅▅▃▄
▂▁▂▃▂▃▂▃▂▂▃▃▃▃▃▃▃▃▃▄▄▇███████▇▆▅▄▅▅▃▃▃▃▃▃▃▂▁▁▁▃▂▁▂▁▁▁▁▁▁▁▁▂ ▃
10.8 ms Histogram: frequency by time 11.1 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark scale!($b, $a, 1)
BenchmarkTools.Trial: 359 samples with 1 evaluation per sample.
Range (min … max): 12.935 ms … 15.907 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 13.549 ms ┊ GC (median): 0.00%
Time (mean ± σ): 13.955 ms ± 807.486 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▁▂▄▇█▇▁ ▂▂▂
▄▆▄▆▄▆█▇███████▆█▇▆▄▄▆▆▁▆▄▆▆▁▄▁▁▁▁▄▆▁▄▁▁▆▁▄▄▆▁▇▆█████▁▄▄▁▇▇█ ▆
12.9 ms Histogram: log(frequency) by time 15.8 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark scale!($b, $a, One())
BenchmarkTools.Trial: 356 samples with 1 evaluation per sample.
Range (min … max): 12.829 ms … 15.854 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 13.617 ms ┊ GC (median): 0.00%
Time (mean ± σ): 14.077 ms ± 830.366 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▁▂█ ▂
▃▂▃▃▂▂▃▄▄█▇███▆█▅▃▅▃▃▂▄▃▄▃▃▃▃▃▃▃▃▂▃▃▄▁▃▃▃▃▆▃▃▃▄▂▄▄▃▄▄▄▅▃▄▃▃▃ ▃
12.8 ms Histogram: frequency by time 15.8 ms <
and
# many small vectors
julia> as = [randn(1000) for _ in 1:1000];
julia> bs = [randn(1000) for _ in 1:1000];
julia> @benchmark @inbounds for i = 1:1000
           copy!(($bs)[i], ($as)[i])
       end
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 163.209 μs … 374.791 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 183.667 μs ┊ GC (median): 0.00%
Time (mean ± σ): 183.563 μs ± 8.196 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▁▁▁▁▂▄▂▅▅▇▇▅█▆▇███▆▇▄▄▃▃▂▁
▁▁▁▁▂▃▃▄▅▅▆▆▇▇▇████████████████████████████▇▅▅▄▄▄▄▃▃▂▂▂▂▂▂▂▁▂ ▅
163 μs Histogram: frequency by time 205 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark @inbounds for i = 1:1000
           copyto!(($bs)[i], ($as)[i])
       end
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 166.583 μs … 390.708 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 182.458 μs ┊ GC (median): 0.00%
Time (mean ± σ): 183.484 μs ± 8.007 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▁▃▄▄▅▇▆▇▇▇██▆█▆▅▅▄▂▂▂▁▁
▁▁▁▁▂▃▃▅▅▆▆████████████████████████▇▆▅▆▄▄▄▃▃▃▃▃▂▂▃▂▂▂▂▂▂▂▂▂▂▂ ▅
167 μs Histogram: frequency by time 207 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark @inbounds for i = 1:1000
           scale!(($bs)[i], ($as)[i], 1)
       end
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 160.209 μs … 294.959 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 173.458 μs ┊ GC (median): 0.00%
Time (mean ± σ): 175.354 μs ± 8.120 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▂▄▅█▆▅▇▅▄▄▁▂▁ ▁
▁▁▁▁▂▂▃▅▆██████████████▇██▆▇▇▅▇▇▆▆▆▇▆▅▇▆▆▇▇▆▇████▇█▇▅▅▄▃▃▃▂▂▂ ▅
160 μs Histogram: frequency by time 192 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark @inbounds for i = 1:1000
           scale!(($bs)[i], ($as)[i], One())
       end
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 141.125 μs … 280.333 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 156.625 μs ┊ GC (median): 0.00%
Time (mean ± σ): 157.713 μs ± 8.658 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▁▃▅▃▂▂▁ ▂▃▅▆▇▇█▇▅▄▃▃▂▂▂▂▂▄▅▃▃▂▁
▁▁▂▃▅████████▇▇▇█████████████████████████▆▆▅▅▅▇▆▇█▇█▇▇▆▆▅▄▃▂▂ ▆
141 μs Histogram: frequency by time 176 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
So in the latter case of many small vectors, scale! is actually slightly faster, whereas in the former case of one large vector, it seems slightly slower. I assume copy! can call out to a low-level routine like memcpy or memmove that gains a bit of extra performance for very large memory copies.
But I doubt that any of this makes a difference in a real-world application, as there will typically be several operations that are much more costly than these.
I tried to stay away from copy because that is already provided by Base, but with copy, copy!, copyto! and deepcopy, there are quite a number of things to choose from that might or might not work, or might need to be implemented by new types.
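To illustrate how these Base functions differ, here is a small sketch using only standard Vectors. The variable names are purely illustrative; custom vector types may support only some of these functions, or behave differently.

```julia
src = [1.0, 2.0, 3.0]

# copyto! writes into the leading entries of the destination and leaves
# the rest untouched; it throws if the destination is too short.
dst = zeros(5)
copyto!(dst, src)            # dst is now [1.0, 2.0, 3.0, 0.0, 0.0]

# copy! makes the destination equal to the source, resizing a Vector if needed.
dst2 = zeros(5)
copy!(dst2, src)             # dst2 now has length 3 and equals src

# copy is shallow: the outer container is new, the elements are shared.
nested = [[1, 2], [3]]
shallow = copy(nested)
shares_elements = shallow[1] === nested[1]

# deepcopy recursively copies the elements as well.
deep = deepcopy(nested)
independent = deep[1] !== nested[1]
```

A new vector type would have to decide which of these semantics it wants to support, which is part of why none of them is required by the minimal interface.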
I would say that it is probably safe to assume that most custom types, especially those that behave as vectors in the VectorInterface sense, also support Base.copy, but I did not see it as necessary to include it in the minimal interface.
Maybe I should have called this package MinimalVectorInterface.jl.