Cannot recognize index `i` in `@turbo` over `Vector{DateTime}`
Overview
I get this error:
ERROR: UndefVarError: i not defined
Stacktrace:
[1] scale_stamps_turbo(data::Vector{DateTime})
@ Main ./REPL[7]:5
[2] top-level scope
@ REPL[15]:1
When running:
using Dates
using LoopVectorization
function scale_stamps_turbo(data::Vector{Dates.DateTime})
    out = similar(data, Float64)
    ϕ = (data[lastindex(data)] - data[1]).value
    @turbo for i ∈ eachindex(data)
        out[i] = (data[i] - data[1]).value / ϕ
    end
    return out
end
When I do not use the @turbo macro, it works correctly:
using Dates
function scale_stamps(data::Vector{Dates.DateTime})
    out = similar(data, Float64)
    ϕ = (data[lastindex(data)] - data[1]).value
    for i ∈ eachindex(data)
        out[i] = (data[i] - data[1]).value / ϕ
    end
    return out
end
julia> hcat(mydata, scale_stamps(mydata))
5×2 Matrix{Any}:
1990-01-01T00:00:01 0.0
1990-01-01T00:00:03 0.142857
1990-01-01T00:00:06 0.357143
1990-01-01T00:00:10 0.642857
1990-01-01T00:00:15 1.0
Debugging 1: Correct behavior with Vector{Int64}
This looks related to using Vector{Dates.DateTime}: the following two functions perform the same operation over vectors of Int64 and work correctly:
using LoopVectorization
function scale(data::Vector{Int64})
    out = similar(data, Float64)
    ϕ = data[lastindex(data)] - data[1]
    for i ∈ eachindex(data)
        out[i] = (data[i] - data[1]) / ϕ
    end
    return out
end
function scale_turbo(data::Vector{Int64})
    out = similar(data, Float64)
    ϕ = data[lastindex(data)] - data[1]
    @turbo for i ∈ eachindex(data)
        out[i] = (data[i] - data[1]) / ϕ
    end
    return out
end
Sample Output
julia> hcat(somedata, scale(somedata), scale_turbo(somedata))
10×3 Matrix{Float64}:
1.0 0.0 0.0
3.0 0.08 0.08
6.0 0.2 0.2
8.0 0.28 0.28
12.0 0.44 0.44
14.0 0.52 0.52
17.0 0.64 0.64
21.0 0.8 0.8
22.0 0.84 0.84
26.0 1.0 1.0
Benchmark
julia> @benchmark scale(benchmark_data)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 254.978 μs … 1.789 ms ┊ GC (min … max): 0.00% … 81.20%
Time (median): 255.789 μs ┊ GC (median): 0.00%
Time (mean ± σ): 308.832 μs ± 141.360 μs ┊ GC (mean ± σ): 4.30% ± 8.33%
█▁▂▁ ▆▄ ▁ ▁
█████▆▇▅▅▅▄█▇█████▅▅▄▄▅▄▄▃▃▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ █
255 μs Histogram: log(frequency) by time 941 μs <
Memory estimate: 625.08 KiB, allocs estimate: 2.
julia> @benchmark scale_turbo(benchmark_data)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 59.220 μs … 888.608 μs ┊ GC (min … max): 0.00% … 73.05%
Time (median): 68.954 μs ┊ GC (median): 0.00%
Time (mean ± σ): 79.751 μs ± 76.075 μs ┊ GC (mean ± σ): 11.16% ± 10.52%
█▆▃▁ ▁
████▆▅▄▁▁▁▁▁▄█▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ █
59.2 μs Histogram: log(frequency) by time 705 μs <
Memory estimate: 625.08 KiB, allocs estimate: 2.
Helper Methods for Debugging
My inputs are vectors of strictly increasing values, so here are two functions for generating data:
generate Int64 Vectors / DateTime Vectors
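using Dates
using Random
# Strictly increasing Int64 values: each step adds a random increment from 1:5.
function generate_data(N::Int64)
    data = Vector{Int64}(undef, N)
    v = 0
    for i in 1:N
        v += rand(1:5)
        data[i] = v
    end
    return data
end
# Strictly increasing timestamps: the gap grows by one second per element.
function generate_timestamps(N::Int64)
    data = Vector{Dates.DateTime}(undef, N)
    v = DateTime(1990, 1, 1, 0, 0, 0)
    for i in 1:N
        v += Second(i)
        data[i] = v
    end
    return data
end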
Ah, the problem is with my lazy method for supporting getproperty.
None of
julia> Union{Bool,Base.HWReal}
Union{Bool, Float32, Float64, Int16, Int32, Int64, Int8, UInt16, UInt32, UInt64, UInt8}
support getproperty, so LoopVectorization assumes that any getproperty call is made on an object that can be hoisted out of the loop.
If the object cannot be hoisted out of the loop, then it must depend on the loop somehow, i.e. must be loaded from an index.
Since LV only supports loading and operating on Union{Bool,Base.HWReal}, none of which have getproperty, moving the expression out of the loop should generally be fine.
So that's what is happening here: (data[i] - data[1]).value gets moved out of the loop.
Once this is hoisted, the code is effectively
getprop = (data[i] - data[1]).value
for i ∈ eachindex(data)
    out[i] = getprop / ϕ
end
LoopVectorization also checks to make sure all arrays are of a valid element type; e.g. DateTime is not:
julia> typeof(ts)
Vector{DateTime} (alias for Array{DateTime, 1})
julia> LoopVectorization.check_args(ts)
false
However, the above loop only has out in it, and out isa Vector{Float64}, so it passes the check.
It doesn't notice that data was there.
Hence it does end up running this code instead of a fallback @inbounds @fastmath loop, and you get the error once it evaluates
getprop = (data[i] - data[1]).value
The simplest solution is probably to use reinterpret to cast your array:
julia> tsi = reinterpret(Int, ts);
julia> LoopVectorization.check_args(tsi)
true
julia> typeof(tsi)
Base.ReinterpretArray{Int64, 1, DateTime, Vector{DateTime}, false}
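As a quick sanity check of why this cast preserves the result, here is a minimal sketch using only Dates. It assumes a DateTime is stored as a single Int64 millisecond count (the number that Dates.value reports); the sample timestamps and variable names are illustrative and not from the original report.

```julia
using Dates

# Illustrative timestamps (hypothetical sample data).
ts = [DateTime(1990, 1, 1, 0, 0, s) for s in (1, 3, 6, 10, 15)]

# View the same memory as Int64 without copying.
tsi = reinterpret(Int64, ts)

# Each reinterpreted element equals the millisecond count from Dates.value.
same_raw = all(tsi[i] == Dates.value(ts[i]) for i in eachindex(ts))

# Differences of the raw integers match `.value` of the DateTime differences,
# which is the quantity the original loop computes.
same_diff = all(tsi[i] - tsi[1] == (ts[i] - ts[1]).value for i in eachindex(ts))
```

Both checks should hold, so integer arithmetic on the reinterpreted array can stand in for the DateTime subtraction followed by `.value`.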
That way you can use your data::Vector{DateTime} with scale_turbo, after you loosen the signature to scale_turbo(data::AbstractVector{Int64}).
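Putting the suggestion together, a minimal sketch might look like the following. It assumes a 64-bit system (so Int is Int64) and that @turbo accepts the reinterpreted array, as the check_args result above indicates. The wrapper name scale_stamps_reinterpret is hypothetical and only used for illustration.

```julia
using Dates
using LoopVectorization

# Same loop as scale_turbo above, with the signature loosened so that
# a ReinterpretArray of Int64 is accepted as well as a Vector{Int64}.
function scale_turbo(data::AbstractVector{Int64})
    out = similar(data, Float64)
    ϕ = data[lastindex(data)] - data[1]
    @turbo for i ∈ eachindex(data)
        out[i] = (data[i] - data[1]) / ϕ
    end
    return out
end

# Hypothetical convenience wrapper: cast the timestamps to their raw
# Int64 millisecond counts and reuse the integer kernel.
scale_stamps_reinterpret(data::Vector{DateTime}) =
    scale_turbo(reinterpret(Int64, data))

# Illustrative input with the same spacing as the sample output earlier.
ts = [DateTime(1990, 1, 1, 0, 0, s) for s in (1, 3, 6, 10, 15)]
scaled = scale_stamps_reinterpret(ts)
```

No getproperty call remains inside the loop, so nothing that depends on i gets hoisted, and the array seen by @turbo has a supported element type.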
That worked, really helpful explanation as well. Thank you!
Would you like for something like this to be contributed to the Examples documentation?
Sure, that'd be appreciated!