Polyester.jl
Poor performance using one(T) inside Polyester.@batch loop
I am running Julia 1.9 with two threads. If I call one(T) inside a Polyester.@batch loop, where T is a type parameter of the enclosing function, performance is poor and the loop allocates heavily. Consider the following example:
using Polyester
using BenchmarkTools
function test_no_poly!(y::Vector{T}, x::Vector{T}) where {T}
    for i in 1:length(x)
        y[i] = exp(x[i] + one(T))
    end
end
function test_poly_one!(y::Vector{T}, x::Vector{T}) where {T}
    Polyester.@batch for i in 1:length(x)
        y[i] = exp(x[i] + one(T))
    end
end
function test_poly_1!(y::Vector{T}, x::Vector{T}) where {T}
    Polyester.@batch for i in 1:length(x)
        y[i] = exp(x[i] + 1.0)
    end
end
x = rand(1_000_000)
y = similar(x)
@btime test_no_poly!($y, $x)
@btime test_poly_one!($y, $x)
@btime test_poly_1!($y, $x)
This gives me the following timings, in the same order as the three calls above:
4.411 ms (0 allocations: 0 bytes)
84.761 ms (2000001 allocations: 30.52 MiB)
2.413 ms (0 allocations: 0 bytes)
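It's because of Julia's heuristic of not specializing on type arguments. When a type such as T is only passed through to another function, which is what happens to the variables captured by @batch, the compiler may skip specializing on it, so one(T) turns into a dynamic call that allocates on every iteration.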
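Until something like that lands in the macro, a user-side workaround is to keep T out of the loop body altogether. The sketch below assumes that evaluating one(T) before the loop and capturing the resulting value is enough, since the closure then only sees a plain number of type T; the function name test_poly_hoisted! is made up for illustration and the timings are not measured here.

```julia
using Polyester

# Hypothetical variant of test_poly_one!: one(T) is evaluated once, outside
# the @batch loop, so the loop body captures a value rather than the type T.
function test_poly_hoisted!(y::Vector{T}, x::Vector{T}) where {T}
    c = one(T)  # concrete value of type T, computed before the closure is built
    Polyester.@batch for i in 1:length(x)
        y[i] = exp(x[i] + c)
    end
    return y
end
```

Writing one(eltype(x)) inside the loop should work for the same reason, because x is captured as an array whose element type is part of its own type.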
You could make a PR that has the macro wrap all arguments passed to the batch call, and then unwrap them at the start of the loop body:
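# Singleton that carries a type in its type parameter
struct WrapType{T} end
# Types are wrapped; everything else passes through unchanged
wrap_type(@nospecialize(x)) = x
wrap_type(::Type{T}) where {T} = WrapType{T}()
# Inverse of wrap_type: recover T from the wrapper
unwrap_type(@nospecialize(x)) = x
unwrap_type(::WrapType{T}) where {T} = T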
This lets you work around this particular one of Julia's performance-killing heuristics.
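To make the round trip concrete, here is a minimal sketch of the wrap/unwrap idea in plain Julia, without Polyester. The functions kernel! and run_wrapped! are hypothetical stand-ins for the closure that @batch generates and for its call site; they are not Polyester's actual internals. The point is that WrapType{T}() is an ordinary value whose type carries T, so the callee specializes on it the usual way.

```julia
struct WrapType{T} end
wrap_type(@nospecialize(x)) = x
wrap_type(::Type{T}) where {T} = WrapType{T}()
unwrap_type(@nospecialize(x)) = x
unwrap_type(::WrapType{T}) where {T} = T

# Hypothetical stand-in for the generated loop body: it receives the
# wrapped argument and unwraps it before the loop starts.
function kernel!(y, x, wrapped)
    T = unwrap_type(wrapped)  # T is recovered from the wrapper's type parameter
    for i in eachindex(x, y)
        y[i] = exp(x[i] + one(T))
    end
    return y
end

# Hypothetical stand-in for the call site: wrap the type before passing it on.
run_wrapped!(y::Vector{T}, x::Vector{T}) where {T} = kernel!(y, x, wrap_type(T))
```

Non-type arguments go through wrap_type and unwrap_type unchanged, so the macro could apply both to every captured variable without special-casing.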