Poor performance using one(T) inside Polyester.@batch loop

Open • ajshephard opened this issue 2 years ago • 1 comment

I am running Julia 1.9 with two threads. If I call one(T) inside a Polyester.@batch loop, where T is a type parameter of the enclosing function, performance is poor and there are a lot of allocations. Consider the following example:

using Polyester
using BenchmarkTools

function test_no_poly!(y::Vector{T}, x::Vector{T}) where{T}
    for i in 1:length(x)
        y[i] = exp(x[i] + one(T))
    end
end

function test_poly_one!(y::Vector{T}, x::Vector{T}) where{T}
    Polyester.@batch for i in 1:length(x)
        y[i] = exp(x[i] + one(T))
    end
end

function test_poly_1!(y::Vector{T}, x::Vector{T}) where{T}
    Polyester.@batch for i in 1:length(x)
        y[i] = exp(x[i] + 1.0)
    end
end

x = rand(1_000_000)
y = similar(x)

@btime test_no_poly!($y, $x)
@btime test_poly_one!($y, $x)
@btime test_poly_1!($y, $x)

This gives me

  4.411 ms (0 allocations: 0 bytes)
  84.761 ms (2000001 allocations: 30.52 MiB)
  2.413 ms (0 allocations: 0 bytes)

ajshephard, Aug 05 '23 09:08

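It's because of Julia's heuristics for when not to specialize on a type. T reaches the loop body as an ordinary argument to batch, and when a Type is only passed through to another call, Julia may skip specializing on it. The call to one(T) is then resolved at run time, which is consistent with the roughly two allocations per iteration in your benchmark.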
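As a minimal illustration of that heuristic outside Polyester (the function names below are made up for this example): the Julia manual's performance tips note that a method may not be specialized on a Type argument that is only passed through to another function, whereas a `::Type{T}` annotation with a `where` parameter forces specialization. Both definitions return the same value; only how they may be compiled differs.

```julia
# Hypothetical names, for illustration only; not part of Polyester.

# `t` is only forwarded to `one`, so Julia may compile this method
# without specializing on the concrete type that is passed in.
shift_loose(x, t) = x + one(t)

# Annotating the argument as `::Type{T}` with a `where` parameter
# forces a specialization for each concrete `T`.
shift_strict(x, ::Type{T}) where {T} = x + one(T)
```

For functions this small the compiler will often inline the call and hide the difference, so treat this as a sketch of the rule rather than a benchmark.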

You could make a PR that has the macro call wrap_type on every argument passed to batch, and then call unwrap_type on each of them at the start of the loop body:

struct WrapType{T} end
wrap_type(@nospecialize(x)) = x
wrap_type(::Type{T}) where {T} = WrapType{T}()

unwrap_type(@nospecialize(x)) = x
unwrap_type(::WrapType{T}) where {T} = T

This lets you work around this particular one of Julia's performance-killing heuristics.
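A rough, self-contained sketch of how the wrap/unwrap round trip would behave, without Polyester itself. Here `run_body`, `body`, and `add_one_exp!` are hypothetical stand-ins for what the macro would generate, not Polyester's actual internals:

```julia
struct WrapType{T} end
wrap_type(@nospecialize(x)) = x                 # non-type values pass through unchanged
wrap_type(::Type{T}) where {T} = WrapType{T}()  # a type becomes a singleton value

unwrap_type(@nospecialize(x)) = x
unwrap_type(::WrapType{T}) where {T} = T        # recover the type from the wrapper

# Hypothetical stand-in for the function that receives the loop body
# together with the variables the body uses.
run_body(f::F, args::Vararg{Any,N}) where {F,N} = f(args...)

# Hypothetical loop body: the type arrives wrapped and is unwrapped first.
function body(y, x, Tw)
    T = unwrap_type(Tw)
    for i in eachindex(x, y)
        y[i] = exp(x[i] + one(T))
    end
    return nothing
end

function add_one_exp!(y::Vector{T}, x::Vector{T}) where {T}
    # Pass WrapType{T}() instead of T, so T is carried in the type of
    # an argument rather than as a bare Type value.
    run_body(body, y, x, wrap_type(T))
    return y
end
```

Because `WrapType{T}()` is an ordinary value whose type carries `T`, the callee can specialize on it like any other argument, and `unwrap_type` recovers `T` from the type parameter.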

chriselrod, Aug 08 '23 03:08